Digital Index of North American Archaeology, Linking Sites and Literature
Finds Smithsonian trinomials and other site identifiers from published literature to link to sites in DINAA
The Digital Index of North American Archaeology, Linking Sites and Literature (DINAA LSL) provides a uniquely nuanced way to begin library research in American archaeology. Published literature that contains archaeological site numbers is listed in the DINAA LSL with full bibliographic citations and stable URLs (as available). These publications can be associated with geographic regions and with the archaeological concepts of culture, time, and investigation represented in the DINAA archaeological site dataset.
Map Representations of DINAA and Other Information Resources
- Sites Referenced by American Antiquity (2004-2013) Articles
- Map View (American Antiquity Only)
- Sites Referenced by the Index of Texas Archaeology (ITA)
- Map View (ITA Only)
- Sites Referenced by the Federal Register
- Map View (Federal Register Only)
- Sites Referenced by Sources outside Open Context
- Map View (Multiple Information Sources)
- All DINAA Sites with Cross-References to Other Web Resources
- Map View (Multiple Information Sources, including tDAR)
Links with American Antiquity Articles
The DINAA LSL may be used to begin a query for literature about the archaeological record using spatial, temporal, or cultural concepts, and then branch into literature about archaeological sites as appropriate. The 2016-2017 segment of DINAA LSL development involved visual identification of archaeological site numbers within articles from the most recent non-embargoed decade of the journal American Antiquity in JSTOR (2004-2013). This work was led by Joshua Wells and conducted at Indiana University South Bend through the support of the Institute of Museum and Library Services. The effort included identification of site number elements in article text, tables, and figures (which may not be searchable via optical character recognition). When possible, these site numbers were associated with their US state and county locations. Most site numbers follow the Smithsonian trinomial format (SSCCNN), where "SS" is a one- or two-digit number designating the state of origin, "CC" is a two-letter abbreviation associated with the county name (or in some cases, a National Park), and "NN" is a unique number attributed to the site (usually its place in the order of recording, from first to last). Other site number systems that include designators such as National Parks or National Forests have been accommodated to approximate their position on a political US county map. Bibliographic information and stable URLs for items of published literature about archaeological sites have been linked with the full representation of those sites in DINAA.
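The SSCCNN structure described above can be illustrated with a short parser. This is only a minimal sketch in Python, not the DINAA team's code: the `Trinomial` type, the `parse_trinomial` function, and the tolerance for spaces or hyphens between parts are all assumptions made for illustration, and the sample identifiers are made up.

```python
import re
from typing import NamedTuple, Optional


class Trinomial(NamedTuple):
    """The three parts of a Smithsonian trinomial (hypothetical helper type)."""
    state: int    # "SS": one- or two-digit state number
    county: str   # "CC": two-letter county (or park) abbreviation
    number: int   # "NN": sequential site number


# Assumption: parts may be separated by optional spaces or a hyphen,
# since published site numbers are not always written as one solid string.
TRINOMIAL_RE = re.compile(r"^(\d{1,2})\s*-?\s*([A-Za-z]{2})\s*-?\s*(\d+)$")


def parse_trinomial(text: str) -> Optional[Trinomial]:
    """Return the parsed parts, or None if the text is not in SSCCNN form."""
    match = TRINOMIAL_RE.match(text.strip())
    if match is None:
        return None
    state, county, number = match.groups()
    return Trinomial(int(state), county.upper(), int(number))


if __name__ == "__main__":
    # Illustrative strings only; they are not taken from DINAA records.
    for sample in ("12MA1", "41 tv 1364", "Feature 3"):
        print(sample, "->", parse_trinomial(sample))
```

Normalizing the parts into integers and an upper-case county code makes differently written forms of the same identifier compare as equal, which is useful before any attempt to match them against existing site records.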
Links with the Index of Texas Archaeology
In November and December 2017, the DINAA team extracted trinomial identifiers and their associations with reports through automated requests that obtained public metadata from the Index of Texas Archaeology (ITA). The system hosting the ITA provided an Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) web service that returned Dublin Core metadata records. Eric Kansa of the DINAA team developed and ran a software process to request metadata records from the ITA's OAI-PMH service. The software attempted to identify and extract Smithsonian trinomials in the Dublin Core title, abstract, and subject elements returned by the service. It looked for trinomials only in the metadata provided through OAI-PMH, not in the full text of the associated reports published by the ITA. The software then associated these site identifiers with the appropriate counties in the state of Texas. Taylor Wiley, a student member of the DINAA team, checked the software-identified trinomials and corrected a few errors, as indicated in notes associated with certain site records.
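The metadata-only extraction described above might look roughly like the following Python sketch. It is not the software the DINAA team ran. It parses an inline, made-up OAI-PMH response rather than calling the ITA service, and it assumes that abstracts arrive in the `dc:description` element (as is common in `oai_dc` records) and that Texas trinomials begin with the state number 41. The function name, regular expression, and record identifier are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Standard XML namespaces for OAI-PMH responses and Dublin Core elements.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

# Assumption: Texas trinomials look like "41" + two capital letters + digits.
TEXAS_TRINOMIAL = re.compile(r"\b41\s?([A-Z]{2})\s?(\d+)\b")

# Made-up sample response; a real harvest would fetch this over HTTP.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:report-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Testing at Sites 41AB12 and 41CD345</dc:title>
          <dc:description>Survey results; site 41AB12 was revisited.</dc:description>
          <dc:subject>archaeology</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def extract_trinomials(xml_text):
    """Map each record identifier to the sorted trinomials in its metadata."""
    root = ET.fromstring(xml_text.strip())
    found = {}
    for record in root.iter(f"{OAI}record"):
        identifier = record.findtext(f"{OAI}header/{OAI}identifier", default="")
        hits = set()
        # Only metadata fields are searched, never the report full text.
        for tag in ("title", "description", "subject"):
            for element in record.iter(f"{DC}{tag}"):
                for county, number in TEXAS_TRINOMIAL.findall(element.text or ""):
                    hits.add(f"41{county}{int(number)}")
        if hits:
            found[identifier] = sorted(hits)
    return found


if __name__ == "__main__":
    print(extract_trinomials(SAMPLE_RESPONSE))
```

Collecting hits in a set per record means a trinomial that appears in both the title and the abstract is linked to the report only once.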
Links with the Federal Register
In August 2019, the DINAA team extracted trinomial identifiers from archaeology-related documents retrieved from the Federal Register, a US government service announcing regulatory determinations. Code used for extracting trinomials from the Federal Register can be found in version control here. The process to obtain and extract trinomials worked as follows:
- Open Context code used the Federal Register API to find documents matching archaeology-related search terms, which included: archeology, archeological, archaeology, archaeological, NAGPRA, cultural, heritage. The API responded with links to download the matching documents. Open Context then downloaded and cached (stored) the plain text versions of these documents in local storage. Open Context similarly downloaded and cached JSON files of document metadata.
- Open Context then used a simple regular expression string to find possible trinomials via pattern matching. Open Context then exported a CSV data table with possible trinomials and their source Federal Register document identifiers for review by Josh Wells and his research assistant, Mackenzie Edmonds.
- Josh Wells and Mackenzie Edmonds reviewed the possible trinomial identifiers to verify that matching strings did, in fact, reference archaeological sites. About 92% of automatically identified strings passed this human verification step.
- Eric Kansa then used Open Context to reconcile trinomial identifiers with records already published by DINAA, or to mint new records if no exact matches could be found. Open Context then published the associations between site records and Federal Register documents.
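The pattern-matching and CSV export steps above can be sketched as follows. This is an illustrative Python example under stated assumptions, not the Open Context code: the source does not give the actual regular expression, so `CANDIDATE` is a guess at a "simple" pattern, and the function names, CSV column names, document identifier, and sample text are all hypothetical.

```python
import csv
import io
import re

# Assumed loose pattern: 1-2 digits, two capital letters, 1-5 digits,
# with optional single spaces between the parts.
CANDIDATE = re.compile(r"\b(\d{1,2})\s?([A-Z]{2})\s?(\d{1,5})\b")

FIELDS = ["document_id", "candidate", "matched_text"]


def find_candidates(document_id, text):
    """Return one row per distinct possible trinomial found in the text."""
    rows = []
    seen = set()
    for match in CANDIDATE.finditer(text):
        state, county, number = match.groups()
        candidate = f"{int(state)}{county}{int(number)}"
        if candidate not in seen:
            seen.add(candidate)
            rows.append({
                "document_id": document_id,
                "candidate": candidate,
                "matched_text": match.group(0),  # kept verbatim for reviewers
            })
    return rows


def to_csv(rows):
    """Serialize candidate rows as CSV text for human review."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up text and document identifier, for illustration only.
    sample = ("Human remains were removed from site 12MA1. "
              "See 84 FR 12345 for the prior notice.")
    print(to_csv(find_candidates("example-doc-1", sample)))
```

A loose pattern like this one also matches strings that are not site numbers. In the sample, a citation in the form "84 FR 12345" is flagged as a candidate, which illustrates why a human verification step is valuable before any identifiers are reconciled with site records.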
Expansion of the DINAA LSL in late 2017 and beyond will use a combination of automated and human text mining in repositories and the open Web. The DINAA LSL does not represent the entire body of published literature in American archaeology, but it does provide a starting point and a new way of interacting with publications for professional researchers, educators and students, and the general public.
In preparation, draft-stage