Digital Index of North American Archaeology, Linking Sites and Literature

Name: Digital Index of North American Archaeology, Linking Sites and Literature
Published: 2017-10-19
License: https://creativecommons.org/publicdomain/zero/1.0

Finds Smithsonian trinomials and other site identifiers in published literature and links them to sites in DINAA

Project Abstract

The Digital Index of North American Archaeology, Linking Sites and Literature (DINAA LSL) provides a way to begin library research in American archaeology starting from the sites themselves. Published literature that contains archaeological site numbers is provided in the DINAA LSL with full bibliographic citations and stable URLs (as available). These can be associated with geographic regions and with the archaeological concepts of culture, time, and investigation represented in the DINAA archaeological site dataset.

Map Representations of DINAA and Other Information Resources

Sites Referenced by American Antiquity (2004-2013) Articles: Map View (American Antiquity Only)
Sites Referenced by the Index of Texas Archaeology (ITA): Map View (ITA Only)
Sites Referenced by the Federal Register: Map View (Federal Register Only)
All DINAA Sites with Cross-References to Other Web Resources: Map View (Multiple Information Sources, including tDAR)

Links with American Antiquity Articles

The DINAA LSL may be used to begin a query for literature about the archaeological record using spatial, temporal, or cultural concepts, and then branch into literature about specific archaeological sites as appropriate. The 2016-2017 segment of DINAA LSL development involved visual identification of archaeological site numbers within articles from the most recent non-embargoed decade of the journal American Antiquity in JSTOR (2004-2013). This work was led by Joshua Wells and conducted at Indiana University South Bend with the support of the Institute of Museum and Library Services. The effort included identification of site numbers in article text, tables, and figures (which may not be searchable via optical character recognition). When possible, these site numbers were associated with their US state and county locations. Most site numbers follow the Smithsonian trinomial format (SSCCNN), where "SS" is a one- or two-digit number designating the state, "CC" is a two-letter abbreviation of the county name (or, in some cases, a National Park), and "NN" is a unique number assigned to the site (usually its position in the order of recording, from first to last). Other site number systems that include designators such as National Parks or National Forests have been accommodated by approximating their position on a political map of US counties. Bibliographic information and stable URLs for items of published literature about archaeological sites have been linked with the full representation of those sites in DINAA.
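The SSCCNN structure described above can be illustrated with a short parsing sketch. This is not the DINAA team's code: the names Trinomial, TRINOMIAL_RE, and parse_trinomial are hypothetical, and the tolerance for spaces, hyphens, and mixed-case county codes is an assumption about how site numbers may appear in print.

```python
import re
from typing import NamedTuple, Optional


class Trinomial(NamedTuple):
    state: int   # "SS": one- or two-digit state number
    county: str  # "CC": two-letter county (or park) abbreviation
    number: int  # "NN": sequence number assigned to the site


# Assumption: a single space or hyphen may separate the three parts.
TRINOMIAL_RE = re.compile(r"^(\d{1,2})[\s-]?([A-Za-z]{2})[\s-]?(\d+)$")


def parse_trinomial(text: str) -> Optional[Trinomial]:
    """Split a site number into its three parts, or return None if it does not fit."""
    match = TRINOMIAL_RE.match(text.strip())
    if match is None:
        return None
    state, county, number = match.groups()
    # Assumption: county codes are normalized to upper case for comparison.
    return Trinomial(int(state), county.upper(), int(number))
```

With these assumptions, parse_trinomial("41TV1") returns Trinomial(state=41, county='TV', number=1), and a string that does not fit the pattern returns None. Site numbers from other systems (for example those built on National Forest designators) would need their own handling.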

Links with the Index of Texas Archaeology

From November through December 2017, the DINAA team extracted trinomial identifiers and their associations with reports through automated requests for public metadata from the Index of Texas Archaeology (ITA). The system hosting the ITA provided an Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) web service that returned Dublin Core metadata records. Eric Kansa of the DINAA team developed and ran a software process to request metadata records from the ITA's OAI-PMH service. The software attempted to identify and extract Smithsonian trinomials in the Dublin Core title, abstract, and subject elements returned by the service. It looked only at the metadata provided via OAI-PMH, not the full text of the associated reports published by the ITA. The software then associated these site identifiers with the appropriate counties in the state of Texas. Taylor Wiley, a student member of the DINAA team, checked the software-identified trinomials and corrected a few errors, as indicated in notes associated with certain site records.
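A minimal sketch of the metadata scan described above, working on an OAI-PMH response that has already been downloaded. It is an illustration rather than the software Eric Kansa ran. The function name trinomials_by_record and the regular expression are hypothetical. The sketch also assumes that the abstract is carried in the dc:description element of the oai_dc format, and that Texas trinomials begin with the state number 41.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List

# Namespaces defined by OAI-PMH 2.0 and the Dublin Core element set.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

# Assumption: Texas site numbers start with 41, followed by a two-letter
# county code and a sequence number, optionally separated by single spaces.
TEXAS_TRINOMIAL_RE = re.compile(r"\b41\s?[A-Z]{2}\s?\d+\b")


def trinomials_by_record(oai_xml: str) -> Dict[str, List[str]]:
    """Map each OAI record identifier to the trinomials found in its metadata."""
    root = ET.fromstring(oai_xml)
    results: Dict[str, List[str]] = {}
    for record in root.iter(OAI + "record"):
        identifier = record.findtext(f"{OAI}header/{OAI}identifier", default="")
        found = set()
        # Only title, description (abstract), and subject are scanned,
        # never the full text of the report itself.
        for tag in ("title", "description", "subject"):
            for element in record.iter(DC + tag):
                for hit in TEXAS_TRINOMIAL_RE.findall(element.text or ""):
                    found.add(re.sub(r"\s", "", hit))  # "41 BX 228" -> "41BX228"
        if found:
            results[identifier] = sorted(found)
    return results
```

The county code in each extracted trinomial could then be looked up in a table of Texas county abbreviations to place the site, which is the step that benefited from the manual check described above.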

Links with the Federal Register

In August 2019, the DINAA team extracted trinomial identifiers from archaeology-related documents retrieved from the Federal Register, a US government service announcing regulatory determinations. Code used for extracting trinomials from the Federal Register can be found in version control here. The process to obtain and extract trinomials worked as follows:

Open Context code used the Federal Register API to find documents matching archaeology-related search terms, which included: archeology, archeological, archaeology, archaeological, NAGPRA, cultural, heritage. The API responded with links to download the matching documents. Open Context then downloaded and cached (stored) the plain text versions of these documents in local storage. Open Context similarly downloaded and cached JSON files of document metadata.
Open Context then used a simple regular expression string to find possible trinomials via pattern matching. Open Context then exported a CSV data table with possible trinomials and their source Federal Register document identifiers for review by Josh Wells and his research assistant, Mackenzie Edmonds.
Josh Wells and Mackenzie Edmonds reviewed the possible trinomial identifiers to verify that matching strings did, in fact, reference archaeological sites. About 92% of automatically identified strings passed this human verification step.
Eric Kansa then used Open Context to reconcile trinomial identifiers with records already published by DINAA, or to mint new records if no exact matches could be found. Open Context then published the associations between site records and Federal Register documents.
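The pattern-matching and CSV export steps above can be sketched as follows. The source says only that "a simple regular expression" was used, so the pattern here is an assumption and not the one in the Open Context code. The function names, the column names, and the in-memory dictionary of cached document texts are likewise hypothetical.

```python
import csv
import io
import re
from typing import Dict, List, Tuple

# Assumption: 1-2 digits, two letters, then up to five digits, with optional
# single spaces between the parts. Deliberately loose, so it over-matches.
CANDIDATE_RE = re.compile(r"\b\d{1,2}\s?[A-Za-z]{2}\s?\d{1,5}\b")


def find_candidates(documents: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (document identifier, candidate string) pairs, one per distinct match."""
    rows = []
    for doc_id, text in documents.items():
        seen = set()
        for match in CANDIDATE_RE.finditer(text):
            candidate = match.group(0)
            if candidate not in seen:
                seen.add(candidate)
                rows.append((doc_id, candidate))
    return rows


def to_csv(rows: List[Tuple[str, str]]) -> str:
    """Write candidates to CSV text with an empty column for the human reviewer."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["document_number", "candidate", "verified"])
    for doc_id, candidate in rows:
        writer.writerow([doc_id, candidate, ""])
    return buffer.getvalue()
```

A loose pattern like this one also matches strings that are not site numbers. For example, a citation written as "84 FR 12345" fits the same shape as a trinomial, which illustrates why a human review step is useful before any records are reconciled or minted.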

Future Work

Expansion of the DINAA LSL in late 2017 and beyond will use a combination of automated and human text mining in repositories and on the open Web. DINAA LSL does not represent the entire body of published literature in American archaeology, but it does provide a starting point and a new way of interacting with publications for professional researchers, educators and students, and the general public.

Selected References

Neller, Angela, Jasmine Heckman, Elizabeth Bollwerk, Kelsey Noack Myers, and Josh Wells. 2024. "Making Archaeological Collections More Findable and Accessible through Increased Coordination." Advances in Archaeological Practice, 1-9. doi:10.1017/aap.2023.31.

Metadata

Descriptive Attribute	Value(s)
Contributor (Vocabulary: DCMI Metadata Terms (Dublin Core Terms))	Joshua Wells, Taylor Wiley, Patrick Finnegan, Valeria Chamorro, Mackenzie Edmonds (Vocabulary: Digital Index of North American Archaeology (DINAA)); Eric C. Kansa (Vocabulary: Petra Great Temple Excavations)
Creator (Vocabulary: DCMI Metadata Terms (Dublin Core Terms))	Joshua Wells (Vocabulary: Digital Index of North American Archaeology (DINAA))
Subject (Vocabulary: DCMI Metadata Terms (Dublin Core Terms))	Archaeology; Salvage archaeology (Vocabulary: Library of Congress Subject Headings)

Suggested Citation

Joshua Wells, Taylor Wiley, Patrick Finnegan, Valeria Chamorro, Mackenzie Edmonds, Eric C. Kansa. (2017) "Digital Index of North American Archaeology, Linking Sites and Literature". In Digital Index of North American Archaeology (DINAA). Released: 2017-10-19. Open Context. <https://opencontext.org/projects/0cea2f4a-84cb-4083-8c66-5191628abe67> DOI: https://doi.org/10.6078/M7542KRZ

Editorial Status

○○○○○

Part of Project

Digital Index of North American Archaeology (DINAA)

Copyright License

Public Domain Mark 1.0

While this content is in the public domain, please cite the contributors and Open Context appropriately.
