Data Publication Guidelines for Contributors
Open Context offers free access to editorially-verified datasets, many of which are linked to print publications. Open Context also offers an optional peer review process to further validate datasets. The system uses a simple and generalized data model that can accommodate most archaeological datasets and museum collections. We welcome submissions and publication proposals. If you are interested in publishing your research in Open Context, please contact the Editor and follow these guidelines.
What Does Data Publishing Involve?
Publishing data with Open Context is similar to publishing with a conventional journal. Following a step-by-step process, contributors work with our editorial staff to review, clean, and document datasets so they become professionally recognized contributions.
1. Publication Planning
Cleaner and more consistent datasets are easier for you to analyze and for Open Context to publish. Develop strategies to reduce errors in recording (see below).
Even so, no dataset is perfect. Open Context's editorial staff have experience and tools to help make cleanup and editing as efficient as possible. We help you:
- Publish existing excavation, survey or collections data (and associated media, such as pictures, GIS files, etc.)
- Develop a grant "Data Management Plan" (see below).
2. Proposing a Publication
To propose a publication, contact the Editor and describe:
- Subject area, research themes documented by your dataset
- Size and complexity of your dataset (number of databases, data tables, media files, collaborators)
- Potential professional and public audiences, inside and outside of archaeology
- Data sensitivity concerns (see below)
- Agreement of project stakeholders to publish data openly, under permissive copyright terms
- Covering publication fees
3. Submitting Data & Media Files
When you are ready to publish your data, you have various options for sending your digital files to Open Context's editors.
- Email: Suitable for smaller numbers of files, of limited size
- Dropbox: Suitable for sending entire directories of files
- FTP: Suitable for sending entire directories of files
Note: We may need your help to convert files to open file formats, especially if such conversion requires special software. Open formats are easier to archive and use across computing platforms. See below for more.
4. Quality Review & Edits
Open Context editors use Google Refine to check for consistency and fix minor problems in structured (tabular) data. We work with data contributors to fix more major problems. Reviews and edits focus on:
- Consistency: Terms used in classification need to be consistent (capitalization, plurals, spacing).
- Decoding: We work with contributors to decode numeric codes and abbreviations to improve intelligibility.
- Validation: Numeric columns and fields should include only numbers (with believable values).
- Identifiers: Identifiers need to be unique and unambiguous in a dataset so that different data tables, media files, etc. can be associated with them.
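Checks like these can also be scripted before submission. The sketch below (plain Python, with hypothetical field names such as "catalog_no" and "weight_g") flags duplicate identifiers and non-numeric values in a numeric field; it illustrates the kind of review described above, not Open Context's actual tooling.

```python
def check_table(rows, id_field, numeric_fields):
    """Report duplicate identifiers and non-numeric values in numeric fields."""
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows, start=2):  # data starts at row 2; row 1 holds field names
        rec_id = row[id_field].strip()
        if rec_id in seen_ids:
            problems.append(f"row {i}: duplicate identifier {rec_id!r}")
        seen_ids.add(rec_id)
        for field in numeric_fields:
            value = row[field].strip()
            if value:
                try:
                    float(value)
                except ValueError:
                    problems.append(f"row {i}: non-numeric {field!r} value {value!r}")
    return problems

# Hypothetical small-finds records:
rows = [
    {"catalog_no": "SF-001", "weight_g": "12.4"},
    {"catalog_no": "SF-002", "weight_g": "n/a"},   # not a number
    {"catalog_no": "SF-001", "weight_g": "3.1"},   # duplicate identifier
]
for problem in check_table(rows, "catalog_no", ["weight_g"]):
    print(problem)
```

Running checks like this during data collection, not just before publication, catches errors while they are still easy to correct.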
5. Pre-Publication Staging & Review
Open Context editors integrate project data tables, media and other files using the ArchaeoML data structure. This structure integrates different data elements, enabling search, browse and analysis within and between projects. During this step, researchers can preview how their data will look in Open Context before public release. Data staging involves:
- Additional debugging and error fixes
- Documentation and description of terminology and recording systems
- Assignment of authorship and citation credit to specific data elements
- Editorial board review for intelligibility, potential for reuse
6. Annotation with Standards
Open Context editors annotate data to standards that enable interoperability and new research opportunities, including integration with other collections on the Web. We use "Linked Open Data" methods to relate recording systems used within a dataset to community-embraced standards.
7. Publication & Archiving
Once contributors, the editorial staff, and the Editorial Board agree a dataset is ready for publication, we post data to the public on the open Web. This step involves:
- Archiving with the California Digital Library (and, optionally, other repositories)
- Assignment of persistent identifiers needed for long-term citation
- Posting to GitHub as a secondary delivery channel and continued version tracking
- Indexing to enable powerful queries and searches
- Public search-engine (Google, Bing, etc.) indexing for discovery
All content published with Open Context must first receive approval from the editorial staff and relevant members of the Editorial Board (depending on disciplinary specialization). Unlike a conventional journal, we do not reject data for lack of 'significance', since quality data may have unanticipated future applications and may improve the statistical power of future meta-analyses. Rather, our review criteria include:
- Methodological soundness and data quality
- Quality of documentation
- Suitability for wider reuse
Contributors may request anonymous, external peer review of their data publications. We ask reviewers to comment on the merits of a dataset according to the criteria listed above. Reviewer comments will be posted publicly alongside the dataset, and the dataset will be highlighted as 'peer-reviewed.'
Open Context indicates the editorial status of projects using a scale of one to five (indicated by filled circles). These circles are displayed on every record belonging to a project. The editorial status applies to a project as a whole (not to individual records within a project).
Peer-reviewed: In addition to review and edits by the managing editor and the Editorial Board, at least two external experts in the subject area reviewed the project.
Editorial board reviewed: Open Context's managing editors and members of the Editorial Board reviewed the project.
Managing editor reviewed: Open Context's managing editors closely reviewed and edited the project.
Minimal editorial acceptance: Open Context's managing editors accepted a project's dataset even though it may lack important documentation and supplemental information. In some cases, the primary value of the project may be to serve instructional or public outreach goals, rather than research uses.
Demonstration, minimal editorial acceptance: Open Context's managing editors accepted a project's dataset even though it lacked important documentation and supplemental information. Primarily, these project datasets serve as demonstrations or proof-of-concept applications of Open Context, rather than research purposes.
Data publication with Open Context requires extensive processing and review. Reference to the points below can help make this effort more efficient.
Good Database Design: Good database design from a project's start makes eventual data publication easier. Normalization (removal of redundant information) helps to maintain data quality. Maintaining consistency is also important. For example, numeric data fields should contain only numeric data. If additional notation or explanation is required for some numeric information, these should go in other fields. Data validation (error checking) practices throughout data collection will speed data publication and help make the published data more valuable and easier to use by others.
Clean-up and Edits: Publishing data in Open Context is a form of publication, but one that differs from journal articles or books. Because datasets are often fairly "raw," one should not expect perfect spelling, grammar, or compositional excellence in daily logs, database comment fields, etc. Spelling problems in these fields will probably have little impact on the overall usability of contributed data. However, some errors have greater impact. For instance, nominal values (terms used over and over again), such as the terms used to describe artifacts in a small finds database ("lamp," "coin," "spindle-whorl"), should be consistent (in terms of plurals, terminology, and spelling) to aid search and understanding. Identifiers for objects or contexts (such as "catalog #," "locus #"), especially those that have associated descriptive information, should also be free of errors.
Decoding: To speed up data entry, many people use coding systems as a convenient way to record data. However, these coding systems may be unintelligible without explanation. To facilitate understanding of a dataset, we request that data contributors replace code with intelligible text before import.
Description and Explanation: Every field of a dataset must have some narrative description to aid interpretation, even if only a sentence or two. Sometimes certain values in these fields should also be described, especially if data contributors employ terminology that is not widely used by their colleagues.
Structural Relations: Archaeologists often manage their data in relational databases with complex structures. These structures need explanation so that editors can perform the proper queries to extract data. Specifically, we will need to know the primary and secondary (foreign) keys in each table.
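To illustrate why key documentation matters, the sketch below joins a hypothetical contexts table (primary key "locus_no") to a finds table that references it through a foreign key of the same name. Without knowing which columns serve as keys, this kind of extraction query is guesswork.

```python
# Hypothetical two-table structure: a contexts table keyed on "locus_no"
# (primary key) and a finds table whose "locus_no" column is the foreign key.
contexts = {
    "L101": {"locus_no": "L101", "phase": "Iron Age II"},
    "L102": {"locus_no": "L102", "phase": "Persian"},
}
finds = [
    {"catalog_no": "SF-001", "locus_no": "L101", "object_type": "lamp"},
    {"catalog_no": "SF-002", "locus_no": "L102", "object_type": "coin"},
]

# Knowing the keys lets editors join the tables when extracting data:
joined = [{**find, **contexts[find["locus_no"]]} for find in finds]
print(joined[0]["phase"])  # Iron Age II
```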
Locations and Objects: Open Context creates a separate web page (retrieved at a URL) for every location and object, person, and media file it publishes. It is important to let Open Context editors know which fields represent identifiers for different locations and objects (archaeological sites, archaeological contexts, survey tracks, artifacts, ecofacts). Ideally, some descriptive information should be made available for each identified location and object, including excavation areas or trenches, even if these descriptions are only in narrative form.
Images and Media: Images and other media comprise an important component of archaeological documentation. Each individual media file must be clearly and unambiguously linked to one or more specific records in the dataset (such as records of excavation contexts, people, excavation log records, artifact records, etc.). The data contributor should prepare a separate table listing each image file name, an image description (if desired), and the number/identifier of the object or place the image depicts.
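The linking table can be as simple as a three-column CSV file. A minimal sketch, with hypothetical file names and identifiers:

```python
import csv
import io

# Hypothetical media records linking each image file to the record it documents.
media = [
    ("locus_12_north.jpg", "North balk of locus 12", "L012"),
    ("sf_001_profile.jpg", "Profile view of lamp", "SF-001"),
]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["file_name", "description", "linked_identifier"])
writer.writerows(media)
print(buffer.getvalue())
```

Each "linked_identifier" value must match an identifier that actually appears in the dataset, which is another reason identifiers need to be unique and error-free.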
Abstract and Background: Each project should have a narrative abstract or background description. This should provide introductory information describing the project goals, key findings, methods, and recording systems. For large projects, contributors can also provide additional supplemental background descriptions of specialist analyses. These materials may be submitted in Microsoft Word (or similar) format.
People and Attribution: For citation purposes, every record in Open Context must be attributed to at least one person. In some cases, certain database fields have records of different people who made observations and analyses. Ideally, the people identified in these fields should be identified by full name (not initials) and these names should be spelled properly. In other cases, entire data tables or datasets are created by a single person (such as a specialist). For each data table, please provide a name and institutional affiliation for the person(s) primarily responsible for authorship.
Data Formats and Structures
Data for import should be in Microsoft Excel tables. The first row ("row 1") of the table should contain the data field names (columns). The other rows should contain the data records, with each record listed in a separate row. If you do not have Excel or cannot produce Excel spreadsheets from your database, Open Context can also accept FileMaker, Access, and OpenOffice files, as well as comma-separated value (CSV) files. Please note, however, that you must first extract images and other media from a database (if stored in binary fields) and store them as individual files.
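A table laid out this way (field names in row 1, one record per row) parses cleanly with standard tools, which is part of why we request it. A quick sketch with hypothetical fields:

```python
import csv
import io

# Field names in row 1, one record per subsequent row:
csv_text = """catalog_no,object_type,weight_g
SF-001,lamp,12.4
SF-002,coin,3.1
"""
records = list(csv.DictReader(io.StringIO(csv_text)))
print(records[0]["object_type"])  # lamp
```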
The project abstract/background should be in Microsoft Word (or a similar format). In addition to the above, you may also provide as much supporting or related documentation as you like, such as PDFs of related publications (with permission), extended bibliographies in Word, and links to related web resources (such as descriptive project web sites, profiles of project participants on their institutional web sites or links to self-archived publications related to the dataset).
Please note, while we accept certain types of data and content in proprietary formats (like Microsoft Excel), our accession and publishing processes transform these data to open formats (chiefly ArchaeoML-XML and CSV). Open, nonproprietary file formats have interoperability and preservation advantages, and the ArchaeoML-XML format enables us to encode more specialized forms of metadata required for archaeological (and related) applications.
Location Information and Site Security
Open Context requires at least one geographic reference for each project. This geographic information should be the most pertinent location information useful for interpretation, usually the location of sites. Because Open Context makes all data freely and openly available, data contributors must consider the site security issues associated with revealing location data. If location data represents a threat to site security, it should be randomized and provided publicly only at reduced precision. Users should be informed of this manipulation and provided with contact information for requesting precise location data. Please contact Open Context's editors if you are concerned about site location or other sensitivity issues.
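As a rough illustration of what "randomized and reduced precision" can mean in practice, the sketch below rounds coordinates and adds a small random offset. The function name, offset size, and precision are illustrative assumptions; the appropriate values depend on the specific security concern and should be discussed with the editors.

```python
import random

def obscure_location(lat, lon, decimals=1, jitter=0.05, seed=None):
    """Reduce coordinate precision and add a small random offset.

    A sketch of the kind of manipulation described above, not a
    prescribed method; decimals=1 yields roughly 11 km precision
    in latitude.
    """
    rng = random.Random(seed)
    lat = round(lat + rng.uniform(-jitter, jitter), decimals)
    lon = round(lon + rng.uniform(-jitter, jitter), decimals)
    return lat, lon

# Hypothetical site coordinates:
lat, lon = obscure_location(37.23456, -7.89012, seed=42)
print(lat, lon)
```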
Copyright and Licensing
Open Context publishes open access, editorially controlled datasets to support future research and instructional opportunities. Thus, data contributors must make their content legally usable by others. To ensure legal reuse, we require that all content be released to the public domain, or that contributors use Creative Commons (creativecommons.org) copyright licenses on their content. We strongly recommend contributors select the Creative Commons Attribution license. The Attribution license is easy to understand and helps make contributed data widely useful. While we allow licenses that restrict commercial uses, we recommend against such restrictions. Please be aware that such restrictions are inherently ambiguous and would inhibit important uses, such as inclusion of content in textbooks or even journals distributed through sales. For more background on managing copyright and intellectual property issues, especially with regard to stakeholder communities, please click here.
In the US, copyright applies to expressive works, not compilations of factual information. Therefore, Creative Commons copyright licenses are not appropriate for some datasets, especially those with limited "expressive" content. Datasets that are less expressive and have less authorial voice tend toward a more scientific and factual nature (i.e. those that mainly include physical measurements and adhere to widely used conventions in nomenclature and recording). These datasets should use the Creative Commons-Zero (public domain) dedication.
We encourage contributors to choose a single license to apply to the entire dataset; however, we can also assign different license choices to individual items.
Please note that copyright and licensing issues are largely independent of scholarly citation and attribution. Professional standards dictate that all users properly cite data contributors even for public domain content, especially for scholarly uses. This professional norm works independently of the copyright status of content.
To support open access publication and archiving, Open Context has developed a pricing structure based on a contributor-pays model. Publication fees vary between $250 and $6000 depending on the complexity and size of the contributed database and related content. For example, a single spreadsheet of faunal data with no related images would cost on the low end of this spectrum. In contrast, a complex project with several databases, specialist analyses, and thousands of media files would be on the high end of this scale. Open Context developers can provide additional fee-based services for implementations based on Open Context's Web services (API) or other customizations. To assist in budgeting, interested contributors should contact the editorial team (see below) to establish a fee for their specific project.
Grant Seekers (NSF, NEH and Other Granting Programs):
Open Context's Estimation Form helps you budget appropriately for data sharing and generates text you can use for the Data Management Plan section of an NSF or NEH proposal (other granting agencies may wish to see a similar plan). Once you submit the form, you will receive an email with a budget estimate and language to add to your Data Management Plan that addresses access, interoperability, and archiving issues. Click here to use this form.