Open Context's Technology and Archiving

Open Context integrates entirely open-source technologies to publish archaeological data via the Web. These technologies include:

Django (Python): Web application framework
PostgreSQL: primary database
Apache Solr: fast faceted search and querying
Redis: memory cache for performance
Nginx: Web server
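
As a rough illustration of how these components typically connect in a Django project, the sketch below shows an excerpt of a settings module. It is not Open Context's actual configuration: every name, host, port, and credential is a placeholder, and EXAMPLE_SOLR_URL is a hypothetical project-defined setting, since Django has no built-in Solr setting.

```python
# Hypothetical excerpt of a Django settings module; every value is a placeholder.

DATABASES = {
    "default": {
        # Django's built-in PostgreSQL database backend
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "example_db",
        "USER": "example_user",
        "PASSWORD": "change-me",
        "HOST": "127.0.0.1",
        "PORT": "5432",
    }
}

CACHES = {
    "default": {
        # Built-in Redis cache backend, available in Django 4.0 and later
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379",
    }
}

# Hypothetical, project-defined setting pointing at a Solr core (assumed name).
EXAMPLE_SOLR_URL = "http://127.0.0.1:8983/solr/example_core"
```

In such a setup, Nginx would sit in front of the Django application as the Web server and is configured outside of Django.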

This page introduces Open Context's technology stack and explains how you can use it and contribute to it.

Data Organization

Internally, Open Context uses a highly abstracted, generalized global schema for representing data. This approach takes its inspiration from the data structure (originally called "ArchaeoML") developed by the OCHRE project. However, to reduce development and maintenance costs, we opted to implement simplified versions of these generalized models in a common and easily deployed relational database system (in our case, PostgreSQL).

Over the years, Open Context has evolved from ArchaeoML and moved toward Linked Open Data approaches to data organization. In essence, Open Context's current internal data model largely resembles the graph structure commonly used in RDF triple-stores. However, we do not actually use a triple-store. Our experience managing data from many sources over the past 10 years has shown that we often need additional attributes describing data provenance, context, etc. Our day-to-day reliance on these additional attributes would make RDF-only triple-stores awkward and cumbersome for our typical information management needs. Thus, Open Context mainly uses RDF and Linked Open Data to relate the data it publishes to data curated by external sources.
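
To make the idea of a triple-like statement carrying extra attributes more concrete, here is a minimal sketch in Python. It is not Open Context's actual schema: the Assertion class, its field names (source_id, project, sort_order), and the example identifiers are all hypothetical, chosen only to show the kind of provenance and context information a bare subject-predicate-object triple has no slot for.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Assertion:
    """One subject-predicate-object statement plus extra descriptive attributes."""

    subject: str
    predicate: str
    obj: str
    # Attributes a bare RDF triple has no slot for (all names are hypothetical).
    source_id: Optional[str] = None  # which import or dataset the statement came from
    project: Optional[str] = None    # the publishing project the statement belongs to
    sort_order: int = 0              # display order among sibling statements


def as_triple(assertion: Assertion) -> Tuple[str, str, str]:
    """Reduce an assertion to a plain triple, dropping the extra attributes."""
    return (assertion.subject, assertion.predicate, assertion.obj)


# Invented example data, for illustration only.
example = Assertion(
    subject="example:bone-0001",
    predicate="example:has-taxon",
    obj="example:ovis-aries",
    source_id="example-import-42",
    project="example:project-a",
)

print(as_triple(example))
```

Storing rows like this in a relational table keeps the provenance attributes alongside each statement, while a function such as as_triple shows how the same data could still be exported as plain RDF-style triples.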

Source Code and Version Control

Open Context is a Python 3 application built on the Django framework. Where feasible, we try to keep to "plain vanilla" coding patterns and use of Django components. Because Open Context emphasizes Web interoperability, it uses its own APIs to generate views of individual item records and search results. As described here, most of Open Context's APIs provide JSON-LD formatted data.
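
As a rough illustration of consuming JSON-LD output of the kind mentioned above, the following sketch parses a small, invented record with Python's standard json module and expands prefixed keys against the record's @context. The record, its keys, and the expand_key helper are assumptions for illustration, not actual Open Context API output, and a complete JSON-LD processor handles many more cases than this simple prefix expansion.

```python
import json

# Hypothetical JSON-LD record, invented for illustration only.
SAMPLE = """
{
  "@context": {"dc-terms": "http://purl.org/dc/terms/"},
  "@id": "https://example.org/subjects/0001",
  "label": "Example record",
  "dc-terms:title": "Example record title"
}
"""


def expand_key(key, context):
    """Expand a prefixed key such as 'dc-terms:title' using a simple prefix map."""
    prefix, sep, local = key.partition(":")
    if sep and isinstance(context.get(prefix), str):
        return context[prefix] + local
    return key  # no known prefix, so leave the key unchanged


record = json.loads(SAMPLE)
context = record["@context"]

# Keep only the data keys (skip JSON-LD keywords such as @context and @id).
expanded = {
    expand_key(key, context): value
    for key, value in record.items()
    if not key.startswith("@")
}

print(record["@id"])
print(expanded)
```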

The source code for Open Context carries a GNU General Public License (Version 3). We use GitHub for software version control, issue tracking, documentation, and collaboration. Relevant code repositories are:

Open Context Python Application: the primary software code repository for Open Context. This has source code, deployment instructions, and additional documentation.
Open Context Ontologies / Controlled Vocabularies: Open Context uses a variety of ontologies and controlled vocabularies described in OWL and SKOS. Although their documentation is still incomplete, versions of these vocabularies are tracked in this repository.
Open Context API Client Demos: Open Context provides a variety of powerful Web APIs that others can use for independent data analysis, visualization, or other projects. This repository provides public domain (no copyright) sample JavaScript code and demos illustrating use of these APIs.
Open Context Data Repositories: Open Context has used GitHub for version control of datasets. Currently these repositories provide access to older legacy versions of Open Context data in XML format. We will be updating these shortly to add current data in JSON-LD format once we've finished GitHub API integration.

Digital Archiving

To ensure longevity and long-term citability, Open Context archives data with the University of California's California Digital Library (CDL) Merritt repository and Zenodo, and uses the Internet Archive for image hosting. To promote data citation, Open Context uses the EZID service to issue persistent and globally unique identifiers to individual data records as well as larger aggregations of data. As is common practice among digital repositories, Open Context issues DOIs (Digital Object Identifiers) to facilitate the identification and citation of aggregate datasets. However, unlike most repositories, Open Context's editorial and publishing workflow also creates persistent identifiers at much more granular and specific levels. To facilitate the identification and citation of specific individual records of data, Open Context uses EZID to mint ARK (Archival Resource Key) identifiers.

Merritt Repository: location of Open Context content in the Merritt repository.
Zenodo.org: location of Open Context content in the Zenodo repository.
Internet Archive: location of Open Context content in the Internet Archive. While the Internet Archive is not a formal digital repository, it does provide additional safeguards for Open Context published digital media. In addition, we gratefully thank the Internet Archive for providing IIIF hosting of the image media we publish.
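
The DOI and ARK identifiers described above are typically cited as resolvable Web links. The sketch below shows one way to turn an identifier string into such a link, assuming the common public resolvers (doi.org for DOIs and n2t.net for ARKs). The resolver_url function is a hypothetical helper, not part of Open Context's code, and the identifiers in the usage lines are made-up placeholders rather than real records.

```python
def resolver_url(identifier: str) -> str:
    """Return a resolvable HTTPS URL for a 'doi:' or 'ark:' identifier string."""
    ident = identifier.strip()
    lowered = ident.lower()
    if lowered.startswith("doi:"):
        # DOIs resolve through the doi.org proxy; drop the 'doi:' scheme label.
        return "https://doi.org/" + ident[len("doi:"):]
    if lowered.startswith("ark:"):
        # ARKs keep their 'ark:' label when passed to the N2T resolver.
        return "https://n2t.net/" + ident
    raise ValueError(f"Unrecognized identifier scheme: {identifier!r}")


# Placeholder identifiers, for illustration only.
print(resolver_url("doi:10.1234/example"))
print(resolver_url("ark:/99999/fk4example"))
```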

Icon Credits

Information icon by Jürgen Bauer via the NounProject.com

Digital icon by Ajándi Endre via the NounProject.com

Database icon by Creative Stall via the NounProject.com

Pull-request icon by Nick Bluth via the NounProject.com