Data Model

The OpenCitations Data Model (OCDM) is the metadata model used for the data stored in all OpenCitations datasets. It is summarised in Figure 1 and described in:

Marilena Daquino, Silvio Peroni, David Shotton (2020). The OpenCitations Data Model. Figshare.

Marilena Daquino, Silvio Peroni, David Shotton, Giovanni Colavizza, Behnam Ghavimi, Anne Lauscher, Philipp Mayr, Matteo Romanello, Philipp Zumstein (2020). The OpenCitations Data Model. In Proceedings of the 19th International Semantic Web Conference (ISWC 2020).

Figure 1. The Graffoo diagram of the main ontological entities described in the OCDM.

The OCDM is used to model all the bibliographic and citation entities (i.e. the yellow rectangles in Figure 1, defining the classes of objects the data model allows one to describe), their attributes (i.e. the green arrows) and their relations to other entities (i.e. the blue arrows). All these aspects are exposed in every OpenCitations dataset in RDF, the 'language' of the Semantic Web, in particular by employing OpenCitations' SPAR (Semantic Publishing and Referencing) Ontologies. This permits the publication of bibliographic and citation data as Linked Open Data (LOD), making the data machine-readable and interoperable on the Web. The OCDM may also be employed by third parties, either for their own use or to structure their data for submission to and publication by OpenCitations.
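To make the RDF representation concrete, the following is a minimal sketch of how a single citation between two bibliographic resources might be written as N-Triples using terms from the SPAR ontologies CiTO and FaBiO. The base IRI, the identifiers and the helper functions are assumptions made for illustration only; they do not reproduce the IRIs or the tooling actually used by OpenCitations.

```python
# Namespaces of the vocabularies used (CiTO and FaBiO are SPAR ontologies).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CITO = "http://purl.org/spar/cito/"
FABIO = "http://purl.org/spar/fabio/"

# Hypothetical base IRI for illustration; real OpenCitations IRIs differ.
BASE = "https://example.org/"


def triple(s, p, o):
    """Format one N-Triples statement whose subject, predicate and object are IRIs."""
    return f"<{s}> <{p}> <{o}> ."


def describe_citation(citation_id, citing_id, cited_id):
    """Return N-Triples lines describing a citation between two resources."""
    citation = BASE + "ci/" + citation_id
    citing = BASE + "br/" + citing_id
    cited = BASE + "br/" + cited_id
    return [
        triple(citing, RDF_TYPE, FABIO + "Expression"),
        triple(cited, RDF_TYPE, FABIO + "Expression"),
        triple(citation, RDF_TYPE, CITO + "Citation"),
        triple(citation, CITO + "hasCitingEntity", citing),
        triple(citation, CITO + "hasCitedEntity", cited),
    ]


if __name__ == "__main__":
    # Print the five statements for a hypothetical citation from resource 1 to resource 2.
    print("\n".join(describe_citation("1-2", "1", "2")))
```

Here the citation is itself an entity with its own IRI, which is what allows further attributes and relations to be attached to it.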

The OCDM allows one to record information about:

In November 2019, a new release of the OCDM was published, revised and extended with additional kinds of entities. These enable the description of in-text reference pointers (class c4o:InTextReferencePointer in Figure 1) denoting bibliographic references, i.e. the textual devices (e.g. "[1]" or "Peroni & Shotton 2019") that are embedded in the text of a document within the context of a particular sentence, paragraph or section (which are kinds of discourse elements, defined by the class deo:DiscourseElement in Figure 1). They also enable the description of the citations that these pointers instantiate (linked via annotations, defined by the class oa:Annotation in Figure 1), accompanied by a description of their functions, i.e. the reasons why a bibliographic resource is cited.
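The relationships among these entities can be sketched with plain Python data classes. This is only a conceptual illustration: the class names echo the OCDM classes mentioned above, but the field names and the helper function are hypothetical and are not OCDM property names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    citing: str                      # identifier of the citing resource
    cited: str                       # identifier of the cited resource
    function: Optional[str] = None   # why the cited resource is cited


@dataclass
class DiscourseElement:
    kind: str   # e.g. "sentence", "paragraph" or "section"
    text: str


@dataclass
class InTextReferencePointer:
    content: str                 # the textual device, e.g. "[1]"
    context: DiscourseElement    # the discourse element containing the pointer


@dataclass
class Annotation:
    pointer: InTextReferencePointer   # the pointer being annotated
    citation: Citation                # the citation that the pointer instantiates


def citations_in(kind, annotations):
    """Return the citations whose pointers occur in the given kind of discourse element."""
    return [a.citation for a in annotations if a.pointer.context.kind == kind]


if __name__ == "__main__":
    sentence = DiscourseElement("sentence", "This was shown earlier [1].")
    pointer = InTextReferencePointer("[1]", sentence)
    citation = Citation("paper-a", "paper-b", function="cites as evidence")
    print(citations_in("sentence", [Annotation(pointer, citation)]))
```

Keeping the annotation as a separate object mirrors the idea that one citation may be instantiated by several pointers in different parts of the citing text.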

All the entities mentioned above that are included in the datasets released by OpenCitations are accompanied by provenance information, so as to keep track of the curatorial activities related to each entity, the curatorial agents involved, and the sources used to obtain the data. In addition, OpenCitations tracks how the data related to its entities have changed over time, so that one can reconstruct the particular description status (or snapshot) of an entity at a specified time. This has been implemented by extending the Provenance Ontology with a SPARQL-based construct inspired by existing work on change-tracking mechanisms in documents created with word processors such as Microsoft Word and OpenOffice Writer.
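The idea of reconstructing an entity's description at a given time can be illustrated with a simplified sketch. It assumes each snapshot records a set of added and a set of removed statements, and it replays them in chronological order. This is a deliberate simplification: the names below are hypothetical, and the actual OCDM mechanism is SPARQL-based rather than built on Python sets.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Snapshot:
    generated_at: datetime                      # when this snapshot was created
    added: set = field(default_factory=set)     # statements introduced by the change
    removed: set = field(default_factory=set)   # statements deleted by the change


def state_at(snapshots, when):
    """Rebuild the set of statements that described an entity at time `when`."""
    state = set()
    for snap in sorted(snapshots, key=lambda s: s.generated_at):
        if snap.generated_at > when:
            break  # later changes had not happened yet
        state -= snap.removed
        state |= snap.added
    return state


if __name__ == "__main__":
    history = [
        Snapshot(datetime(2019, 1, 1), added={("title", "The OCDM")}),
        Snapshot(
            datetime(2020, 1, 1),
            added={("title", "The OpenCitations Data Model")},
            removed={("title", "The OCDM")},
        ),
    ]
    print(state_at(history, datetime(2019, 6, 1)))
```

Storing only the differences between snapshots keeps the provenance data compact while still allowing any past state to be recovered.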

For convenience, all the terms of the OCDM shown in Figure 1, including those used for keeping track of provenance information, are collected within an ontology called the OpenCitations Ontology (OCO). This is not yet another bibliographic ontology, but rather a single place where existing complementary ontological entities from several other ontologies are grouped together to provide the descriptive metadata defined by the OCDM.

All the materials related to the OCDM are available in the OpenCitations GitHub repository.