The OpenCitations Data Model (OCDM) is the metadata model used for the data stored in all OpenCitations datasets. It is briefly summarised in Figure 1 and described in:
Marilena Daquino, Arcangelo Massari, Silvio Peroni, David Shotton (2023). The OpenCitations Data Model. Figshare. https://doi.org/10.6084/m9.figshare.3443876
Marilena Daquino, Silvio Peroni, David Shotton, Giovanni Colavizza, Behnam Ghavimi, Anne Lauscher, Philipp Mayr, Matteo Romanello, Philipp Zumstein (2020). The OpenCitations Data Model. In Proceedings of the 19th International Semantic Web Conference (ISWC 2020). https://doi.org/10.1007/978-3-030-62466-8_28
The OCDM is used to model all the bibliographic and citation entities (i.e. the yellow rectangles in Figure 1, which define the classes of objects the data model can describe), their attributes (i.e. the green arrows) and their relations to other entities (i.e. the blue arrows). All these aspects are exposed in every OpenCitations dataset in RDF, the 'language' of the Semantic Web, in particular by employing the SPAR (Semantic Publishing and Referencing) Ontologies. This allows bibliographic and citation data to be published as Linked Open Data (LOD), making the data machine-readable and interoperable on the Web. Third parties may also employ the OCDM, either for their own use or to structure their data for submission to and publication by OpenCitations.
The OCDM allows one to record information about:
published bibliographic resources (class fabio:Expression in Figure 1) that either cite or are cited by other published bibliographic resources, or that contain citing/cited entities (e.g. a journal containing an article or a book containing a chapter);
possible resource embodiments (class fabio:Manifestation in Figure 1), defining the particular physical or digital format in which a bibliographic resource was made available;
bibliographic references (class biro:BibliographicReference in Figure 1), usually occurring in the reference list of a citing entity and usually denoted by one or more in-text reference pointers within the citing bibliographic resource, each of which references another bibliographic resource;
responsible agents (class foaf:Agent in Figure 1), such as people or organizations, having a certain role with respect to a bibliographic resource (e.g. the author of a paper or book, or the publisher of a journal);
the roles (class pro:RoleInTime in Figure 1) held by an agent with respect to bibliographic resources (e.g. a person being the author of an article and the editor of a book);
the citations (class cito:Citation in Figure 1) between two bibliographic resources;
the external identifiers (class datacite:Identifier in Figure 1), such as DOIs, ORCIDs, PubMed IDs and Open Citation Identifiers (OCIs), associated with the bibliographic entities.
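To make the list above more concrete, the following Python sketch (standard library only) writes two bibliographic resources identified by DOIs, and the citation between them, as N-Triples. It is an illustration under stated assumptions, not OpenCitations' own code. The entity IRIs under https://example.org/, the DOIs, and the linking properties (datacite:hasIdentifier, datacite:usesIdentifierScheme, the literal-reification property hasLiteralValue, cito:hasCitingEntity and cito:hasCitedEntity) are assumptions chosen for the example. Check them against the OCDM document before relying on them.

```python
"""Minimal sketch: an OCDM-style citation between two DOI-identified resources."""

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FABIO = "http://purl.org/spar/fabio/"
CITO = "http://purl.org/spar/cito/"
DATACITE = "http://purl.org/spar/datacite/"
# Assumed namespace of the literal-reification ontology used for identifier values.
LITERAL = "http://www.essepuntato.it/2010/06/literalreification/"
# Hypothetical base IRI; real OpenCitations entities use their own IRIs.
BASE = "https://example.org/"

Triple = tuple[str, str, str]


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def describe_citation(citing_doi: str, cited_doi: str) -> list[Triple]:
    """Return triples for two fabio:Expression resources, their DOIs and one cito:Citation."""
    triples: list[Triple] = []
    resources: list[str] = []  # resources[0] is the citing one, resources[1] the cited one
    for n, doi in enumerate((citing_doi, cited_doi), start=1):
        br, ident = f"{BASE}br/{n}", f"{BASE}id/{n}"
        resources.append(br)
        triples += [
            (iri(br), iri(RDF_TYPE), iri(FABIO + "Expression")),
            (iri(br), iri(DATACITE + "hasIdentifier"), iri(ident)),
            (iri(ident), iri(RDF_TYPE), iri(DATACITE + "Identifier")),
            (iri(ident), iri(DATACITE + "usesIdentifierScheme"), iri(DATACITE + "doi")),
            (iri(ident), iri(LITERAL + "hasLiteralValue"), literal(doi)),
        ]
    citation = f"{BASE}ci/1"
    triples += [
        (iri(citation), iri(RDF_TYPE), iri(CITO + "Citation")),
        (iri(citation), iri(CITO + "hasCitingEntity"), iri(resources[0])),
        (iri(citation), iri(CITO + "hasCitedEntity"), iri(resources[1])),
    ]
    return triples


def to_ntriples(triples: list[Triple]) -> str:
    """Serialise the triples in N-Triples syntax, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    # Hypothetical DOIs, used only for illustration.
    print(to_ntriples(describe_citation("10.1234/example.citing", "10.1234/example.cited")))
```

Plain strings keep the sketch dependency-free. A real pipeline would more likely build the graph with an RDF library and let it handle IRI and literal escaping.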
In November 2019, a new release of the OCDM was published, revised and extended with additional kinds of entities. These enable the description of in-text reference pointers (class c4o:InTextReferencePointer in Figure 1) that denote bibliographic references, i.e. the textual devices (e.g. "[1]" or "Peroni & Shotton 2019") embedded in the text of a document within a particular sentence, paragraph or section (which are kinds of discourse elements, defined by the class deo:DiscourseElement in Figure 1). They also enable the description of the citations these pointers instantiate (linked via oa:Annotation in Figure 1), together with the functions of those citations, i.e. the reasons why a bibliographic resource is cited.
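The chain from an in-text reference pointer to the citation it instantiates can be followed with a few lookups, as the Python sketch below shows. It uses prefixed names instead of full IRIs and hypothetical entity names such as ":irp-1" and ":an-1". The property wiring is an assumption to be checked against the OCDM specification. The sketch assumes that the annotation points to the pointer via oa:hasTarget and to the citation via oa:hasBody. It also assumes that the pointer links to its reference through c4o:denotes and to its sentence through frbr:partOf, and that the citation function is stated with cito:hasCitationCharacterisation.

```python
"""Minimal sketch: following an annotation from an in-text pointer to its citation."""

# Each triple is (subject, predicate, object); prefixed names stand in for full IRIs.
# All entity names and the exact properties used are hypothetical/assumed.
TRIPLES = [
    (":sentence-1", "rdf:type", "deo:DiscourseElement"),
    (":irp-1", "rdf:type", "c4o:InTextReferencePointer"),
    (":irp-1", "c4o:hasContent", '"[1]"'),
    (":irp-1", "frbr:partOf", ":sentence-1"),
    (":irp-1", "c4o:denotes", ":be-1"),
    (":be-1", "rdf:type", "biro:BibliographicReference"),
    (":be-1", "biro:references", ":br-cited"),
    (":ci-1", "rdf:type", "cito:Citation"),
    (":ci-1", "cito:hasCitingEntity", ":br-citing"),
    (":ci-1", "cito:hasCitedEntity", ":br-cited"),
    (":ci-1", "cito:hasCitationCharacterisation", "cito:usesMethodIn"),
    (":an-1", "rdf:type", "oa:Annotation"),
    (":an-1", "oa:hasTarget", ":irp-1"),
    (":an-1", "oa:hasBody", ":ci-1"),
]


def objects(subject: str, predicate: str) -> list[str]:
    """Return every object of the given subject/predicate pair."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def one(subject: str, predicate: str) -> str:
    """Return the single object of subject/predicate, or fail loudly."""
    values = objects(subject, predicate)
    if len(values) != 1:
        raise ValueError(f"expected one value for {subject} {predicate}, got {values}")
    return values[0]


def explain_annotation(annotation: str) -> dict[str, str]:
    """Collect where a citation was made in the text and why."""
    pointer = one(annotation, "oa:hasTarget")
    citation = one(annotation, "oa:hasBody")
    return {
        "pointer_text": one(pointer, "c4o:hasContent"),
        "discourse_element": one(pointer, "frbr:partOf"),
        "reference": one(pointer, "c4o:denotes"),
        "cited": one(citation, "cito:hasCitedEntity"),
        "function": one(citation, "cito:hasCitationCharacterisation"),
    }


if __name__ == "__main__":
    print(explain_annotation(":an-1"))
```

In the example data, the reference denoted by the pointer and the citation carried by the annotation resolve to the same cited resource. That is the consistency one would expect between the two paths.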
All the entities mentioned above that are included in datasets released by OpenCitations are accompanied by provenance information. This information keeps track of the curatorial activities related to each entity, the curatorial agents involved, and the sources used to obtain the data. In addition, OpenCitations tracks how the data describing its entities may have changed over time, so that one can reconstruct the description of an entity as it stood at a specified time (a snapshot). This has been implemented by extending the Provenance Ontology with a SPARQL-based construct inspired by the change-tracking mechanisms of word processors such as Microsoft Word and OpenOffice Writer.
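The idea of reconstructing an earlier snapshot can be sketched in Python. The source describes a SPARQL-based construct. In this simplified sketch, each change is instead modeled as plain sets of deleted and inserted triples attached to a hypothetical Snapshot record. An earlier state is obtained by starting from the current description and undoing every change generated after the requested time, newest first. The entity, titles, dates and the dcterms:title property are illustrative assumptions, not OpenCitations data. The sketch also assumes that the first snapshot records the entity's creation as a set of insertions.

```python
"""Minimal sketch: reconstructing an entity's description at a given time."""
from dataclasses import dataclass
from datetime import datetime

Triple = tuple[str, str, str]


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical change record: what one curatorial activity removed and added."""
    generated_at: datetime
    deleted: frozenset[Triple] = frozenset()
    inserted: frozenset[Triple] = frozenset()


def state_at(current: set[Triple], snapshots: list[Snapshot], when: datetime) -> set[Triple]:
    """Undo, newest first, every change generated after `when`."""
    state = set(current)  # copy, so the current description is left untouched
    for snap in sorted(snapshots, key=lambda s: s.generated_at, reverse=True):
        if snap.generated_at <= when:
            break  # older changes are already reflected in the requested state
        state -= snap.inserted  # drop what this change added
        state |= snap.deleted   # restore what this change removed
    return state


# Hypothetical history: an entity created with a draft title that was later corrected.
BR = ":br-1"
TYPE = (BR, "rdf:type", "fabio:Expression")
DRAFT = (BR, "dcterms:title", '"Draft title"')
FINAL = (BR, "dcterms:title", '"Final title"')
SNAPSHOTS = [
    Snapshot(datetime(2019, 1, 1), inserted=frozenset({TYPE, DRAFT})),
    Snapshot(datetime(2020, 6, 1), deleted=frozenset({DRAFT}), inserted=frozenset({FINAL})),
]
CURRENT = {TYPE, FINAL}

if __name__ == "__main__":
    print(state_at(CURRENT, SNAPSHOTS, datetime(2019, 12, 31)))
```

Undoing changes backwards from the current state means the latest description is always available directly, and older snapshots are only computed on demand.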
For convenience, all the terms of the OCDM described in Figure 1, including those used for keeping track of provenance information, are collected within an ontology called the OpenCitations Ontology (OCO). This is not yet another bibliographic ontology. It is simply a place where complementary terms from several existing ontologies are grouped together, so that they can be used to provide the descriptive metadata defined by the OCDM.
All the materials related to the OCDM are available in the OpenCitations GitHub repository.