
The CIDOC Conceptual Reference Model (often abbreviated to simply the CRM, or to the ECRM when referring to the Erlangen CRM, the OWL encoding produced at the University of Erlangen that serves as the de facto standard reference version of the model) is by far the most popular ontology used to model cultural heritage data. The ontology grew out of the work of the CIDOC Documentation Standards Group. This document describes the objectives of the model and how it is structured, and offers some comments on how it might be implemented.

The CRM’s objectives

The CRM’s stated goal is to provide ‘a common and extensible semantic framework that any cultural heritage information can be mapped to’. Essentially, it aims to facilitate the exchange and sharing of information.

The important aspects to take from this official statement are the ideas of commonality, extensibility and semantics. Commonality emphasises the objective of gluing together data that comes out of the museums, libraries and archives sector. Extensibility is central to the way the model is structured. While public ontologies are important for the reasons discussed in the introduction to ontologies document, most data modellers also accept that individual data providers will always demand a degree of flexibility in any model’s final implementation. The CRM allows for this flexibility through extensions, and in the next section we will discuss one of them in greater detail: the extension created by English Heritage to model archaeological data. Finally, and unsurprisingly, the CRM is devoted to promoting semantics within the cultural heritage sector. Semantics provide the framework that makes data understandable and, therefore, useful.

The design principles of the CRM

The CRM is built upon an object-orientated (OO) design. OO approaches are common in computer science but perhaps less so in cultural heritage data modelling. The OO approach is preferred to the flatter Entity-Relationship approach because it accommodates extension more naturally: new classes can specialise existing ones without the core model having to change. It is also far easier to make sense of, even when it is built to encode complex meaning. The CRM is agnostic when it comes to data serialisation and hosting. In theory it can be serialised in any form that can represent its structure, but in practice Linked Data approaches fit well with its design and objectives.

The CRM’s classes are fairly generic, and the intention is that they be extended to fit the application in mind. Its class hierarchy is what is known as poly-hierarchical. As in most OO models, classes can be sub-classed, a very common concept for anyone who has worked with OO models before, but in a poly-hierarchy a class may also have more than one direct superclass (multiple inheritance). A subclass inherits the properties of all of its superclasses, so if Class B is a subclass of Class A and Class A has the property ‘name’, then so does Class B.
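As a rough illustration of poly-hierarchy, the Python sketch below encodes a small, simplified excerpt of the CRM class hierarchy in which E22 Man-Made Object has two direct superclasses, and works out which properties the class inherits. The excerpt is based on the CRM 5/6-era declarations that ECRM follows, with the upper levels of the hierarchy omitted, so check it against the version you use; the dictionaries and helper functions are hypothetical and not part of any CRM tooling.

```python
# Simplified excerpt of the CRM class hierarchy: class -> direct superclasses.
# Upper levels are omitted (E18 and E71 really have superclasses of their own).
SUPERCLASSES: dict[str, list[str]] = {
    "E22_Man-Made_Object": ["E19_Physical_Object", "E24_Physical_Man-Made_Thing"],
    "E24_Physical_Man-Made_Thing": ["E18_Physical_Thing", "E71_Man-Made_Thing"],
    "E19_Physical_Object": ["E18_Physical_Thing"],
    "E18_Physical_Thing": [],
    "E71_Man-Made_Thing": [],
}

# Domains of two CRM properties: P45 applies to physical things,
# P102 to man-made things.
PROPERTY_DOMAINS: dict[str, str] = {
    "P45_consists_of": "E18_Physical_Thing",
    "P102_has_title": "E71_Man-Made_Thing",
}


def ancestors(cls: str) -> set[str]:
    """Return cls and every class it inherits from, following all parents."""
    seen: set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUPERCLASSES.get(current, []))
    return seen


def applicable_properties(cls: str) -> set[str]:
    """A property applies to cls if its domain is cls or one of its ancestors."""
    lineage = ancestors(cls)
    return {prop for prop, domain in PROPERTY_DOMAINS.items() if domain in lineage}


print(sorted(applicable_properties("E22_Man-Made_Object")))
# ['P102_has_title', 'P45_consists_of']
```

Because `ancestors` follows every parent rather than a single chain, E22 picks up P45 through E18 Physical Thing and P102 through E71 Man-Made Thing, which a strict single-parent tree could not express.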

The CRM tends to be serialised as RDFS or OWL. Both of these formats allow the ontological structure to be written down explicitly, i.e. how classes and properties relate to other classes and properties within the model. This explicit serialisation of the CRM structure can be fed into reasoning engines and used to draw logical inferences. For example, if John is the father of Mike and we know from our ontology that ‘is father of’ has the inverse property ‘has father’, then a reasoning engine can use this information to infer that Mike has father John.
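The inverse-property inference in the example above can be sketched in a few lines of plain Python. This is not a reasoning engine: triples are simple tuples, the predicate names `is_father_of` and `has_father` are hypothetical, and the `INVERSES` table stands in for what an `owl:inverseOf` declaration would state in a real ontology.

```python
# Each triple is (subject, predicate, object).
Triple = tuple[str, str, str]

# Hypothetical ontology knowledge: these two predicates are inverses.
INVERSES: dict[str, str] = {"is_father_of": "has_father"}


def infer_inverses(triples: set[Triple], inverses: dict[str, str]) -> set[Triple]:
    """Return the input triples plus the inverse of every triple whose
    predicate has a declared inverse."""
    # Make the table symmetric so the rule fires in both directions.
    table = {**inverses, **{v: k for k, v in inverses.items()}}
    inferred = set(triples)
    for s, p, o in triples:
        if p in table:
            inferred.add((o, table[p], s))
    return inferred


facts = {("John", "is_father_of", "Mike")}
print(infer_inverses(facts, INVERSES) - facts)
# {('Mike', 'has_father', 'John')}
```

A single pass is enough here because the inverse of an inverse is the original predicate; rules that chain (such as transitivity) would need to be applied repeatedly until nothing new is added.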

Structurally, the CRM consists of what it calls entities, together with properties that relate them to one another. Entities can be understood as analogous to ontological classes. They fall into two broad groups: persistent items and temporal entities. Persistent items usually map to real-world objects that survive events; a chair, for example, continues to exist after it has been painted. Temporal entities are phenomena that begin and end in time, such as events. The painting of the chair could be mapped to a temporal entity.
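A minimal sketch of the persistent/temporal split, using the chair example: the chair is typed as E22 Man-Made Object and its painting as E11 Modification (a choice of class made here for illustration), and a helper walks up a collapsed excerpt of the CRM 5/6-era hierarchy to see which root each one reaches. The instance identifiers, dictionaries and function are hypothetical, and several intermediate classes are omitted.

```python
# Collapsed excerpt of the CRM hierarchy: class -> direct superclasses.
# The intermediate classes between E18 and E77 are omitted for brevity.
SUPERCLASSES: dict[str, list[str]] = {
    "E22_Man-Made_Object": ["E19_Physical_Object"],
    "E19_Physical_Object": ["E18_Physical_Thing"],
    "E18_Physical_Thing": ["E77_Persistent_Item"],
    "E11_Modification": ["E7_Activity"],
    "E7_Activity": ["E5_Event"],
    "E5_Event": ["E4_Period"],
    "E4_Period": ["E2_Temporal_Entity"],
}

# Hypothetical instances: a chair and the event of painting it.
INSTANCE_OF: dict[str, str] = {
    "chair_1": "E22_Man-Made_Object",
    "painting_event_1": "E11_Modification",
}


def kind(instance: str) -> str:
    """Walk up the hierarchy until a persistent or temporal root is reached."""
    stack = [INSTANCE_OF[instance]]
    seen: set[str] = set()
    while stack:
        cls = stack.pop()
        if cls == "E77_Persistent_Item":
            return "persistent"
        if cls == "E2_Temporal_Entity":
            return "temporal"
        if cls not in seen:
            seen.add(cls)
            stack.extend(SUPERCLASSES.get(cls, []))
    return "unknown"


print(kind("chair_1"), kind("painting_event_1"))  # persistent temporal
```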

Time is going to be an important aspect of any ontology that deals with cultural heritage data, whether that means short spans, such as the time it takes to move a painting from one collection to another, or time on the scale of the longue durée, when we are talking about centuries or millennia. Persistent entities cannot be linked directly to time. Instead, they are linked to temporal entities (for example, the event that produced or modified them), and those temporal entities are linked to time, in the CRM through the property P4 has time-span.
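To make this indirection concrete, the sketch below stores a few hypothetical triples in a Python list and finds when the chair was painted by going from the chair to the event that modified it (P31 has modified), from the event to its time-span (P4 has time-span), and from the time-span to a date (P82 at some time within). The ECRM-style property names are real CRM properties, but the identifiers, the date string and the helper functions are invented for illustration.

```python
# Hypothetical triples. The chair carries no date itself; the event that
# modified it points to a time-span, and the time-span carries the date.
TRIPLES = [
    ("painting_event_1", "P31_has_modified", "chair_1"),
    ("painting_event_1", "P4_has_time-span", "timespan_1"),
    ("timespan_1", "P82_at_some_time_within", "c. 1850"),
]


def objects(subject: str, predicate: str) -> list[str]:
    """All o such that (subject, predicate, o) is stored."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate: str, obj: str) -> list[str]:
    """All s such that (s, predicate, obj) is stored."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def modification_dates(thing: str) -> list[str]:
    """Persistent item -> events that modified it -> time-spans -> dates."""
    dates: list[str] = []
    for event in subjects("P31_has_modified", thing):
        for span in objects(event, "P4_has_time-span"):
            dates.extend(objects(span, "P82_at_some_time_within"))
    return dates


print(modification_dates("chair_1"))  # ['c. 1850']
```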

Other foundational entities in the CRM are Actors, which represent people or groups; Places, which represent locations; and Physical and Conceptual Things. Appellations are a common entity within the framework: they allow other entities or events to be labelled. Finally, the Type entity allows other entities to be categorised, which is key to the operation of any cultural heritage datastore.
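A sketch of how Appellations and Types are used in practice, over a hypothetical list of triples: P1 is identified by links an entity to an appellation, and P2 has type assigns a type, here a slug like those in the `la_vocabs` vocabulary used in the query later in this document. Storing the appellation’s text with P3 has note is an assumption made for the example; other properties are also used for this.

```python
from collections import defaultdict

# Hypothetical triples using ECRM-style property names.
TRIPLES = [
    ("place_1", "P1_is_identified_by", "appellation_1"),
    ("appellation_1", "P3_has_note", "Trench A"),
    ("find_1", "P2_has_type", "ware-fine"),
    ("find_2", "P2_has_type", "ware-coarse"),
    ("find_3", "P2_has_type", "ware-fine"),
]


def labels(entity: str) -> list[str]:
    """Resolve an entity's appellations to their text values."""
    apps = [o for s, p, o in TRIPLES if s == entity and p == "P1_is_identified_by"]
    return [o for s, p, o in TRIPLES if s in apps and p == "P3_has_note"]


def group_by_type() -> dict[str, list[str]]:
    """Categorise entities by the type assigned with P2_has_type."""
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for s, p, o in TRIPLES:
        if p == "P2_has_type":
            groups[o].append(s)
    return dict(groups)


print(labels("place_1"))  # ['Trench A']
print(group_by_type())  # {'ware-fine': ['find_1', 'find_3'], 'ware-coarse': ['find_2']}
```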

On implementing the CRM

As mentioned already, the CRM does not have to be implemented using Linked Data approaches, but in practice it is preferable to do so and the majority of current applications of the model follow this approach. Linked Data tends to be implemented using the Resource Description Framework (RDF), which is built upon a simple unit, the triple: a subject, a predicate and an object, in which the predicate links the other two nodes of information. As such, the RDF triple is a natural fit for the CRM’s structure of entities related by properties.
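As a sketch of the triple itself, the Python below models a triple as a subject, predicate and object and writes it out in N-Triples, one of the text-based RDF serialisations (one triple per line, IRIs in angle brackets, literals in double quotes, each line ending in a full stop). The `IRI` marker class and the helper names are hypothetical, the example.org identifiers are placeholders, and only minimal literal escaping is handled.

```python
from typing import NamedTuple


class IRI(str):
    """A string that should be written as an IRI, not as a literal."""


class Triple(NamedTuple):
    subject: IRI
    predicate: IRI
    obj: str  # either an IRI or a plain string literal


def term(value: str) -> str:
    """Render one term in N-Triples syntax (minimal escaping only)."""
    if isinstance(value, IRI):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(t: Triple) -> str:
    """Subject, predicate and object separated by spaces, ended with ' .'."""
    return f"{term(t.subject)} {term(t.predicate)} {term(t.obj)} ."


ECRM = "http://erlangen-crm.org/current/"
example = Triple(
    IRI("http://example.org/chair/1"),
    IRI(ECRM + "P2_has_type"),
    IRI("http://example.org/type/chair"),
)
print(to_ntriples(example))
# <http://example.org/chair/1> <http://erlangen-crm.org/current/P2_has_type> <http://example.org/type/chair> .
```

In a real project a library such as a dedicated RDF toolkit would handle full escaping, typed literals and the other serialisations; the point here is only how little structure a triple needs.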

RDF triplestores also tend to be accompanied by SPARQL interfaces. SPARQL is a query language for interrogating RDF data. SPARQL queries can, and usually do, chain several search clauses together, each of which narrows the results further. This allows quite complex questions to be asked, as in the query below, in which the linkedarc.net triplestore is asked to return the names and geolocations of all of the archaeological contexts that contain fine ceramic wares.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX crmeh: <http://purl.org/crmeh#>
PREFIX ecrm: <http://erlangen-crm.org/current/>
PREFIX la_vocabs: <http://linkedarc.net/vocabs/>

SELECT ?name ?geo
WHERE {
    # Finds typed as fine ware
    ?find a crmeh:EHE0009_ContextFind .
    ?find ecrm:P2_has_type la_vocabs:ware-fine .
    # The contexts those finds fall within
    ?find ecrm:P89_falls_within ?context .
    ?context a crmeh:EHE0007_Context .
    # The context's name (rdfs:label is the standard RDFS naming property)
    ?context rdfs:label ?name .
    # Its geolocation: keep only values whose literal datatype is a CRM-EH context depiction
    ?context ecrm:P87_is_identified_by ?geo .
    FILTER (DATATYPE(?geo) = crmeh:EHE0022_ContextDepiction)
}
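To show what the chaining of clauses does, here is a naive Python sketch of SPARQL-style basic graph pattern matching over a handful of hypothetical triples shaped like the data the query above targets (prefixed names are kept as plain strings, `rdf:type` spells out SPARQL’s `a`, and `rdfs:label` is used for the name). Each pattern either extends or discards the variable bindings produced so far, which is how each additional clause narrows the result. A real SPARQL engine plans and indexes these joins; this is only the idea, not how linkedarc.net or any triplestore implements it.

```python
from typing import Optional

Triple = tuple[str, str, str]
Binding = dict[str, str]

# Hypothetical toy data: one fine-ware find and one coarse-ware find.
DATA: list[Triple] = [
    ("find1", "rdf:type", "crmeh:EHE0009_ContextFind"),
    ("find1", "ecrm:P2_has_type", "la_vocabs:ware-fine"),
    ("find1", "ecrm:P89_falls_within", "ctx1"),
    ("ctx1", "rdf:type", "crmeh:EHE0007_Context"),
    ("ctx1", "rdfs:label", "Context 101"),
    ("find2", "rdf:type", "crmeh:EHE0009_ContextFind"),
    ("find2", "ecrm:P2_has_type", "la_vocabs:ware-coarse"),
    ("find2", "ecrm:P89_falls_within", "ctx2"),
    ("ctx2", "rdf:type", "crmeh:EHE0007_Context"),
    ("ctx2", "rdfs:label", "Context 102"),
]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern matches triple; terms starting with '?' are variables."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # A variable must keep the value it was already bound to, if any.
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result


def query(patterns: list[Triple]) -> list[Binding]:
    """Apply patterns left to right; each one filters and extends the current bindings."""
    bindings: list[Binding] = [{}]
    for pattern in patterns:
        next_bindings: list[Binding] = []
        for binding in bindings:
            for triple in DATA:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


rows = query([
    ("?find", "rdf:type", "crmeh:EHE0009_ContextFind"),
    ("?find", "ecrm:P2_has_type", "la_vocabs:ware-fine"),
    ("?find", "ecrm:P89_falls_within", "?context"),
    ("?context", "rdf:type", "crmeh:EHE0007_Context"),
    ("?context", "rdfs:label", "?name"),
])
print([r["?name"] for r in rows])  # ['Context 101']
```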

From the outset, the CRM documentation stresses that the model allows cultural heritage information to be mapped onto a shared and common framework. Mapping is, therefore, a key ingredient in applying the CRM, and in practice it is likely to consume the majority of a project’s effort.

It is possible to create RDF data by hand (the many text-based RDF serialisations allow for this), but in practice, when mapping any reasonable amount of data onto the CRM, the first thing you need is a bulk data cleaning and mapping tool that can carry out the task reliably and repeatably. OpenRefine is certainly one way of doing this, and when it is used in conjunction with the RDF Refine extension the process is not as daunting as it might first appear.

OpenRefine allows you to import data in a number of different formats, although CSV is usually the most convenient. It provides functionality for cleaning up and normalising data, which are both key facets of the mapping process. It also supports an expression language called GREL (General Refine Expression Language) that gives finer control over these transformations.
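For comparison, the sketch below reproduces in Python roughly what the GREL expression `value.trim().toLowercase()` does, adds collapsing of internal whitespace, and then looks the cleaned value up in a small hypothetical table that maps spreadsheet values onto vocabulary slugs such as `ware-fine`. The table and function names are illustrative assumptions, not part of OpenRefine.

```python
import re
from typing import Optional

# Hypothetical lookup from cleaned spreadsheet values to vocabulary slugs,
# like the la_vocabs:ware-fine term used in the query earlier on.
WARE_VOCAB = {"fine": "ware-fine", "fine ware": "ware-fine", "coarse": "ware-coarse"}


def normalise(value: str) -> str:
    """Trim, collapse internal whitespace to single spaces, and lowercase."""
    return re.sub(r"\s+", " ", value.strip()).lower()


def to_vocab(value: str) -> Optional[str]:
    """Return the vocabulary slug for a raw cell value, or None if unmapped."""
    return WARE_VOCAB.get(normalise(value))


print([to_vocab(v) for v in ["  Fine Ware ", "COARSE", "glazed"]])
# ['ware-fine', 'ware-coarse', None]
```

Returning `None` for unmapped values makes them easy to find and fix before export, rather than silently producing triples that point at non-existent vocabulary terms.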

[Screenshot: OpenRefine]

The RDF Refine extension allows you to specify how you would like your input data to be mapped onto a model such as the CRM as shown below.

[Screenshot: the OpenRefine RDF extension]
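The extension is driven through OpenRefine’s interface, but the underlying idea is a fixed template applied to every row. The Python sketch below applies one such template to a hypothetical two-row CSV export, minting IRIs under a placeholder base (`http://example.org/`) and reusing the CRM-EH classes, ECRM properties and linkedarc vocabulary namespace from the query earlier in this document. It is not the RDF Refine extension’s code, only an illustration of why a template-based mapping is repeatable.

```python
import csv
import io

CRMEH = "http://purl.org/crmeh#"
ECRM = "http://erlangen-crm.org/current/"
VOCABS = "http://linkedarc.net/vocabs/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/"  # hypothetical base IRI for minted resources

# Hypothetical spreadsheet export: one row per find, with its context.
CSV_DATA = """find_id,context_id,ware
F1,C101,ware-fine
F2,C102,ware-coarse
"""


def row_to_triples(row: dict[str, str]) -> list[tuple[str, str, str]]:
    """Apply one fixed template to a row, so every run yields the same mapping."""
    find = f"{BASE}find/{row['find_id']}"
    context = f"{BASE}context/{row['context_id']}"
    return [
        (find, RDF_TYPE, CRMEH + "EHE0009_ContextFind"),
        (find, ECRM + "P2_has_type", VOCABS + row["ware"]),
        (find, ECRM + "P89_falls_within", context),
        (context, RDF_TYPE, CRMEH + "EHE0007_Context"),
    ]


triples = [t for row in csv.DictReader(io.StringIO(CSV_DATA)) for t in row_to_triples(row)]
print(len(triples))  # 8
```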

An important point to note is that no single dataset is intended to be mapped onto the entirety of the CRM. You will almost certainly find that your dataset cannot be mapped exactly as specified in the CRM’s documentation, and in those situations you will need either to bring in a separate ontology or to build your own CRM extension to fill the semantic gaps.
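Extensions usually work by declaring new classes and properties as specialisations of existing CRM ones, so that data using them still answers CRM-level queries. The sketch below uses a hypothetical namespace `http://example.org/my-extension#` and a hypothetical class `Amphora`, declares the class as a subclass of ECRM’s E22 Man-Made Object, and applies a hand-rolled version of the RDFS subclass rule to show that an amphora is also inferred to be an E22 and, one level up, an E19 Physical Object. A real project would leave this inference to a reasoner.

```python
ECRM = "http://erlangen-crm.org/current/"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/my-extension#"  # hypothetical extension namespace

Triple = tuple[str, str, str]

# Schema: one extension class, plus the CRM subclass link it builds on.
SCHEMA: set[Triple] = {
    (EX + "Amphora", RDFS_SUBCLASS, ECRM + "E22_Man-Made_Object"),
    (ECRM + "E22_Man-Made_Object", RDFS_SUBCLASS, ECRM + "E19_Physical_Object"),
}

# Hypothetical instance data typed only with the extension class.
DATA: set[Triple] = {(EX + "amphora_1", RDF_TYPE, EX + "Amphora")}


def infer_types(data: set[Triple], schema: set[Triple]) -> set[Triple]:
    """If x has type C and C subClassOf D, add x has type D; repeat until nothing changes."""
    parents = [(c, d) for c, p, d in schema if p == RDFS_SUBCLASS]
    inferred = set(data)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p != RDF_TYPE:
                continue
            for child, parent in parents:
                if o == child and (s, RDF_TYPE, parent) not in inferred:
                    inferred.add((s, RDF_TYPE, parent))
                    changed = True
    return inferred


# Prints E19_Physical_Object, E22_Man-Made_Object and Amphora, one per line.
for _, _, cls in sorted(infer_types(DATA, SCHEMA)):
    print(cls.rsplit("/", 1)[-1].rsplit("#", 1)[-1])
```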

Next, learn about the English Heritage extension to the CRM.