The linkedarc.net SPARQL interface is hosted by the Apache Jena Fuseki engine (version 1.1.1). It supports queries that conform to the W3C SPARQL 1.1 recommendation, although SPARQL Update is disabled, so the data can be queried but not modified. SPARQL is by far the most versatile interface into the linkedarc.net project data. In this document, we introduce the basics of SPARQL querying and set you on the path to constructing more complex queries against linkedarc.net data.
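Because the interface is a standard SPARQL 1.1 endpoint, queries can also be sent programmatically over HTTP. The sketch below uses only the Python standard library and follows the SPARQL 1.1 Protocol convention of passing the query text in a `query` URL parameter and asking for the SPARQL JSON results format. The endpoint URL is a hypothetical placeholder, not the real linkedarc.net address, and the helper names `build_query_request` and `bindings_from_json` are illustrative.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder: substitute the actual linkedarc.net SPARQL endpoint URL.
ENDPOINT = "http://example.org/sparql"


def build_query_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a SPARQL 1.1 Protocol GET request.

    The query text travels in the 'query' URL parameter, and the Accept
    header asks for the standard SPARQL JSON results format.
    """
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def bindings_from_json(body: str) -> list[dict[str, str]]:
    """Flatten a SPARQL JSON results document into {variable: value} dicts."""
    doc = json.loads(body)
    return [
        {var: term["value"] for var, term in row.items()}
        for row in doc["results"]["bindings"]
    ]


if __name__ == "__main__":
    req = build_query_request(ENDPOINT, "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5")
    print(req.full_url)
    # To run the query against a live endpoint (network access required):
    # with urllib.request.urlopen(req) as resp:
    #     print(bindings_from_json(resp.read().decode("utf-8")))
```

Only read-only queries make sense here, since the update functionality is disabled on this endpoint.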

SPARQL queries are built around the RDF triple structure

RDF triples are simple three-part structures, and since SPARQL is designed to query data held in RDF triplestores, it makes sense that SPARQL queries are built with the triple form firmly in mind. An RDF triple is made up of a subject, a predicate and an object, where the predicate defines a relationship linking the subject to the object. Take a simple statement such as…

‘a dog is a mammal’

…could be represented as the following triple.

[Figure: A dog triple]

In practice, triples are chained together to form more complex data graphs, in which the object of one triple becomes the subject of another, as follows:

[Figure: Extending the dog triple]

Any and all of the three components of a triple can take the form of a URI. Objects can also be literals such as strings, numbers, dates et cetera.
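To make the idea of chaining concrete, here is a minimal sketch that models triples as plain Python `(subject, predicate, object)` tuples and finds every place where the object of one triple is the subject of another. The `example.org` URIs and the `isA` and `hasLabel` predicates are hypothetical stand-ins, not the URIs used in the figures, and the toy model stores URIs and literals alike as strings, which real RDF does not.

```python
# Hypothetical namespace; the figures' actual URIs are not reproduced here.
EX = "http://example.org/"

triples = [
    (EX + "dog", EX + "isA", EX + "mammal"),     # 'a dog is a mammal'
    (EX + "mammal", EX + "isA", EX + "animal"),  # the previous object is now a subject
    (EX + "dog", EX + "hasLabel", "dog"),        # the object here is a literal string
]


def chains(graph):
    """Return (s1, p1, o1, p2, o2) paths where o1 is also the subject of a triple."""
    return [
        (s1, p1, o1, p2, o2)
        for (s1, p1, o1) in graph
        for (s2, p2, o2) in graph
        if o1 == s2
    ]


for path in chains(triples):
    print(path)
```

Only the dog to mammal to animal path is found; the literal `"dog"` never acts as a subject, so it ends its chain.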

Basic SPARQL query syntax

SPARQL queries generally come in the following form:

[Figure: The SPARQL query form]

The query begins with a list of prefixes and the base URIs that they stand for. Prefixes are useful because URIs can be quite long: declaring a short prefix once and using it throughout the rest of the query makes the query much easier to read, and the SPARQL engine expands each prefix back into the full URI when it processes the query. A SPARQL prefix statement takes the following form:

PREFIX abc: <http://abcabcabcabc.com/>

In this example, abc is the prefix used in the remainder of the query, and it stands for http://abcabcabcabc.com/, so abc:thing in the query is read as http://abcabcabcabc.com/thing.
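The expansion step can be sketched in a few lines of Python. This is a simplified illustration of what the engine does with a prefixed name, assuming a hypothetical helper `expand` and a prefix table held in a dictionary; real SPARQL parsers also handle escaped characters in local names and other grammar details omitted here.

```python
def expand(prefixed_name: str, prefixes: dict[str, str]) -> str:
    """Expand a prefixed name such as 'abc:thing' into a full URI.

    Simplified sketch: real parsers follow the full SPARQL grammar,
    including escape rules for local names.
    """
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"undeclared prefix in {prefixed_name!r}")
    return prefixes[prefix] + local


PREFIXES = {
    "abc": "http://abcabcabcabc.com/",
    "crmeh": "http://purl.org/crmeh#",
}

print(expand("abc:thing", PREFIXES))              # http://abcabcabcabc.com/thing
print(expand("crmeh:EHE0007_Context", PREFIXES))  # http://purl.org/crmeh#EHE0007_Context
```

Using a prefix that was never declared is an error, which is why every prefix a query uses must appear in its PREFIX list.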

The next part of the query lists the variables whose values you want the SPARQL engine to return. A SPARQL variable is a name prefixed with a ‘?’ sign. An example of a SELECT statement is as follows:

SELECT ?name ?type WHERE {

The statement begins with the SELECT keyword, followed by all of the variables that you are interested in. You can instead enter ‘*’ if you want the engine to return the values of all of the variables used in the query triples section. The variable list is followed by the WHERE keyword and an opening curly brace, which begins the query triples section.
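The effect of the SELECT list can be pictured as a projection over the result rows. In this sketch each row is a hypothetical dictionary mapping variable names to values (the data is invented for illustration), and `project` is a hypothetical helper: a list of names keeps only those columns, while `"*"` keeps every variable, mirroring `SELECT *`.

```python
# Hypothetical solution rows: variable name -> bound value.
solutions = [
    {"name": "Context 1", "type": "Context", "id": "1"},
    {"name": "Context 2", "type": "Context", "id": "2"},
]


def project(rows, variables):
    """Keep only the selected variables; '*' keeps every variable."""
    if variables == "*":
        return [dict(row) for row in rows]
    # A variable left unbound in a row is simply omitted from that row.
    return [{v: row[v] for v in variables if v in row} for row in rows]


print(project(solutions, ["name", "type"]))
# [{'name': 'Context 1', 'type': 'Context'}, {'name': 'Context 2', 'type': 'Context'}]
```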

The logic of the query is constructed in the query triples section. SPARQL queries target graph data, which, as we have seen, is made up of triples chained together by linking predicates. Each line of this section should contain one triple pattern, which can combine variables and fixed values in any arrangement. Here is a very simple example:

?s ?p ?o .

This tells the SPARQL engine to return every subject, predicate and object combination in the target triplestore. In effect, this returns the entire dataset; note that against a very large triplestore this query can take a very long time to return, or fail to return at all, usually because of a timeout. Each triple pattern is terminated with a period, and consecutive patterns are logically ANDed: a result must satisfy all of them.
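How a single triple pattern matches data can be sketched with a toy matcher. This is an illustration of the idea, not how Fuseki is implemented: terms beginning with `?` are treated as variables, every other term must match exactly, and a variable that appears twice in a pattern must bind to the same value both times. The three-triple store and its `ex:` names are hypothetical, and prefixed names are compared as plain strings.

```python
def match(pattern, triples):
    """Return one binding dict per triple that fits the (s, p, o) pattern."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break  # a fixed term that does not match rules this triple out
        else:
            results.append(binding)
    return results


# Hypothetical three-triple store.
store = [
    ("ex:dog", "ex:isA", "ex:mammal"),
    ("ex:mammal", "ex:isA", "ex:animal"),
    ("ex:dog", "ex:hasLabel", "dog"),
]

print(len(match(("?s", "?p", "?o"), store)))  # 3: every triple matches
```

Because all three positions are variables, `?s ?p ?o` matches every triple, which is why it returns the whole dataset.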

Here is another example:

?s a crmeh:EHE0007_Context .

Now it starts getting interesting. Here you are asking for all of the triples that have a predicate of rdf:type (‘a’ is shorthand for ‘rdf:type’) and an object of crmeh:EHE0007_Context (note the use of the crmeh prefix here). Notice that we have specified a variable for the subject. We don’t want a fixed value in this position because the subject is exactly the data we are looking for: we want the engine to find every subject linked to crmeh:EHE0007_Context via the rdf:type predicate and to store each of these values in the ?s variable.
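The sketch below shows what this pattern selects, with the `a` shorthand expanded to the full rdf:type URI. The data and the `example.org` subject URIs are hypothetical, and `subjects_of_type` is an illustrative helper that mimics the single pattern rather than a general query engine.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CRMEH = "http://purl.org/crmeh#"


def expand_a(predicate: str) -> str:
    """In predicate position, SPARQL reads the keyword 'a' as rdf:type."""
    return RDF_TYPE if predicate == "a" else predicate


# Hypothetical data: two contexts and one resource of some other class.
data = [
    ("http://example.org/context/1", RDF_TYPE, CRMEH + "EHE0007_Context"),
    ("http://example.org/context/2", RDF_TYPE, CRMEH + "EHE0007_Context"),
    ("http://example.org/find/7", RDF_TYPE, "http://example.org/SomeOtherClass"),
]


def subjects_of_type(triples, cls):
    """Mimics: SELECT ?s WHERE { ?s a <cls> . }"""
    pred = expand_a("a")
    return [s for (s, p, o) in triples if p == pred and o == cls]


print(subjects_of_type(data, CRMEH + "EHE0007_Context"))
# ['http://example.org/context/1', 'http://example.org/context/2']
```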

We can narrow this query by adding a second triple pattern. The complete SPARQL query reads as follows:

PREFIX ecrm: <http://erlangen-crm.org/current/>
PREFIX crmeh: <http://purl.org/crmeh#>

SELECT ?s
WHERE {
	?s a crmeh:EHE0007_Context .
	?s ecrm:P87_is_identified_by '1' .
}

The second triple pattern narrows down the result set: we only want those subjects that are also linked to the literal value ‘1’ by the ecrm:P87_is_identified_by predicate. ecrm:P87_is_identified_by is a predicate defined in the CIDOC CRM, and it is used to assign an identifier to a resource. The whole query can therefore be paraphrased as: give me every subject URI that is of type crmeh:EHE0007_Context and is identified by the text ‘1’.
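The ANDing of the two patterns can be illustrated with a toy evaluator: each pattern is matched on its own, and the resulting bindings are joined so that a shared variable (here `?s`) must have the same value in both. This nested-loop join is a simplified teaching sketch, not Fuseki's query engine. The data is hypothetical, and literals are modelled as plain Python strings, whereas in real RDF a plain `"1"` and a typed integer `1` would not match each other.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CONTEXT = "http://purl.org/crmeh#EHE0007_Context"
P87 = "http://erlangen-crm.org/current/P87_is_identified_by"

# Hypothetical data for illustration only.
data = [
    ("http://example.org/context/1", RDF_TYPE, CONTEXT),
    ("http://example.org/context/1", P87, "1"),
    ("http://example.org/context/2", RDF_TYPE, CONTEXT),
    ("http://example.org/context/2", P87, "2"),
    ("http://example.org/find/9", P87, "1"),  # identified by '1' but not a context
]


def match(pattern, triples):
    """Bindings for one triple pattern; terms starting with '?' are variables."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


def join(left, right):
    """Merge bindings that agree on every shared variable (the logical AND)."""
    return [
        {**a, **b}
        for a in left
        for b in right
        if all(a[k] == b[k] for k in a.keys() & b.keys())
    ]


def evaluate(patterns, triples):
    solutions = [{}]  # a single empty solution: joining with it changes nothing
    for pattern in patterns:
        solutions = join(solutions, match(pattern, triples))
    return solutions


query = [("?s", RDF_TYPE, CONTEXT), ("?s", P87, "1")]
print([row["?s"] for row in evaluate(query, data)])
# ['http://example.org/context/1']
```

The resource identified by ‘1’ that is not a context drops out at the join, exactly as the paraphrase of the query suggests.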

More advanced SPARQL querying

SPARQL queries can become very complex, reflecting the complexity of the graphs they deal with. They can contain many triple patterns as well as more advanced constructs such as FILTER, which narrows down results using conditions that can call functions such as DATATYPE(). You can also summarise result data with aggregate functions such as SUM and COUNT, which can be extremely useful in real-world data mining.
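As a rough intuition for what these constructs compute, the sketch below reproduces a COUNT grouped by type, a FILTER with a datatype check, and a SUM over hypothetical in-memory triples. The `ex:` names, including the `ex:weightGrams` property, are invented for illustration; each comment shows the approximate SPARQL the Python line imitates, and none of this reflects the actual linkedarc.net vocabulary.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical triples; numbers stand in for typed integer literals.
data = [
    ("ex:c1", RDF_TYPE, "crmeh:EHE0007_Context"),
    ("ex:c2", RDF_TYPE, "crmeh:EHE0007_Context"),
    ("ex:f1", RDF_TYPE, "ex:Find"),
    ("ex:f1", "ex:weightGrams", 120),
    ("ex:f2", "ex:weightGrams", 35),
]

# Roughly: SELECT ?type (COUNT(?s) AS ?n) WHERE { ?s a ?type } GROUP BY ?type
counts = Counter(o for (s, p, o) in data if p == RDF_TYPE)

# Roughly: SELECT ?s ?w WHERE { ?s ex:weightGrams ?w .
#          FILTER(DATATYPE(?w) = xsd:integer && ?w > 100) }
heavy = [
    (s, o) for (s, p, o) in data
    if p == "ex:weightGrams" and isinstance(o, int) and o > 100
]

# Roughly: SELECT (SUM(?w) AS ?total) WHERE { ?s ex:weightGrams ?w }
total = sum(o for (s, p, o) in data if p == "ex:weightGrams")

print(counts)
print(heavy)  # [('ex:f1', 120)]
print(total)  # 155
```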

These subjects go beyond the scope of this document, however. If you wish to find out more about advanced SPARQL querying, start by having a look here, here and here.