Creation date: Thursday 28 May 2015
Creator: flynam
Last modified: Thursday 28 May 2015
Data mining pptextiles

The Priniatikos Pyrgos stratigraphy data-mining project exemplifies how you can query the linkedarc.net dataset using its SPARQL endpoint from a third-party application. In this case we use Gephi, the graph visualisation tool, to create a representation of the Priniatikos Pyrgos Trench 2 context relationships. The creation of such diagrammatic aids has a long and established tradition in archaeological research, most commonly in the form of the so-called Harris Matrix, named after Edward Harris, who first proposed that the stratigraphy of a site could be explained succinctly and clearly by presenting an abstracted vertical view of the site’s stratigraphic units. In this matrix, stratigraphic units are represented as nodes, and a link from one node to another indicates a temporal relationship between the two units.

Harris Matrix example

In the diagram shown above, unit 6 is chronologically the latest unit. It post-dates unit 3, which post-dates unit 2, and so on. Unit 5 post-dates both unit 1 and unit 4, and here the matrix bifurcates into two branches, each with its own independent stratigraphic relationships. It is customary to supplement the representation of these relationships with an indication of the site’s general chronological groupings: for example, all the contexts that fall within the Iron Age are grouped together, and so on.

This data-mining exercise first asks linkedarc.net for the chronological data relating to the Priniatikos Pyrgos Trench 2 contexts. Gephi then takes this information and uses it to populate its nodes and edges datatables. Gephi can visualise relationships in a number of different ways, and while no single Gephi layout maps directly onto the upside-down tree arrangement used by the Harris Matrix, layouts such as Yifan Hu and Force Atlas, allied with colour coding, can be used with some degree of success to represent these stratigraphic relationships and to order them into chronological groupings.

Making a SPARQL call using Gephi

SPARQL data can be mined from the linkedarc.net endpoint in a number of different formats. You could, for instance, request the data as CSV and then import this into Gephi as your dataset. However, there is a more direct way of doing this using the Semantic Web Import plugin (https://marketplace.gephi.org/plugin/semanticwebimport/). Install this plugin and restart Gephi. There should now be an extra tab alongside the Graph tab called Semantic Web Import. Click on this and, under Driver, select the ‘Remote – REST endpoint’ option. We need to fill in the details for the SPARQL call here. There is a problem, however, that needs to be addressed. Currently, the linkedarc.net SPARQL endpoint, which is hosted by Apache Jena Fuseki, does not accept SELECT or CONSTRUCT requests delivered using an HTTP POST message, while the Semantic Web Import plugin always requests data using HTTP POST. To get around this problem, you need to specify the entire SPARQL GET request URL (including the URL-encoded SPARQL query text and the format field) in the ‘Endpoint URL’ field.
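For clarity, this is the shape of URL we are aiming for. Here is a minimal sketch of how such a URL could be assembled by hand; Python, the urlencode approach and the endpoint path are my own illustrative assumptions, not part of the linkedarc.net documentation, and the next paragraph describes an easier manual way to obtain the same URL.

from urllib.parse import urlencode

# Assumed endpoint path, for illustration only; substitute the address
# of the actual linkedarc.net SPARQL endpoint.
ENDPOINT = "http://linkedarc.net/sparql"

# Any SPARQL query text; the full CONSTRUCT query given below goes here.
query = "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o } LIMIT 10"

# urlencode percent-encodes the query text and the output-format field,
# producing the complete GET request URL for the 'Endpoint URL' field.
url = ENDPOINT + "?" + urlencode({"query": query, "output": "xml"})
print(url)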

Luckily, it’s easy enough to do this. Simply go to the linkedarc.net SPARQL endpoint and write your query. The Semantic Web Import plugin needs to receive a list of RDF triples. It uses these to populate the Gephi project’s nodes and edges datatables. In order to create these triples, you need to employ the SPARQL CONSTRUCT command. If you create triples that contain a predicate prefixed with the Gephi namespace, <http://gephi.org/>, these will be added to the nodes datatable as columns, effectively becoming Gephi attributes, which can then be used to change your network visualisation (by adding colour, for example, as will be discussed below).
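To illustrate that mechanism, the sketch below parses a Turtle file returned by the endpoint and picks out the triples whose predicates fall under the Gephi namespace; these are the ones the plugin turns into node attributes. Python with the rdflib package, and the filename, are my own assumptions for this example, not part of the Gephi workflow.

from rdflib import Graph, Namespace

GEPHI = Namespace("http://gephi.org/")

g = Graph()
g.parse("results.ttl", format="turtle")  # hypothetical file saved from the endpoint

# Triples whose predicate sits under http://gephi.org/ (e.g. gephi:label,
# gephi:periodEnd) become columns in Gephi's nodes datatable.
for s, p, o in g:
    if str(p).startswith(str(GEPHI)):
        print(s, p, o)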

Enter the following SPARQL query to request from linkedarc.net a list of the chronological relationships between contexts, alongside the names of the two contexts in each relationship and the latest date associated with the first context. The CONSTRUCT section of the query creates a series of RDF triples representing this information.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gephi: <http://gephi.org/>
PREFIX la_ont: <http://linkedarc.net/ontology/>
PREFIX la_pp_ont: <http://linkedarc.net/ontology/la_pp/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX ecrm: <http://erlangen-crm.org/current/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX crmeh: <http://purl.org/crmeh#>

CONSTRUCT {
    ?context gephi:label ?contextName .
    ?contextBelow gephi:label ?contextBelowName .
    ?context gephi:periodEnd ?periodEndMax .
    ?context la_pp_ont:P13_matrix_isover ?contextBelow .
}
WHERE {
    SELECT ?context ?contextName ?contextBelow ?contextBelowName (MAX(?periodEnd) AS ?periodEndMax) {
        ?context a crmeh:EHE0007_Context .
        ?context ecrm:P89_falls_within ?trench .
        ?trench a crmeh:EHE0088_SiteSubDivisionDepiction .
        ?trench ecrm:P87_is_identified_by "2" .
        ?context ecrm:P7i_witnessed ?contextEvent .
        ?contextEvent ecrm:P120i_occurs_after ?contextEventBelow .
        ?contextBelow ecrm:P7i_witnessed ?contextEventBelow .
        ?context rdfs:name ?contextName .
        ?contextBelow rdfs:name ?contextBelowName .
        ?context ecrm:P26_moved_to ?contextFindDepEvent .
        ?contextFindDepEvent ecrm:P25_moved ?contextFind .
        ?contextFind ecrm:P16i_was_used_for ?contextFindUseEvent .
        ?contextFindUseEvent ecrm:P4_has_time-span ?period .
        ?period ecrm:P87_is_identified_by ?periodName .
        ?period ecrm:P80_end_is_qualified_by ?periodEnd .
    } GROUP BY ?context ?contextName ?contextBelow ?contextBelowName
}

Select ‘Text’ as the request output and click ‘Get Results’ to initiate the query. The server will return a Turtle RDF file encoding the relevant information. Copy the URL contained in the address bar and return to Gephi.

Paste the address into the ‘Endpoint URL’ field. Now we need to change the requested return type to XML (the Semantic Web Import plugin expects data as RDF/XML). Find where the address string reads ‘output=text’ and change this to read ‘output=xml’ instead. Now you are ready to run the query. Click the ‘Run’ button and wait a moment for the query to execute. Check the Context inspector: it should update to indicate the number of new nodes and edges (relationships) that have been added to the graph. If either of these reads ‘0’, check that you have entered the URL correctly and run the query again.
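If you would like to sanity-check the rewritten URL outside Gephi first, a small sketch along these lines can fetch and parse the RDF/XML. This step is my own addition rather than part of the plugin workflow; it again assumes Python with rdflib, an assumed endpoint path, and a hypothetical query.rq file holding the CONSTRUCT query above.

from urllib.parse import urlencode
from rdflib import Graph

ENDPOINT = "http://linkedarc.net/sparql"  # assumed path, as before
query = open("query.rq").read()           # hypothetical file holding the CONSTRUCT query

# Rebuild the GET URL with the XML output format the plugin expects.
url = ENDPOINT + "?" + urlencode({"query": query, "output": "xml"})

g = Graph()
g.parse(url, format="xml")  # fetch the URL and parse the RDF/XML response
print(len(g), "triples returned")  # 0 suggests the URL or query is wrong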

Formatting the Gephi graph’s layout

Go to the Gephi Graph panel. Your data should now be represented as a series of nodes and links on a white background, displayed using the default layout. We want to change this to a layout that better visualises the semantics of the data. I have found that the ‘Yifan Hu Proportional’ layout works best for rendering a stratigraphic network such as this. Select it from the list of layouts (if it is not listed, you may need to install it as a plugin). Click Run and the layout will change: the most connected nodes become hubs near the centre of the display, while the less connected nodes are pushed to the periphery.

Currently, our graph does not display any labels. To add them, click on the small icon located in the bottom-right-hand corner of the graph and go to the Labels tab. Turn on the Node option and you should see that labels have been added to your graph. If the size of these labels is a problem, you can choose between ‘Fixed’, ‘Scaled’ and ‘Node Size’ to correct it.

Colour-coding the edges using Gephi attributes

Recall that we also requested period information from the linkedarc.net server. We want to represent this information as part of our graph. To do this, click on the Partition inspector and click the refresh button. From the dropdown list that appears, select periodEnd and click Apply. This colour-codes the nodes, and the edges linking them, based on the latest date associated with each context.

Conclusion

And that’s it. You now have a graph diagram that represents the stratigraphic information of Trench 2 at Priniatikos Pyrgos. You can export this graph as a PDF, PNG or SVG file; if you export it as an SVG, you can edit it further in Adobe Illustrator, as I have done below.

Finished graph visualisation