05 January 2008

Jena-Mulgara: example of implementing a Jena graph

In Jena, Graph is an interface. It abstracts anything that looks like RDF - storage options, inference, other legacy data sources.

The main operations are find(Triple), add(Triple) and remove(Triple). In addition, there are a number of getters to access handlers for various features (query, statistics, reification, bulk update, event manager). Having handlers, rather than directly including all the operations for each feature, reduces the size of the interface and makes it easier to provide default implementations of each feature.

It is rarely necessary to implement the Graph interface directly. More usually, an implementation starts by inheriting from the class GraphBase. A minimal (read-only) implementation just needs to implement graphBaseFind. Wrapping legacy data often only makes sense as a read-only graph. To provide update operations, implement the methods performAdd and performDelete, which are called from the base implementations of add(Triple) and remove(Triple).
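To make the shape of this concrete, here is a minimal sketch of the pattern in plain Java. It does not use Jena itself: Triple and SketchGraphBase are simplified stand-ins written for this example (the hook names graphBaseFind, performAdd and performDelete are the ones mentioned above, everything else is an assumption), and strings stand in for nodes. The real GraphBase has more to it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified stand-in for a Jena triple; a null field in a pattern is a wildcard.
final class Triple {
    final String s, p, o;

    Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }

    // True if this concrete triple is matched by the given pattern.
    boolean matchedBy(Triple pattern) {
        return (pattern.s == null || pattern.s.equals(s))
            && (pattern.p == null || pattern.p.equals(p))
            && (pattern.o == null || pattern.o.equals(o));
    }
}

// Stand-in for GraphBase (not the Jena class): public operations delegate to protected hooks.
abstract class SketchGraphBase {
    public final Iterator<Triple> find(Triple pattern) { return graphBaseFind(pattern); }
    public final void add(Triple t) { performAdd(t); }
    public final void remove(Triple t) { performDelete(t); }

    // The one hook a read-only graph has to supply.
    protected abstract Iterator<Triple> graphBaseFind(Triple pattern);

    // Updates are rejected unless a subclass overrides these two hooks.
    protected void performAdd(Triple t) { throw new UnsupportedOperationException("read-only graph"); }
    protected void performDelete(Triple t) { throw new UnsupportedOperationException("read-only graph"); }
}

// Read-only graph over "legacy" data, here just a fixed table of rows.
class LegacyTableGraph extends SketchGraphBase {
    private final String[][] rows;

    LegacyTableGraph(String[][] rows) { this.rows = rows; }

    @Override
    protected Iterator<Triple> graphBaseFind(Triple pattern) {
        List<Triple> hits = new ArrayList<>();
        for (String[] r : rows) {
            Triple t = new Triple(r[0], r[1], r[2]);
            if (t.matchedBy(pattern)) hits.add(t);
        }
        return hits.iterator();
    }
}

// Updatable graph: the same find hook plus the two update hooks.
class InMemoryGraph extends SketchGraphBase {
    private final List<Triple> triples = new ArrayList<>();

    @Override
    protected Iterator<Triple> graphBaseFind(Triple pattern) {
        List<Triple> hits = new ArrayList<>();
        for (Triple t : triples) {
            if (t.matchedBy(pattern)) hits.add(t);
        }
        return hits.iterator();
    }

    @Override
    protected void performAdd(Triple t) {
        // A graph is a set of triples, so skip duplicates.
        if (!graphBaseFind(t).hasNext()) triples.add(t);
    }

    @Override
    protected void performDelete(Triple t) {
        triples.removeIf(x -> x.matchedBy(t));
    }
}
```

The read-only class only writes the find hook and inherits the rejecting update hooks; the updatable one adds two short methods. That is roughly the amount of code a first implementation needs.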

Then, for testing with JUnit, inherit from AbstractGraphTest (overriding any tests that don't make sense in a particular circumstance) and provide the getGraph operation to generate a graph instance to test.
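The testing pattern is an abstract test class with a factory method. The sketch below shows the idea without JUnit or Jena: TinyGraph, AbstractGraphContract and SetBackedGraph are hypothetical names for this example only, and only getGraph is taken from the description above. The tests in the real AbstractGraphTest are far more thorough.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical minimal graph interface, standing in for Jena's Graph.
interface TinyGraph {
    void add(String s, String p, String o);
    void remove(String s, String p, String o);
    boolean contains(String s, String p, String o);
}

// Stand-in for AbstractGraphTest: checks written once against the interface.
abstract class AbstractGraphContract {
    // Each implementation under test supplies a fresh, empty graph.
    protected abstract TinyGraph getGraph();

    public void testAddThenContains() {
        TinyGraph g = getGraph();
        g.add("s", "p", "o");
        check(g.contains("s", "p", "o"), "an added triple should be found");
    }

    public void testRemove() {
        TinyGraph g = getGraph();
        g.add("s", "p", "o");
        g.remove("s", "p", "o");
        check(!g.contains("s", "p", "o"), "a removed triple should be gone");
    }

    // Plays the role of the JUnit runner in this sketch.
    public void runAll() {
        testAddThenContains();
        testRemove();
    }

    protected static void check(boolean ok, String message) {
        if (!ok) throw new AssertionError(message);
    }
}

// An implementation to put under test.
class SetBackedGraph implements TinyGraph {
    private final Set<String> keys = new HashSet<>();

    private static String key(String s, String p, String o) { return s + "\n" + p + "\n" + o; }

    @Override public void add(String s, String p, String o) { keys.add(key(s, p, o)); }
    @Override public void remove(String s, String p, String o) { keys.remove(key(s, p, o)); }
    @Override public boolean contains(String s, String p, String o) { return keys.contains(key(s, p, o)); }
}

// The per-implementation test class: all it provides is the factory method.
class SetBackedGraphContract extends AbstractGraphContract {
    @Override
    protected TinyGraph getGraph() { return new SetBackedGraph(); }
}
```

A test class for a read-only graph would additionally override testRemove (and any other update test) with a version that expects the update to be rejected.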

Application APIs

Graph/Triple/Node provide the low level interface in Jena; Model/Statement/Resource/Literal provide the RDF API and the ontology API provides an OWL-centric view of the RDF data.

The graph level is minimal and symmetric (e.g. literals as subjects, inclusion of named variables) so that it is easy to implement. The RDF API, in contrast, enforces the RDF conditions and provides a wide variety of convenience operations, so a program can be succinct and the application writer is spared unnecessary boilerplate. The ontology API does the same for OWL. If you look at the javadoc, you'll see the APIs are large but the system level interface is small.

A graph is turned into a Model by calling ModelFactory.createModelForGraph(Graph). All the key application APIs are interface-based, although it is rarely necessary to do anything other than use the standard Model-Graph bridge.
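The split between a permissive graph level and a stricter API level on top of it can be sketched as follows. This is an illustration of the layering only, not how Jena implements it: Node, SymmetricGraph and ModelView are hypothetical types, and the static method merely borrows the name createModelForGraph.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in node type; the graph level does not care which kind appears where.
final class Node {
    enum Kind { IRI, BLANK, LITERAL }

    final Kind kind;
    final String label;

    Node(Kind kind, String label) { this.kind = kind; this.label = label; }

    static Node iri(String iri) { return new Node(Kind.IRI, iri); }
    static Node literal(String value) { return new Node(Kind.LITERAL, value); }
}

// Minimal, symmetric graph level: any node may appear in any position.
class SymmetricGraph {
    private final List<Node[]> triples = new ArrayList<>();

    void add(Node s, Node p, Node o) { triples.add(new Node[] { s, p, o }); }
    int size() { return triples.size(); }
}

// API level: a view over the graph that enforces RDF's positional rules.
class ModelView {
    private final SymmetricGraph graph;

    private ModelView(SymmetricGraph graph) { this.graph = graph; }

    // Named after ModelFactory.createModelForGraph; this is NOT Jena's implementation.
    static ModelView createModelForGraph(SymmetricGraph graph) { return new ModelView(graph); }

    void add(Node s, Node p, Node o) {
        if (s.kind == Node.Kind.LITERAL) throw new IllegalArgumentException("subject cannot be a literal");
        if (p.kind != Node.Kind.IRI) throw new IllegalArgumentException("predicate must be an IRI");
        graph.add(s, p, o);
    }

    // A convenience operation of the kind an application API layers on top.
    void addLiteral(String subjectIri, String propertyIri, String value) {
        add(Node.iri(subjectIri), Node.iri(propertyIri), Node.literal(value));
    }
}
```

The view holds no data of its own, so the same wrapper works over any graph implementation. That is why a new graph gets the whole application API for free.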

Data access to the graph all goes via find. All the read operations of the application APIs, directly or indirectly, come down to calling Graph.find or a graph query handler. And the default graph query handler works by calling Graph.find, so once find is implemented everything (read-only) works, including ARQ's query API, which includes a SPARQL implementation. It may not be the most efficient way but, importantly, all functionality is available, so the graph implementer can quickly get a first implementation up and running, then decide where and when to spend further development time - or whether that's needed at all.
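As a rough illustration of "everything comes down to find", the sketch below defines a hypothetical interface (FindOnlyGraph, not a Jena type) with find as its only primitive and builds contains, size and a two-pattern query on top of it as default methods. The query is answered with a plain nested-loop join, which is the simple and not necessarily efficient strategy a default handler can fall back on; Jena's actual handler is more involved.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical graph interface whose only primitive is find.
// A triple is a String[] {subject, predicate, object}; null arguments are wildcards.
interface FindOnlyGraph {
    Iterator<String[]> find(String s, String p, String o);

    default boolean contains(String s, String p, String o) {
        return find(s, p, o).hasNext();
    }

    default int size() {
        int n = 0;
        for (Iterator<String[]> it = find(null, null, null); it.hasNext(); it.next()) n++;
        return n;
    }

    // The query { start p1 ?y . ?y p2 ?z } answered as a nested-loop join over find.
    default List<String> twoStep(String start, String p1, String p2) {
        List<String> results = new ArrayList<>();
        for (Iterator<String[]> outer = find(start, p1, null); outer.hasNext(); ) {
            String y = outer.next()[2];
            for (Iterator<String[]> inner = find(y, p2, null); inner.hasNext(); ) {
                results.add(inner.next()[2]);
            }
        }
        return results;
    }
}

// The implementer writes find and nothing else.
class ArrayGraph implements FindOnlyGraph {
    private final String[][] data;

    ArrayGraph(String[][] data) { this.data = data; }

    @Override
    public Iterator<String[]> find(String s, String p, String o) {
        List<String[]> hits = new ArrayList<>();
        for (String[] t : data) {
            boolean match = (s == null || s.equals(t[0]))
                         && (p == null || p.equals(t[1]))
                         && (o == null || o.equals(t[2]));
            if (match) hits.add(t);
        }
        return hits.iterator();
    }
}
```

An implementation that later needs better performance can override the derived operations one at a time, for example by pushing the join down into the underlying store.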

Jena-Mulgara

An example of this is a prototype Jena-Mulgara bridge (work in progress as of Jan'08). This maps the Graph API to a Mulgara session object, which can be a local Mulgara database or a remote Mulgara server. The prototype is a single class together with a set of factory operations for more convenient creation of a bridge graph wrapped in all Jena's APIs.

Implementing graph nodes, for IRIs and for literals, is straightforward. Mulgara uses JRDF to represent these nodes and to represent triples. Mapping to and from the Jena versions of the same is just a change in naming.

Blank nodes are more interesting. A blank node in Jena has an internal label (which is not a URI in disguise). When working at the lowest level of Graph, the code is manipulating things at a concrete, syntactic level.

A blank node in Mulgara has an internal id but it can change. It really is the internal node index, as I found out when I created a blank node with id=1 and it turned into rdf:type, which was what was really at node slot 1. Paul has been (patiently!) explaining this to me on a Mulgara mailing list. The session interface is an interface onto the RDF data, not an interface that exposes the details of the graph to the client. Both approaches are valid - it's just different levels of abstraction.

If the Jena application is careful about blank nodes (not assuming they are stable across transactions, and not deleting all triples involving some blank node and then creating triples involving that blank node) then it all works out. The most important case, reading data within a transaction, is safe. Bulk loading is better done via the native Mulgara interfaces anyway. The Jena-Mulgara bridge enables a Jena application to access a Mulgara server through the same interfaces as any other RDF data.