05 January 2008

Jena-Mulgara: an example of implementing a Jena graph

In Jena, Graph is an interface. It abstracts anything that looks like RDF - storage options, inference, other legacy data sources.

The main operations are find(Triple), add(Triple) and remove(Triple). In addition, there are a number of getters to access handlers for various features (query, statistics, reification, bulk update, event manager). Having handlers, rather than directly including all the operations for each feature, reduces the size of the interface and makes it easier to provide default implementations of each feature.

Implementing a graph rarely means implementing the interface directly. More usually, an implementation starts by inheriting from the class GraphBase. A minimal (read-only) implementation just needs to implement graphBaseFind. Wrapping legacy data often only makes sense as a read-only graph. To provide update operations, just implement the methods performAdd and performDelete, which are the methods called from the base implementations of add(Triple) and remove(Triple).

Then for testing with JUnit, inherit from AbstractGraphTest (override tests that don't make sense in a particular circumstance) and provide the getGraph operation to generate a graph instance to test.
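The GraphBase pattern described above can be sketched without Jena on the classpath. This is a minimal stand-alone illustration, not Jena's actual code: Triple, SketchGraphBase and ListGraph are hypothetical stand-ins, strings stand in for nodes, and the method signatures are simplified. It only shows the shape of the idea: the public operations delegate to hooks, and a subclass that supplies just graphBaseFind is a read-only graph.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Jena's Triple; a null field acts as a wildcard in a pattern.
final class Triple {
    final String s, p, o;
    Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }

    // True if this triple, read as a pattern, matches the concrete triple t.
    boolean matches(Triple t) {
        return (s == null || s.equals(t.s))
            && (p == null || p.equals(t.p))
            && (o == null || o.equals(t.o));
    }
}

// Hypothetical stand-in for GraphBase: public operations call overridable hooks.
abstract class SketchGraphBase {
    public final List<Triple> find(Triple pattern) { return graphBaseFind(pattern); }
    public final void add(Triple t) { performAdd(t); }
    public final void remove(Triple t) { performDelete(t); }

    // The only hook a read-only graph has to supply.
    protected abstract List<Triple> graphBaseFind(Triple pattern);

    // Updates are rejected unless a subclass overrides these two hooks.
    protected void performAdd(Triple t) {
        throw new UnsupportedOperationException("read-only graph");
    }
    protected void performDelete(Triple t) {
        throw new UnsupportedOperationException("read-only graph");
    }
}

// An updatable graph backed by a plain list (assumed storage, for illustration only).
class ListGraph extends SketchGraphBase {
    private final List<Triple> triples = new ArrayList<Triple>();

    @Override protected List<Triple> graphBaseFind(Triple pattern) {
        List<Triple> out = new ArrayList<Triple>();
        for (Triple t : triples) {
            if (pattern.matches(t)) out.add(t);
        }
        return out;
    }

    // A graph is a set of triples, so skip a triple that is already present.
    @Override protected void performAdd(Triple t) {
        if (graphBaseFind(t).isEmpty()) triples.add(t);
    }

    // A fully specified triple used as a pattern matches only itself.
    @Override protected void performDelete(Triple t) {
        triples.removeIf(t::matches);
    }
}
```

A wrapper around legacy data would look like ListGraph with only graphBaseFind overridden; calls to add or remove then fail in the base class rather than silently doing nothing.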

Application APIs

Graph/Triple/Node provide the low-level interface in Jena; Model/Statement/Resource/Literal provide the RDF API, and the ontology API provides an OWL-centric view of the RDF data.

Where the graph level is minimal and symmetric (e.g. literals as subjects, inclusion of named variables) for easy implementation, the RDF API enforces the RDF conditions and provides a wide variety of convenience operations, so a program can be succinct and the application writer does not have to write unnecessary boilerplate code. The ontology API does the same for OWL. If you look at the javadoc, you'll see the APIs are large but the system-level interface is small.

A graph is turned into a Model by calling ModelFactory.createModelForGraph(Graph). All the key application APIs are interface-based, although it's rarely necessary to do anything other than use the standard Model-Graph bridge.

All data access to the graph goes via find. All the read operations of the application APIs come down, directly or indirectly, to calling Graph.find or a graph query handler. The default graph query handler works by calling Graph.find, so once find is implemented everything (read-only) works, including ARQ's query API, which includes a SPARQL implementation. It may not be the most efficient way, but importantly all functionality is available, so the graph implementer can quickly get a first implementation up and running, then decide where and when to spend further development time - or whether that's needed at all.
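To illustrate why implementing find is enough for read-only querying, here is a minimal stand-alone sketch of answering a multi-pattern query using nothing but find. It is not the code of Jena's default query handler or of ARQ: FindGraph, ArrayGraph and FindOnlyQuery are hypothetical names, triples are plain string arrays, and terms starting with "?" are treated as variables. Each pattern is resolved against the bindings so far, passed to find, and the matches extend the bindings (a simple nested-loop join).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal graph: the only read operation is find; null is a wildcard.
interface FindGraph {
    List<String[]> find(String s, String p, String o);
}

// Assumed in-memory implementation, used only to make the sketch runnable.
final class ArrayGraph implements FindGraph {
    private final List<String[]> triples = new ArrayList<String[]>();

    void add(String s, String p, String o) { triples.add(new String[] {s, p, o}); }

    public List<String[]> find(String s, String p, String o) {
        List<String[]> out = new ArrayList<String[]>();
        for (String[] t : triples) {
            if ((s == null || s.equals(t[0]))
                    && (p == null || p.equals(t[1]))
                    && (o == null || o.equals(t[2]))) {
                out.add(t);
            }
        }
        return out;
    }
}

final class FindOnlyQuery {
    static boolean isVar(String term) { return term.startsWith("?"); }

    // A constant stays as it is; a variable becomes its bound value, or null (wildcard) if unbound.
    static String resolve(String term, Map<String, String> binding) {
        return isVar(term) ? binding.get(term) : term;
    }

    // Record term=value in the binding; returns false if it contradicts an earlier binding.
    static boolean bind(String term, String value, Map<String, String> binding) {
        if (!isVar(term)) return term.equals(value);
        String old = binding.putIfAbsent(term, value);
        return old == null || old.equals(value);
    }

    // Returns one variable binding per solution of the whole list of patterns.
    static List<Map<String, String>> solve(FindGraph g, List<String[]> patterns) {
        List<Map<String, String>> results = new ArrayList<Map<String, String>>();
        solve(g, patterns, 0, new HashMap<String, String>(), results);
        return results;
    }

    private static void solve(FindGraph g, List<String[]> patterns, int i,
                              Map<String, String> binding, List<Map<String, String>> results) {
        if (i == patterns.size()) { results.add(binding); return; }
        String[] p = patterns.get(i);
        List<String[]> matches =
            g.find(resolve(p[0], binding), resolve(p[1], binding), resolve(p[2], binding));
        for (String[] t : matches) {
            Map<String, String> next = new HashMap<String, String>(binding);
            if (bind(p[0], t[0], next) && bind(p[1], t[1], next) && bind(p[2], t[2], next)) {
                solve(g, patterns, i + 1, next, results);
            }
        }
    }
}
```

The join here calls find once per intermediate solution, which is the "may not be the most efficient way" trade-off: a store-specific query handler could push the whole pattern down to the store in one request instead.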

Jena-Mulgara

An example of this is a prototype Jena-Mulgara bridge (work in progress as of Jan'08). This maps the Graph API to a Mulgara session object, which can be a local Mulgara database or a remote Mulgara server. The prototype is a single class together with a set of factory operations for more convenient creation of a bridge graph wrapped in all Jena's APIs.

Implementing graph nodes, for IRIs and for literals, is straightforward. Mulgara uses JRDF to represent these nodes and to represent triples. Mapping to and from the Jena versions of the same is just a change of naming.

Blank nodes are more interesting. A blank node in Jena has an internal label (which is not a URI in disguise). When working at the lowest level of Graph, the code is manipulating things at a concrete, syntactic level.

A blank node in Mulgara has an internal id but it can change. It really is the internal node index, as I found out when I created a blank node with id=1 and it turned into rdf:type, which was what was really at node slot 1. Paul has been (patiently!) explaining this to me on a Mulgara mailing list. The session interface is an interface onto the RDF data, not an interface that exposes the graph's internal details to the client. Both approaches are valid - it's just different levels of abstraction.

If the Jena application is careful about blank nodes (not assuming they are stable across transactions, and not deleting all triples involving some blank node, then creating triples involving that blank node) then it all works out. The most important case, reading data within a transaction, is safe. Bulk loading is better done via the native Mulgara interfaces anyway. The Jena-Mulgara bridge enables a Jena application to access a Mulgara server through the same interfaces as any other RDF data.

4 comments:

Unknown said...

I attempted to use JenaMulgara and got an exception. It seems that PatternMulgara.compileTriplePattern() is getting a list of variables but one of the variables is not on that list. The var_jr list has one item and it has the label "Y".

We are calling OntModel.getAllClasses() and
then requesting the first item from the Iterator.

Good luck.

Throwable in JenaDataSourceSetup:
com.hp.hpl.jena.shared.JenaException: Failed to compile: (http://vitro.mannlib.cornell.edu/ns/vitro/0.7#TabIndividualRelation http://www.w3.org/2000/01/rdf-schema#subClassOf ?X)
at com.hp.hpl.jena.mulgara.PatternMulgara.compileTriplePattern(PatternMulgara.java:115)
at com.hp.hpl.jena.mulgara.PatternMulgara.prepareBindings(PatternMulgara.java:78)
at com.hp.hpl.jena.ontology.impl.OntClassImpl.<init>(OntClassImpl.java:129)
at com.hp.hpl.jena.ontology.impl.OntClassImpl$1.wrap(OntClassImpl.java:77)
at com.hp.hpl.jena.enhanced.Personality.newInstance(Personality.java:84)
at com.hp.hpl.jena.enhanced.EnhGraph.getNodeAs(EnhGraph.java:127)
at com.hp.hpl.jena.ontology.impl.OntModelImpl$SubjectNodeAs.map1(OntModelImpl.java:3056)
at com.hp.hpl.jena.util.iterator.Map1Iterator.next(Map1Iterator.java:33)
at com.hp.hpl.jena.util.iterator.WrappedIterator.next(WrappedIterator.java:68)
at com.hp.hpl.jena.util.iterator.UniqueExtendedIterator.nextIfNew(UniqueExtendedIterator.java:61)
at com.hp.hpl.jena.util.iterator.UniqueExtendedIterator.hasNext(UniqueExtendedIterator.java:69)
at edu.cornell.mannlib.vitro.webapp.dao.jena.WebappDaoFactoryJena.makeFlag2ConvenienceMaps(WebappDaoFactoryJena.java:195)
at edu.cornell.mannlib.vitro.webapp.dao.jena.WebappDaoFactoryJena.<init>(WebappDaoFactoryJena.java:152)
at edu.cornell.mannlib.vitro.webapp.dao.jena.WebappDaoFactoryJena.<init>(WebappDaoFactoryJena.java:156)
at edu.cornell.mannlib.vitro.webapp.servlet.setup.JenaDataSourceSetup.contextInitialized(JenaDataSourceSetup.java:80)
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:3830)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4337)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:525)
at org.apache.catalina.startup.HostConfig.deployDirectory(HostConfig.java:920)
at org.apache.catalina.startup.HostConfig.deployDirectories(HostConfig.java:883)
at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:492)
at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1138)
at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
at org.apache.catalina.core.StandardHost.start(StandardHost.java:719)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
at org.apache.catalina.core.StandardService.start(StandardService.java:516)
at org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
at org.apache.catalina.startup.Catalina.start(Catalina.java:566)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:413)

AndyS said...

Thanks for the report! I've identified why that particular case is not handled properly and there's a fix in SVN. Please let me know if it works for you.

Unknown said...

Thanks for the quick response. Your fix worked and I can now access a Mulgara model from Jena.

What is the difference between the two approaches described on these pages:

http://docs.mulgara.org/integration/jena.html

versus

http://jena.hpl.hp.com/wiki/JenaMulgara

(Other than that the first seems to no longer work in Jena 2.5.5)

AndyS said...

The one from Mulgara is very old and, as far as I know, no longer maintained. I don't know why it's still in their documentation. The approach it takes does not allow for efficient access (it's single-triple based).

One of the reasons I started this project is to create a Jena->Mulgara connector that was current. Because it hooks into Jena as a graph and because it uses Mulgara's APIs, it should be more stable.