16 February 2006

Property Functions in ARQ

These are properties that are calculated by custom code rather than by the usual matching. Two are provided at present; applications are free to add application-specific ones.

  • list:member - access the members of an RDF list.
  • rdfs:member - access the members of rdf:Bag, rdf:Seq and rdf:Alt structures

Full Extension documentation

Normally, the unit of matching in ARQ is the basic graph pattern (a sequence of triple patterns). These sets of triple patterns are dispatched to Jena for matching by Jena's graph-level query handler. Each kind of storage provides the appropriate query handler. For example, the database fastpath is a translation of a set of triple patterns into a single SQL query involving joins.

There is also a default implementation that works by using plain graph find (a triple with possible wildcards), so a new storage system does not need to provide its own query handler until it wants to exploit some feature of the storage.
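To make the idea concrete, here is a minimal sketch of matching a sequence of triple patterns using nothing but a wildcard find. It is not Jena's query handler: the class `FindMatcherSketch`, its `Triple` type, and the convention that a term starting with `?` is a variable are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Jena code: pattern matching built only on "find".
final class FindMatcherSketch {
    // A triple, or a triple pattern when a term starts with '?'.
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    private final List<Triple> graph;

    FindMatcherSketch(List<Triple> graph) { this.graph = graph; }

    // Plain graph find: null in any position is a wildcard.
    List<Triple> find(String s, String p, String o) {
        List<Triple> out = new ArrayList<>();
        for (Triple t : graph) {
            if ((s == null || s.equals(t.s))
                    && (p == null || p.equals(t.p))
                    && (o == null || o.equals(t.o))) {
                out.add(t);
            }
        }
        return out;
    }

    // Match the patterns in order, extending each partial solution via find.
    List<Map<String, String>> match(List<Triple> patterns) {
        List<Map<String, String>> solutions = new ArrayList<>();
        solutions.add(new HashMap<>());
        for (Triple pat : patterns) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> b : solutions) {
                for (Triple t : find(ground(pat.s, b), ground(pat.p, b), ground(pat.o, b))) {
                    Map<String, String> ext = new HashMap<>(b);
                    if (bind(pat.s, t.s, ext) && bind(pat.p, t.p, ext) && bind(pat.o, t.o, ext)) {
                        next.add(ext);
                    }
                }
            }
            solutions = next;
        }
        return solutions;
    }

    // Concrete term to search for, or null (wildcard) for an unbound variable.
    private static String ground(String term, Map<String, String> b) {
        return term.startsWith("?") ? b.get(term) : term;
    }

    // Record variable = value; fail if the variable already has another value.
    private static boolean bind(String term, String value, Map<String, String> b) {
        if (!term.startsWith("?")) return true;
        String old = b.putIfAbsent(term, value);
        return old == null || old.equals(value);
    }

    public static void main(String[] args) {
        FindMatcherSketch m = new FindMatcherSketch(Arrays.asList(
                new Triple("a", "p", "b"),
                new Triple("b", "q", "c")));
        System.out.println(m.match(Arrays.asList(
                new Triple("?x", "p", "?y"),
                new Triple("?y", "q", "?z"))));
    }
}
```

This is a nested-loop join, one find call per pattern per partial solution. It needs nothing from the storage beyond find, which is also why a storage-specific handler (such as a single SQL query with joins) can do much better.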

If a function property is encountered, it is treated internally as a call to an extension. A registry maps function properties to the code that implements them.

  # Find all the members of a list (RDF collection)
  PREFIX  list:   <http://www.jena.hpl.hp.com/ARQ/list#>
  SELECT ?member
  { ?x :p ?list .
    ?list list:member ?member .
  }
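The registry lookup that decides how list:member in the query above gets handled can be pictured roughly as follows. This is a sketch under assumptions, not ARQ's API: `PropertyFunctionRegistrySketch`, the `PropertyFunction` interface and the `route` method are hypothetical names. Only the property URI comes from the prefix used in the query.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a registry from property URIs to implementing code.
final class PropertyFunctionRegistrySketch {
    // Implementing code: given the subject, produce the values for the object.
    interface PropertyFunction {
        List<String> objectsFor(String subject);
    }

    // list:member, expanded with the prefix declared in the query.
    static final String LIST_MEMBER = "http://www.jena.hpl.hp.com/ARQ/list#member";

    private final Map<String, PropertyFunction> registry = new HashMap<>();

    void register(String propertyUri, PropertyFunction impl) {
        registry.put(propertyUri, impl);
    }

    // Returns null for an ordinary property, which then goes to normal matching.
    PropertyFunction lookup(String propertyUri) {
        return registry.get(propertyUri);
    }

    // Says which path a triple pattern with this predicate would take.
    String route(String propertyUri) {
        return lookup(propertyUri) != null ? "extension call" : "ordinary matching";
    }

    public static void main(String[] args) {
        PropertyFunctionRegistrySketch r = new PropertyFunctionRegistrySketch();
        // Placeholder implementation; a real one would walk the list structure.
        r.register(LIST_MEMBER, subject -> Arrays.asList(subject + "/member"));
        System.out.println(r.route(LIST_MEMBER));
        System.out.println(r.route("http://example.org/p"));
    }
}
```

Anything not in the registry falls through to ordinary graph matching, so registering a property function does not disturb the rest of the pattern.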

The functionality of list:member is handled by a class in the extension library so this query is treated much like the ARQ extension:

  # Find all the members of a list (RDF collection)
  PREFIX  ext: <java:com.hp.hpl.jena.query.extension.library.>
  SELECT ?member
  { ?x :p ?list .
    EXT ext:list(?list, ?member)
  }

where ext:list is a function that binds its arguments (unlike a FILTER function). The property function form is legal SPARQL.
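What such a binding function has to do for an RDF collection is walk the rdf:first/rdf:rest chain and yield one value per cell. The sketch below shows that walk over a tiny in-memory graph. It is not the ARQ extension library class: `ListMemberSketch` and its methods are hypothetical, and the graph representation assumes each list cell has at most one rdf:first and one rdf:rest.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: enumerate the members of an RDF collection.
final class ListMemberSketch {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String FIRST = RDF + "first";
    static final String REST = RDF + "rest";
    static final String NIL = RDF + "nil";

    // subject -> (predicate -> object); one value per predicate is enough here.
    private final Map<String, Map<String, String>> graph = new HashMap<>();

    void add(String s, String p, String o) {
        graph.computeIfAbsent(s, k -> new HashMap<>()).put(p, o);
    }

    // One result per list cell; each would become a binding of ?member.
    List<String> members(String listNode) {
        List<String> out = new ArrayList<>();
        Set<String> seen = new HashSet<>();   // guards against cyclic "lists"
        String cell = listNode;
        while (cell != null && !NIL.equals(cell) && seen.add(cell)) {
            Map<String, String> props = graph.get(cell);
            if (props == null) break;         // malformed list: stop quietly
            String first = props.get(FIRST);
            if (first != null) out.add(first);
            cell = props.get(REST);
        }
        return out;
    }

    public static void main(String[] args) {
        ListMemberSketch g = new ListMemberSketch();
        g.add("_:c1", FIRST, "a");
        g.add("_:c1", REST, "_:c2");
        g.add("_:c2", FIRST, "b");
        g.add("_:c2", REST, NIL);
        System.out.println(g.members("_:c1"));
    }
}
```

The blank nodes for the list cells never leave the function; the query only sees the list head and the members.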

So, this mechanism shows that collection access can be done in SPARQL without resorting to handling told blank nodes.

cwm (a forward-chaining rules engine) and Euler (a backward-chaining rules engine) already provide this style of access, although in their property the subject and object roles are the other way round.

ARQ provides list:member to be like rdfs:member.

— Andy

11 February 2006

Progress with Jena.Net

[GNU] wrote about building Jena for Mono, using IKVM to compile the jar files into .Net IL.

This approach means that the same source code is used for both the Java world and the .Net world, making future improvements visible to both from a single source tree.

I tried doing it for .Net on Windows with C# Express and IKVM-0.24.0.1.

Summary

SPARQL queries work.

Using Jena from C# works for small scale cases - lots of checking to do but it should be a matter of verifying everything from the dependent libraries works properly.

Some things aren't working, but there are a few hotspots of trouble that, when fixed, mean that the majority (maybe all) of the Jena test suite will run. As it is at the moment, quite a lot can be done, including using the ARQ command line programs.

The Conversion

The IKVM bytecode conversion route is my preferred choice because it means one source codebase, not two. When I tried this before, I got an early version of ARQ up and running. But it wasn't complete; the first big block was the lack of java.nio.charset support in GNU Classpath. Jena and ARQ have lots of tests of internationalization and charsets. That alone was enough to make it not worthwhile exploring further at the time.

Now (Feb 2006) GNU Classpath coverage is much better. See the coverage of GNU Classpath compared to Java 1.4.

The process is simple: run ikvmc on all the jars to get a library. Ignore all the warnings about missing stuff. It's surprising what various libraries actually reference - Log4j has references to a lot of log record transports. At the simplest:

ikvmc *.jar -out:XXX.dll -target:library

I've now split this into two DLLs: jena-libs.dll (all the jars except the Jena ones) and jena.dll (jena.jar, jenatest.jar, arq.jar, iri.jar), but that is just because I keep building the DLLs while testing.

It takes a minute or so (less time than building jena.jar itself). The result is two DLLs totaling about 16M; the whole assembly is about 23M including the three IKVM DLLs. Not small, but it works and it is simple to do.

What's been tried: in-memory graphs, reading and writing Turtle files (but XML typed literals are broken) and SPARQL queries.

Jena bugs: (this is relative to CVS and so after Jena 2.3)

  • file:///c:/absolute was incorrectly turned into a windows filename. Worked OK with Sun's Java but not IKVM. Fixed.

GNU Classpath bugs:

  • InputStreamReader(InputStream, Charset) is broken although the other two constructors that allow the charset conversion to be explicitly controlled do seem to work. This can be worked around in Jena. Bugzilla Entry.
  • Zero-width lookbehind regexes aren't implemented. They are used by JJC's new IRI code. Bugzilla Entry.

ARQ Test Suite

As a rough comparison, I ran the ARQ test suite:

Before any fixes, with Java 5 JVM:
Tests run: 1119, Failures: 0, Errors: 0

Using ikvm as the JVM:
Tests run: 1119, Failures: 32, Errors: 17

Converting to .Net:
Tests run: 1119, Failures: 32, Errors: 59

[20 Feb: JJC recoded around the lack of lookbehind and now it's down to 4 failures, of which 3 are because GNU Classpath is just different to Sun's runtime]

Next

Now it's a matter of working through the broken tests in the ARQ test suite, as time permits, to determine the causes.

IronPython to Jena?

Updates

  • Calling Jena from VB.Net works
  • The GNU Classpath/InputStreamReader bug has been fixed
  • The GNU Classpath/lookbehind bug had already been fixed but very recently so IKVM hasn't picked it up yet.

Now 4 failures, 3 of which are corner-case differences in URI resolution.