13 April 2006

From RDQL to SPARQL

SPARQL is now (April 2006) a W3C Candidate Recommendation, which means it is stable enough for widespread implementation. In fact, there are quite a few implementations already (see the SPARQL implementations page on the ESW wiki).

SPARQL is defined by three documents:

  • SPARQL Query Language for RDF
  • SPARQL Protocol for RDF
  • SPARQL Query Results XML Format

and there are tutorials like this one.

RDQL predates SPARQL; in fact, the design of RDQL predates the current RDF specifications, and some of its design decisions reflect that. The biggest is that RDF had no datatyping at the time, so RDQL handles tests on, say, integers without checking the datatype (if it looks like an integer, it can be tested as an integer).

SPARQL has all the features of RDQL and more:

  • ability to add optional information to query results
  • disjunction of graph patterns
  • more expression testing (date-time support, for example)
  • named graphs
  • sorting

but, above all, it is more tightly specified so queries in one implementation should behave the same in all other implementations.

ARQ - A Query Engine for Jena

In parallel with the development of the SPARQL specification, I have been developing a new query subsystem for Jena called ARQ. ARQ is now part of the Jena download.

ARQ builds on top of the existing Jena query support for matching basic graph patterns (BGPs are the building blocks of SPARQL). ARQ can execute SPARQL and RDQL, as well as an extended form of SPARQL. It has several extension points, such as property functions. The ARQ query engine works with any Jena Graph or Model.

Converting RDQL code to SPARQL code

The functionality of RDQL is a subset of that of SPARQL, so it's not hard to convert RDQL queries to SPARQL. Two things need to be done: convert the triple syntax, and convert any constraints.

Syntax

SPARQL uses a Turtle-like syntax, which will be familiar to anyone who knows N3. Namespaces go at the start of the query, not at the end as with RDQL's USING clause. There are no () around triple patterns; instead there is a "." (a single dot) between triple patterns. An RDQL query only ever has one graph pattern; in SPARQL, blocks of triple patterns are delimited by {}.

You can even use the command-line tool arq.qparse to read in an RDQL query and write out a SPARQL query, but the result is a rough approximation that you will need to check, and it may not be completely legal SPARQL.
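
As a rough sketch of the syntax changes described above, here is a small RDQL query (shown in comments) next to a hand-written SPARQL equivalent. The namespace http://example/ns# and the properties ns:name and ns:age are made up for illustration; they are not from any real vocabulary.

```sparql
# RDQL original (hypothetical data; ns:name and ns:age are illustrative):
#
#   SELECT ?x, ?name
#   WHERE  (?x, ns:name, ?name),
#          (?x, ns:age,  ?age)
#   AND    ?age >= 21
#   USING  ns FOR <http://example/ns#>

# SPARQL equivalent: prefixes come first, triple patterns sit inside {}
# separated by ".", and the constraint becomes a FILTER.
PREFIX ns: <http://example/ns#>

SELECT ?x ?name
WHERE {
  ?x ns:name ?name .
  ?x ns:age  ?age .
  FILTER (?age >= 21)
}
```

The triple patterns translate mechanically. The FILTER is where care is needed, because in SPARQL the comparison only succeeds when ?age is bound to a numeric typed literal.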

Constraints

The constraints need the most care because SPARQL uses RDF datatyping and RDQL doesn't. Some common areas are:

  • regular expressions
  • string equality and numeric equality

Regular expressions

A SPARQL regular expression looks like:

regex(expr, "pattern")
regex(expr, "pattern", "i")

The catch is that expr must be a literal; it can't be a URI. (Well, it can, but it will never match!) If you want to perform a regular expression match on a URI, use the str() built-in to get the string form of the URI.

regex(str(?uri), "^http://example/ns#")
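
Placed in a complete query, the test might look like the sketch below. It assumes you want every triple whose property is in the (made-up) namespace http://example/ns#; the variable names are arbitrary.

```sparql
SELECT ?s ?p ?o
WHERE {
  ?s ?p ?o .
  # ?p is always bound to a URI here, so it has to go through str()
  # before regex() can test it. The "^" anchors the match at the start.
  FILTER regex(str(?p), "^http://example/ns#")
}
```

Without the "^" anchor the pattern could match anywhere inside the URI string, which is usually not what is wanted for a namespace test.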

Equality

RDQL has = for numeric equality and eq for string equality. A number in RDQL is anything that can be parsed as a number, whether it has a datatype or not (or even the wrong datatype). Likewise, anything can be treated as a string (like URIs in regular expressions).

SPARQL has = which is taken from XQuery/XPath Functions and Operators. It decides whether that is numeric equals, string equals or URI-equals based on the kind of arguments it is given.
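
Here is a sketch of how the two RDQL tests might be carried over, assuming hypothetical properties ns:age and ns:name. The str() and xsd:integer() forms are one possible way to approximate RDQL's looser behaviour when the data has missing or inconsistent datatypes; whether you need them depends on your data.

```sparql
PREFIX ns:  <http://example/ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?x
WHERE {
  ?x ns:age  ?age .
  ?x ns:name ?name .

  # Numeric equals (RDQL "="): true only if ?age is a numeric typed
  # literal such as "21"^^xsd:integer.
  FILTER (?age = 21)

  # If ages were stored without a datatype, a cast is one way to get
  # closer to the RDQL behaviour (assumption about the data):
  #   FILTER (xsd:integer(?age) = 21)

  # String equals (roughly RDQL "eq"): str() yields the lexical form,
  # so the test does not depend on the datatype of ?name.
  FILTER (str(?name) = "Andy")
}
```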

API Changes

The ARQ API is in the package com.hp.hpl.jena.query. The RDQL API is deprecated, starting with Jena 2.4. The new API is similar in style to the old one for SELECT, with iteration over the rows of the results (javadoc). Differences include the widespread use of factories, naming consistent with the SPARQL specifications, and different exec operations for the different kinds of SPARQL query. QueryExecution objects should be properly closed.

One change is that to get the triples that matched a query, instead of asking the binding for the triples that were used in the matching, the application should now make a CONSTRUCT query.
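
A minimal sketch of such a query, again using the made-up property ns:name: the CONSTRUCT template simply repeats the WHERE pattern, so the resulting graph holds the triples that matched.

```sparql
PREFIX ns: <http://example/ns#>

# Template and pattern are identical, so each match contributes
# the triple it was matched against to the result graph.
CONSTRUCT { ?x ns:name ?name }
WHERE     { ?x ns:name ?name }
```

In the ARQ API, a query like this is run with the CONSTRUCT-specific exec operation (QueryExecution.execConstruct(), which returns a Model) rather than the one used for SELECT.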

Experimenting with SPARQL

There is a set of command-line utilities for trying out SPARQL queries.

A nice graphical interface is Twinkle by Leigh Dodds.

There is also an implementation of the SPARQL protocol using ARQ, project Joseki, and a demo site at http://www.sparql.org where you can validate SPARQL queries and try them out.

Questions?

Send questions and comments about ARQ to jena-dev.

Send general questions and comments about SPARQL to the W3C list sparql-dev (archive).

If you have experiences converting from RDQL to SPARQL, then let me know and I'll compile a list of common issues.

6 comments:

/\/ikolaj said...

I could really use the comment on having to convert a URI to a string before being able to use it in a regexp FILTER.

I wonder, though, if it is not possible to make a URI match without using FILTER, e.g.:

SELECT ?d WHERE { ?d foo:uri <some:uri> }

Rather, I wonder how to make the above work, because I don't see why it should not work. The following does work (for some example RDF I have in a jena model querying using ARQ), using your str() tip in a filter:

SELECT ?d WHERE { ?d foo:uri ?uri
FILTER regex(str(?uri),"some:uri") }

AndyS said...

Graph pattern matching is exact match so placing the URI in a triple pattern will work when you are matching the whole URI.

It is like "^http://example/ns#xyz$", using ^ and $ to anchor the string.

The example is matching
"^http://example/ns#"
which is the namespace of the prefix of a URI. It matches anything beginning with that string like "http://example/ns#xyz" and "http://example/ns#abcd1234"

/\/ikolaj said...

some:uri is the whole URI in my example. There is no match when I use it in

SELECT ?d WHERE { ?d foo:uri some:uri }

or

SELECT ?d WHERE { ?d foo:uri <some:uri> }

or

SELECT ?d WHERE { ?d foo:uri 'some:uri' }

I have tried all combinations I can think of.

I only get a match when I do

SELECT ?d WHERE { ?d foo:uri ?uri
FILTER regex(str(?uri),"^some:uri$") }

'some:uri' is the actual string I have as URI for my rdf resource.

Btw, is there a mailing list where this should move to?

/\/

/\/ikolaj said...
This comment has been removed by a blog administrator.
/\/ikolaj said...

The RDF I have looks like (hand-edited so might contain parse errors):

<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:j.0="file://foo.owl#" >
<rdf:Description rdf:about="some:uri">
<rdf:type rdf:resource="file:///foo.owl#Document"/>
<j.0:firstNames rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Nikolaj</j.0:firstNames>
</rdf:Description>
<rdf:Description rdf:about="xx:meta1">
<j.0:document rdf:resource="some:uri"/>
<rdf:type rdf:resource="file://foo.owl#MetaDoc"/>
</rdf:Description>
</rdf:RDF>

AndyS said...

This is better done by email on the jena-dev list. Direct email (and we can summarise here later) if you must. The HTML is messing up data because it looks like HTML tags.

Your data contains strange URIs:
<some:uri> but the query:

PREFIX : <file://foo.owl#>

SELECT *
{ ?x :document <some:uri> }

works.

Unrelated:
If you use RDF/XML-ABBREV as the writer, you get nicer-looking RDF/XML.