SPARQL is now (April 2006) a W3C Candidate Recommendation, which means it is stable enough for widespread implementation. In fact, there are quite a few implementations already (see the SPARQL implementations page on the ESW wiki).
SPARQL is defined by three documents (the query language, the protocol, and the XML results format), and there are tutorials like this one.
RDQL predates SPARQL. In fact, the design of RDQL predates the current RDF specifications, and some of the design decisions in RDQL reflect that. The biggest of these is that RDF didn't have any datatyping at the time, so RDQL handles tests on, say, integers without checking the datatype (if it looks like an integer, it can be tested as an integer).
SPARQL has all the features of RDQL and more:
- ability to add optional information to query results
- disjunction of graph patterns
- more expression testing (date-time support, for example)
- named graphs
- sorting
but, above all, it is more tightly specified so queries in one implementation should behave the same in all other implementations.
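As a rough illustration of several of these features, here is a sketch of one query that combines a disjunction, optional information, a date-time test and sorting. The choice of the Dublin Core, RDFS and FOAF vocabularies and the cut-off date are assumptions made for the example; named graphs are left out to keep it short.

```sparql
PREFIX dc:   <http://purl.org/dc/elements/1.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

SELECT ?doc ?title ?date ?name
WHERE {
  ?doc dc:date ?date .
  # Disjunction: the title may be given with either property
  { ?doc dc:title ?title } UNION { ?doc rdfs:label ?title }
  # Optional: add the creator's name when the data has one
  OPTIONAL { ?doc dc:creator ?who . ?who foaf:name ?name }
  # Date-time test: only works if ?date is typed as xsd:dateTime
  FILTER ( ?date >= "2006-01-01T00:00:00Z"^^xsd:dateTime )
}
# Sorting
ORDER BY ?date
```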
ARQ - A Query Engine for Jena
In parallel with developing the SPARQL specification, I have been developing a new query subsystem for Jena called ARQ. ARQ is now part of the Jena download.
ARQ builds on top of the existing Jena query support for matching of basic graph patterns (BGPs are the building block in SPARQL). ARQ can execute SPARQL and RDQL as well as an extended form of SPARQL. It has several extension points, such as Property functions. The ARQ query engine works with any Jena Graph or Model.
Converting RDQL code to SPARQL code
The functionality of RDQL is a subset of SPARQL so it's not hard to convert RDQL queries to SPARQL. What needs to be done is convert the triple syntax and convert any constraints.
Syntax
SPARQL uses a Turtle-like syntax, which is familiar to anyone who knows N3. Namespaces go at the start of the query, not at the end as with USING. There are no () around triple patterns; instead there is a "." (a single dot) between triple patterns. An RDQL query only ever has one graph pattern; in SPARQL, blocks of triple patterns are delimited by {}.
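To make these syntax differences concrete, here is a sketch of one small query in both forms. The RDQL version is shown in comments for comparison, and the ns: vocabulary (ns:name, ns:age) is made up for the example.

```sparql
# RDQL original, for comparison:
#
#   SELECT ?x ?name
#   WHERE (?x, ns:name, ?name),
#         (?x, ns:age, ?age)
#   USING ns FOR <http://example/ns#>

# SPARQL equivalent: the prefix moves to the top, the () go away,
# a dot separates the triple patterns, and {} delimits the block.
PREFIX ns: <http://example/ns#>

SELECT ?x ?name
WHERE {
  ?x ns:name ?name .
  ?x ns:age  ?age
}
```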
You can even use the command line arq.qparse to read in an RDQL query and write out the SPARQL query, but the result is a rough approximation that you'll need to check, and it may not be completely legal SPARQL.
Constraints
The constraints need the most care because SPARQL uses RDF datatyping and RDQL doesn't. Some common areas are:
- regular expressions
- string equality and numeric equality
Regular expressions
A SPARQL regular expression looks like:
regex(expr, "pattern")
regex(expr, "pattern", "i")
The catch is that the expr must be a literal; it can't be a URI. (Well, it can, but it will never match!) If you want to perform a regular expression match on a URI, use the str() built-in to get the string form of the URI:
regex(str(?uri), "^http://example/ns#")
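Placed in a complete query, the two forms of regex might look like the sketch below. The use of rdfs:label and the word being searched for are assumptions for the example, and the second filter assumes ?label is a plain literal.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?s ?label
WHERE {
  ?s rdfs:label ?label .
  # ?s is a URI, so convert it with str() before testing it
  FILTER regex(str(?s), "^http://example/ns#")
  # ?label is assumed to be a plain literal; "i" makes the match case-insensitive
  FILTER regex(?label, "sparql", "i")
}
```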
Equality
RDQL has = for numeric equality and eq for string equality. A number in RDQL was anything that could be parsed as a number, whether it had a datatype or not (or even the wrong datatype). Likewise, anything could be treated as a string (like URIs in regular expressions).
SPARQL has =, which is taken from XQuery/XPath Functions and Operators. It decides whether that is numeric equals, string equals or URI-equals based on the kind of arguments it is given.
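A sketch of the three kinds of equality in one query follows. The ns: properties and the values being compared are invented for the example.

```sparql
PREFIX ns: <http://example/ns#>

SELECT ?x
WHERE {
  ?x ns:age      ?age .
  ?x ns:name     ?name .
  ?x ns:homepage ?page .
  # Numeric equals: ?age needs a numeric datatype, e.g. "42"^^xsd:integer
  FILTER ( ?age = 42 )
  # String equals (RDQL would have used eq here)
  FILTER ( ?name = "Smith" )
  # URI equals
  FILTER ( ?page = <http://example/home> )
}
```

If the data holds the age as an untyped "42", which RDQL would have accepted as a number, the numeric test above will not match. One possible workaround, when fixing the data is not an option, is to compare the lexical form instead: FILTER ( str(?age) = "42" ).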
API Changes
The ARQ API is in the package com.hp.hpl.jena.query. The RDQL API is deprecated, starting with Jena 2.4. The new API is similar in style to the old one for SELECT, with iteration over the rows of the results (javadoc).
Differences include the widespread use of factories, naming consistent with the SPARQL specifications, and different exec operations for the different kinds of SPARQL query. QueryExecution objects should be properly closed.
One change is that to get the triples that matched a query, instead of asking the binding for the triples that were used in the matching, the application should now make a CONSTRUCT query.
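As a minimal sketch, a CONSTRUCT query that hands back the matched triples can simply repeat the pattern as its template. The ns:name property is an assumption for the example.

```sparql
PREFIX ns: <http://example/ns#>

# A SELECT with this pattern would return rows of ?x and ?name;
# the CONSTRUCT form returns an RDF graph of the triples that matched.
CONSTRUCT { ?x ns:name ?name }
WHERE     { ?x ns:name ?name }
```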
Experimenting with SPARQL
There is a set of command line utilities for trying out SPARQL queries.
A nice graphical interface is twinkle by Leigh Dodds.
There is also an implementation of the SPARQL protocol using ARQ, project Joseki, and a demo site at http://www.sparql.org where you can validate SPARQL queries and try them out.
Questions?
Send questions and comments about ARQ to jena-dev.
Send general questions and comments about SPARQL to the W3C list sparql-dev (archive).
If you have experiences converting from RDQL to SPARQL, then let me know and I'll compile a list of common issues.
6 comments:
I could really use the comment on having to convert a URI to a string before being able to use it in a regexp FILTER.
I wonder, though, if it is not possible to make a URI match without using FILTER, e.g.:
SELECT ?d WHERE { ?d foo:uri <some:uri> }
Rather, I wonder how to make the above work, because I don't see why it should not work. The following does work (for some example RDF I have in a jena model querying using ARQ), using your str() tip in a filter:
SELECT ?d WHERE { ?d foo:uri ?uri
FILTER regex(str(?uri),"some:uri") }
Graph pattern matching is exact match so placing the URI in a triple pattern will work when you are matching the whole URI.
It is like "^http://example/ns#xyz$", using ^ and $ to anchor the string.
The example is matching
"^http://example/ns#"
which is the namespace of the prefix of a URI. It matches anything beginning with that string like "http://example/ns#xyz" and "http://example/ns#abcd1234"
some:uri is the whole URI in my example. There is no match when I use it in
SELECT ?d WHERE { ?d foo:uri some:uri }
or
SELECT ?d WHERE { ?d foo:uri <some:uri> }
or
SELECT ?d WHERE { ?d foo:uri 'some:uri' }
I have tried all combinations I can think of.
I only get a match when I do
SELECT ?d WHERE { ?d foo:uri ?uri
FILTER regex(str(?uri),"^some:uri$") }
'some:uri' is the actual string I have as URI for my rdf resource.
Btw, is there a mailing list where this should move to?
/\/
The RDF I have looks like (hand-edited so might contain parse errors):
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:j.0="file://foo.owl#" >
<rdf:Description rdf:about="some:uri">
<rdf:type rdf:resource="file:///foo.owl#Document"/>
<j.0:firstNames rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Nikolaj</j.0:firstNames>
</rdf:Description>
<rdf:Description rdf:about="xx:meta1">
<j.0:document rdf:resource="some:uri"/>
<rdf:type rdf:resource="file://foo.owl#MetaDoc"/>
</rdf:Description>
</rdf:RDF>
This is better done by email on the jena-dev list. Direct email (and we can summarise here later) if you must. The HTML is messing up data because it looks like HTML tags.
Your data contains strange URIs:
<some:uri> but the query:
PREFIX : <file://foo.owl#>
SELECT *
{ ?x :document <some:uri> }
works.
Unrelated:
If you use RDF/XML_ABBREV as the writer, you get nicer looking RDF/XML.