SPARQL, like Gaul, is dividied into three parts:
- Basic graph pattern matching
- Graph expressions
- Solution modifiers
Basic graph patterns (BGPs) are a block of adjacent triple patterns that match, or don't match, all together. In BGP matching every named variable must have some value for the BGP to have been matched. Filters add restrictions on the values variables can take.
They are also an extension point in SPARQL because you can plug in a different BGP matcher (e.g. a DL reasoner or a wrapper for legacy SQL data), then reuse the other two layers to form complex queries out of the exact matches from BGP matching.
Graph expressions (OPTIONAL
, UNION
,
GRAPH
, Groups (things between {}) combine BGPs in various
ways to give more complicated patterns. Graph expressions are recursive
so you can have patterns within patterns. A SPARQL query has one
graph expression in the WHERE clause.
Solution modifiers (Project, DISTINCT
,
ORDER BY
, LIMIT/OFFSET
) take the output of
matching the query graph pattern and process it in various ways to
yield the result set.
Current Situation
The semantics for SPARQL have, so far, been declarative and top-down. A solution to a query passes if it meets all the conditions following from the definitions of each of the SPARQL graph pattern forms and filters. That means, in theory, the whole of the solution is available.
In fact, ARQ builds solutions up by adding the minimum necessary at each subpattern to make the solution match. This is what gives ARQ streaming evaluation which keeps the total memory use down. But the effect is the same, whole solutions are available everywhere in the pattern.
Another way is to have a bottom-up algebra. The relational algebra works this way. Subexpressions are evaluated to give tables (relations) and then these tables combined together to form new tables.
It turns out that these are nearly the same as shown by
Richard Cyganiak and
Jorge Perez et al
(see references below).
The class of graph patterns that differ are a certain kind of
nested
OPTIONAL
where a variable is used on the outside,
fixed part, and in the inner OPTIONAL
, but not in
between.
SELECT * { :x1 :p ?v . OPTIONAL { :x3 :q ?w . OPTIONAL { :x2 :p ?v } } }
Now that is a rather unusual query.
The tricky bit is the use of ?v
at the outermost and
innermost levels but not in between.The query can be rewritten so
as not to nest (note the repeat of :x3 :q ?w
).
SELECT * { :x1 :p ?v . OPTIONAL { :x3 :q ?w } OPTIONAL { :x3 :q ?w . :x2 :p ?v } }
A few references for work in this area:
- A relational algebra for SPARQL
- Semantics and Complexity of SPARQL
- Follow up to the paper above: Semantics of SPARQL
- The paper "Semantics Preserving SPARQL-to-SQL Query Translation for Optional Graph Patterns" includes doing the difficult case of nested optionals.
A SPARQL Algebra
In DAWG, we're considering an algebra for SPARQL, based on the paper Semantics of SPARQL. This will mean a change of results in the case above. I've never seen such a query in the wild, only in artificial test cases.
The change from declarative semantics to a constructional algebra is motivated primarily by the implementers in the working group wanting to apply all the good stuff on relational algebra optimization to SPARQL engines. The complexity of the exact mapping to SQL in the semantics preserving SPARQL-to-SQL is daunting.
Not that simple!
But other things differ as well if a purely relational style is applied (and that's not being proposed currently) because "Semantics of SPARQL" does not consider the difference in treatment of filters (it's focus is on the combination of graph patterns).
Two cases arise: one where filters are in OPTIONAL
s, and one
where filters are nested inside groups.
SELECT * { ?x :p ?v . OPTIONAL { ?y :q ?w . FILTER(?v = 2) } }
The FILTER
uses a variable from the fixed part of
the OPTIONAL
. Under Jorge's semantics the variable
isn't in-scope, so the filter evaluation is an error (unbound variable),
otherwise known as false; the optional part never matches.
This form of query is realistic and does occur for real
so the current proposal for DAWG makes this work. Like SQL
and LEFT OUTER JOIN
with the ON
clause,
the LeftJoin operation in the SPARQL algebra can take a condition
which applies over the whole LeftJoin.
The fixed pattern could be repeated inside the optional part but such repetition for something application might well use is bad, very bad.
Application writers will have to take care about scope and
groups: putting FILTER
s inside {}
changes
their scope.
Currently,
{ :x :p ?v . FILTER(?v < 5) }
and
{ :x :p ?v . { FILTER(?v < 5) } }
have the same effect. But they aren't in the algebraic form
because { FILTER(?v < 5) }
evaluates on the empty
table that does not have a ?v
in it; evaluation is an
error and hence false.
ARQ
There is an implementation of the algebra in
ARQ in CVS
(not in the current release ARQ 1.4). It's enabled in the command
line tool
arq.sparql
with --engine=ref
from the command line and
QueryEngine2.register() ;
in code.
The work-in-progress unfinished editor's working text is available. Check it is actually up to date because that's just work space and isn't a DAWG document. It does not reflect working group decisions - it more a proposal for consideration.
Disclaimer: Any views expressed here are mime and not the working group (which at the time of writing is thinking about it but hasn't decided anything yet).
1 comment:
Interesting that SPARQL semantics
considers the path of an algebra as
for TMQL (Topic Maps query language)
we are doing the same
The situation is for us much easier,
though, as TMQL is not entailment
neutral, has closed-world assumption,...
Post a Comment