25 March 2008

Two more ARQ extensions

I've implemented two new extensions for ARQ:

  • Assignment
  • Sub-queries

Both expose facilities that are already in the query algebra.  Sub-queries are done by simply allowing query algebra operators to appear anywhere in the query, rather than requiring solution modifiers to be only at the outer level of the query. This allows extensions like counting to be used inside the query, with the results available to the rest of the pattern matching. An assignment operator already existed as an algebra extension for optimization and to support ARQ SELECT expressions.

Both are syntactic extensions and are available if the query is parsed with language Syntax.syntaxARQ.

Currently available in ARQ SVN.

Assignment

This assigns a computed value to a variable in the middle of a pattern.

LET (?x := ?y + 5 )

The assignment operator is ":=". A single "=" is already the test for equals in SPARQL.

This means that a computed value can be used in other pattern matching:

 SELECT ?y ?area
 {
    ?x rdf:type :Rectangle ;
       :height ?h ;
       :width ?w .
    LET (?area := ?h*?w )
    GRAPH <otherShapes>
    {
      ?y :area ?area . # Shapes with the same area
    }
 }

Application writers can provide their own functions, maybe to do a little data munging to map between different formats:

   ?x  foaf:name  ?name .          # "John Smith"
   # Convert to a different style: "Smith, John" for example.
   LET (?vcardName := my:convertName(?name) )
   ?y vCard:FN ?vcardName .

There are some rules for the assignment:

  • if the expression does not evaluate (e.g. unbound variable in the expression), no assignment occurs and the query continues.
  • if the variable is unbound, and the expression evaluates, the variable is bound to the value.
  • if the variable is already bound to the same value that the expression evaluates to, nothing happens and the query continues.
  • if the variable is already bound to a different value from the one the expression evaluates to, an error occurs and the current solution is excluded from the results.
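
The four rules can be sketched outside of ARQ. The following is only an illustration in Python, not ARQ's implementation: a solution is modelled as a dict from variable name to value, the expression as a function of the solution, and "does not evaluate" as a raised exception. The function name let and the use of Python's == for "same value" are assumptions made for the sketch.

```python
def let(solution, var, expr):
    """Apply LET (?var := expr) to one solution.

    Returns the (possibly extended) solution, or None if the
    solution is excluded from the results.
    """
    try:
        value = expr(solution)
    except (KeyError, TypeError, ArithmeticError):
        # Rule 1: the expression does not evaluate (e.g. unbound
        # variable), so no assignment occurs and the query continues.
        return solution
    if var not in solution:
        # Rule 2: variable unbound, so bind it to the value.
        return {**solution, var: value}
    if solution[var] == value:
        # Rule 3: already bound to the same value, nothing happens.
        return solution
    # Rule 4: bound to a different value, the solution is dropped.
    return None


# LET (?x := ?y + 5)
def y_plus_5(s):
    return s["y"] + 5

bound = let({"y": 2}, "x", y_plus_5)            # ?x becomes 7
skipped = let({}, "x", y_plus_5)                # ?y unbound, unchanged
same = let({"y": 2, "x": 7}, "x", y_plus_5)     # unchanged
clash = let({"y": 2, "x": 9}, "x", y_plus_5)    # excluded (None)
```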

ARQ already has expressions in SELECT, so a combination of sub-query and SELECT expression can achieve the same effect. But that is unnatural and verbose, and it sometimes requires parts of the pattern matching to be written twice, inside and outside the sub-query.

One place where LET might be useful is in a CONSTRUCT query. In strict SPARQL, only terms found in the original data can be used for variables in the construct template, but with LET-assignment computed values can be used as well:

   CONSTRUCT { ?x :lengthInInches ?inch }
   WHERE
   { ?x :lengthInCM ?cm
     LET (?inch := ?cm/2.54 )
   }
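
To make the point concrete, here is a rough Python sketch of what that CONSTRUCT produces, assuming the data is a plain list of (subject, predicate, object) tuples. The :desk resource and the :colour property are made up for the illustration; the emitted value does not appear anywhere in the input.

```python
CM_PER_INCH = 2.54

def construct_lengths(triples):
    """Emit one :lengthInInches triple per :lengthInCM triple."""
    out = []
    for s, p, cm in triples:
        if p == ":lengthInCM":
            # LET (?inch := ?cm/2.54), then fill in the template.
            out.append((s, ":lengthInInches", cm / CM_PER_INCH))
    return out

# Hypothetical input data.
data = [
    (":desk", ":lengthInCM", 127.0),
    (":desk", ":colour", "brown"),
]
inches = construct_lengths(data)   # one triple, roughly 50 inches
```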

This isn't a new idea - see for example: "A SPARQL Semantics based on Datalog" - although the syntax in ARQ is designed to group the terms better.

Sub-queries

A sub-query can be used to apply some solution modifier to a sub-pattern.  Useful examples include aggregation, especially grouping and counting, and LIMIT with ORDER BY to get only some of the results of a pattern match.

 { SELECT (COUNT(*) AS ?c) { ?s ?p ?o } }

A sub-query is enclosed by {} and must be the only thing inside those braces, the same style as Virtuoso Subqueries. The sub-query will be combined, with SPARQL join, with other patterns in the same group.

For example, to find how many people each person with two or more phones foaf:knows:

 PREFIX foaf: <http://xmlns.com/foaf/0.1/>

 SELECT ?person ?knowsCount
 {
   # ?person who have 2 or more phones
   { SELECT ?person
     WHERE { ?person foaf:phone ?phone } 
     GROUP BY ?person 
     HAVING (COUNT(?phone) >= 2) 
   }
   # Join on ?person with how many people they foaf:knows
   { SELECT ?person (COUNT(?x) AS ?knowsCount)
     WHERE { ?person foaf:knows ?x .}
     GROUP BY ?person
   }
 }
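
The way the two sub-queries combine can be mimicked in a few lines of Python. This is a sketch of the join semantics only, not of how ARQ evaluates the query: solutions are dicts, two solutions join when they agree on every variable they share, and the sample triples and helper names are invented for the illustration.

```python
from collections import Counter

# Hypothetical data as (subject, predicate, object) tuples.
TRIPLES = [
    ("alice", "foaf:phone", "tel:1"),
    ("alice", "foaf:phone", "tel:2"),
    ("alice", "foaf:knows", "bob"),
    ("alice", "foaf:knows", "carol"),
    ("bob", "foaf:phone", "tel:3"),
    ("bob", "foaf:knows", "alice"),
    ("carol", "foaf:phone", "tel:4"),
    ("carol", "foaf:phone", "tel:5"),
]

def group_count(triples, predicate):
    """GROUP BY subject and COUNT the triples with this predicate."""
    return Counter(s for s, p, _ in triples if p == predicate)

def compatible(a, b):
    """True if the two solutions agree on all shared variables."""
    return all(b[k] == v for k, v in a.items() if k in b)

def join(left, right):
    """Merge every compatible pair of solutions."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]

# First sub-query: persons with 2 or more phones.
many_phones = [
    {"person": s}
    for s, n in group_count(TRIPLES, "foaf:phone").items()
    if n >= 2
]
# Second sub-query: how many people each person knows.
knows = [
    {"person": s, "knowsCount": n}
    for s, n in group_count(TRIPLES, "foaf:knows").items()
]
result = join(many_phones, knows)
```

With this data only alice appears in the result: carol has two phones but no foaf:knows triples, so the second sub-query has no row for her and the join drops her.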

Queries with sub-queries can become complicated quite quickly, so I usually write each of the parts separately and then combine them.