08 September 2007

Counting and GROUP BY

One thing people miss from SPARQL is counting. It's a feature that the working group didn't have time for.

There's an implementation, following the design in SQL, in ARQ SVN which will be in the next release (v2.1). v2.1 also introduces the cost-based optimizer for in-memory basic graph patterns by Markus.

It's a syntactic extension, not strict SPARQL, so you have to tell the system to parse queries in the "ARQ" language by passing Syntax.syntaxARQ to the query factory.

The following queries will work:

SELECT count(*) { ... }
SELECT (count(*) AS ?count) { ... }

This is based on having SELECT expressions as well as grouping. Using AS to give a named variable is better style because the results can go into the SPARQL XML results format; otherwise, an internal variable is allocated, and internal variables have names that are illegal in SPARQL.

Other examples:

SELECT (count(*) AS ?rows)
{ ... }
GROUP BY ?x

SELECT count(distinct *)
{ ... }
GROUP BY ?x

SELECT count(?y)
{ ... }
GROUP BY ?x

What is being counted is solutions in the case of count(*), and names in the case of count(?var).
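
To make the difference concrete, here is a small stand-alone sketch in plain Java. It does not use ARQ at all; the class and method names (CountSketch, solution, countStar, countVar) are made up for illustration. It also assumes that count(?y) counts the solutions in a group where ?y is bound, which is how SQL's COUNT(column) behaves; treat that as an assumption rather than a statement of what ARQ does.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-alone illustration; not ARQ code. All names here are hypothetical.
class CountSketch {

    // One solution maps variable names to values; an unbound variable is simply absent.
    static Map<String, String> solution(String... pairs) {
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i + 1 < pairs.length; i += 2) {
            row.put(pairs[i], pairs[i + 1]);
        }
        return row;
    }

    // GROUP BY groupVar with count(*): every solution in the group is counted.
    static Map<String, Integer> countStar(List<Map<String, String>> solutions, String groupVar) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map<String, String> row : solutions) {
            counts.merge(row.get(groupVar), 1, Integer::sum);
        }
        return counts;
    }

    // GROUP BY groupVar with count(?countedVar): assumed to count only the
    // solutions in which countedVar is bound (SQL-like behaviour).
    static Map<String, Integer> countVar(List<Map<String, String>> solutions,
                                         String groupVar, String countedVar) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map<String, String> row : solutions) {
            int increment = row.containsKey(countedVar) ? 1 : 0;
            counts.merge(row.get(groupVar), increment, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map<String, String>> solutions = Arrays.asList(
            solution("x", "a", "y", "1"),
            solution("x", "a"),            // ?y is unbound in this solution
            solution("x", "b", "y", "2"),
            solution("x", "b", "y", "2"));

        System.out.println(countStar(solutions, "x"));     // count(*)  GROUP BY ?x
        System.out.println(countVar(solutions, "x", "y")); // count(?y) GROUP BY ?x
    }
}
```

With this data, the group for "a" holds two solutions but only one of them binds ?y, so under the stated assumption the two counts differ for that group and agree for "b".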

The current list of ARQ extensions is:

  • SERVICE - call-out from a query to a SPARQL endpoint over HTTP
  • SELECT expressions
  • GROUP BY
  • count()

So what features should be next?