<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-18002060</id><updated>2011-12-02T23:07:04.215Z</updated><title type='text'>ARQtick</title><subtitle type='html'>A blog related to my work activities - SPARQL, RDF, ARQ, TDB, Jena, Fuseki.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>41</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-18002060.post-1214722059942542031</id><published>2011-05-19T15:16:00.002+01:00</published><updated>2011-05-19T15:16:30.316+01:00</updated><title type='text'>Importing SourceForge code into Apache for Jena</title><content type='html'>&lt;p&gt;
The Jena project now has the legal paperwork done for the vast majority
of the codebase.  It's now time to move the code from SourceForge, where
it's been for almost 10 years (the project was registered November 2001).
&lt;/p&gt;
&lt;p&gt;
During that time, the SourceForge infrastructure has been excellent. We're
not moving because of dissatisfaction but because we want to put the
post-HP project on a solid legal basis where the license and IP situation is
well-understood and completely clear.  We now have committers in 3 different
organisations, and contributions from yet more - it's slowly getting more
and more complicated.
&lt;/p&gt;
&lt;p&gt;
The way Apache works is that software is granted to Apache, which gives
Apache the right to re-license it.  Any software you use from Apache comes
under a license (with IP guarantees etc.) from Apache to you - not between you and
the original contributor - so when you use the software commercially you
only need to check one Apache license.
&lt;/p&gt;
&lt;p&gt;
Until now, we have had a setup where any contributions are simply
incorporated with the license and conditions of the contributor.  It so
happens that all the licensed code in the codebase is the same BSD-type license
but in using Jena you don't get a single license, you get one from every
contributor.  For some people who are going to depend on Jena for
commercial use or long term big deployment, this matters.  We've had a user
crawl the codebase to check each of the licenses (as they should but it's
just work).  With Apache it's different - one license, well-understood legal
situation.
&lt;/p&gt;
&lt;p&gt;
Contributors grant software two ways - either a software grant document
or when they upload code to a mailing list or to Jira.  When you add
something to Jira there's a tick box to say you are making the grant to
Apache, otherwise while it may illustrate some issue, we can't use it in
the codebase.
&lt;/p&gt;
&lt;p&gt;
Apache use &lt;a href="http://subversion.apache.org/"&gt;subversion&lt;/a&gt; so Jena
needs to import the code base to svn.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;Subversion or git or Mercurial ...&lt;/b&gt;
&lt;/p&gt;
&lt;p&gt;
Aside: one question I've been asked is why not a DVCS like git or Mercurial. Apache use Subversion.
As I understand it, there are legal matters to consider.  Suppose A pushes code to
B and B pushes to Apache.  A has not necessarily granted the software to
Apache - B could check but it's a new burden for B, and pushing to Apache
is B's responsibility but B does not own A's contribution.  Maybe this will
change sometime but at the moment, DVCS works for direct contributor
to user licensing (and the user "should" then check every license) but not
the consolidation offered by Apache.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;Process&lt;/b&gt;
&lt;/p&gt;
&lt;p&gt;
Jena has three repositories: Jena in CVS, Jena in SVN and Joseki in CVS.
There are active projects in all of them but there is also a lot of history
and legacy.  We want to import everything as a record of ownerships, not
just copy the latest working copy.
&lt;/p&gt;
&lt;p&gt;
This is the process I have put together:
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;1. Grab the repositories&lt;/b&gt;  
&lt;/p&gt;
&lt;p&gt;
SourceForge offer &lt;a href="https://sourceforge.net/apps/trac/sourceforge/wiki/Using%20rsync%20for%20backups"&gt;rsync access for backup&lt;/a&gt;,
with history (the tarballs are just the current state).
&lt;/p&gt;
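&lt;p&gt;
As a rough sketch of this step, the backups could be pulled with something like the script below.  The rsync host and module naming, the SourceForge project names "jena" and "joseki", and the local directories other than ../Jena-CVS are assumptions for illustration rather than a record of the actual run, so check the SourceForge documentation linked above before relying on them.  With DRY_RUN set, the script only prints the commands it would run.
&lt;/p&gt;
&lt;pre&gt;
```shell
#!/bin/sh
# Sketch only: the host and module naming below is an assumption about
# SourceForge's rsync backup service; check their documentation first.
DRY_RUN=1   # print the commands instead of running them

# Mirror one SourceForge repository (with full history) to a local directory.
#   $1 = SourceForge project name (assumed), $2 = "cvs" or "svn", $3 = local directory
backup_repo() {
    project="$1"; kind="$2"; dest="$3"
    case "$kind" in
        cvs) src="rsync://${project}.cvs.sourceforge.net/cvsroot/${project}/*" ;;
        svn) src="rsync://${project}.svn.sourceforge.net/svn/${project}/*" ;;
        *)   echo "unknown repository kind: $kind"; return 1 ;;
    esac
    if [ -n "$DRY_RUN" ]; then
        echo "rsync -av $src $dest/"
    else
        mkdir -p "$dest"
        rsync -av "$src" "$dest/"
    fi
}

backup_repo jena   cvs ../Jena-CVS
backup_repo jena   svn ../Jena-SVN
backup_repo joseki cvs ../Joseki-CVS
```
&lt;/pre&gt;
&lt;p&gt;
Unsetting DRY_RUN would run the transfers for real; the printed commands are worth reading over first.
&lt;/p&gt;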
&lt;p&gt;
&lt;b&gt;2. Convert CVS to SVN&lt;/b&gt;
&lt;/p&gt;
&lt;p&gt;
We have a multi-project layout so cvs2svn needs some arguments.
&lt;/p&gt;
&lt;pre&gt;
#!/bin/bash
MODS="ARQ BRQL DataGenerator Eyeball EyeballAcceptance Scratch extras grddl gvs iri jena jena-perf jena2 modeljeb owlsyntax rdf-html sparql2sql
tutorial"

SVN=ASF-Jena-CVS   # Destination
CVS=../Jena-CVS    # Local rsync backup

for m in $MODS
do
    echo "==== $m"
    ARGS=""    # Reset per module so options do not accumulate across iterations
    #ARGS="--dry-run"
    ARGS="$ARGS --encoding=utf8 --encoding=iso-8859-1"
    # Create trunk/branches/tags structure per project
    ARGS="$ARGS --trunk=$m/trunk --branches=$m/branches --tags=$m/tags"
    cvs2svn $ARGS --existing-svnrepos --svnrepos "$SVN" "$CVS/$m"
done
&lt;/pre&gt;
&lt;p&gt;
and much the same for Joseki except the modules list is just "Joseki1 Joseki3
Joseki3" and it is much faster.
&lt;/p&gt;
&lt;p&gt;
Dry-run this first : it showed up two problems.
&lt;/p&gt;
&lt;p&gt;
The "--encoding=utf8 --encoding=iso-8859-1" options are there to get the translation of
some people's names right (non-ASCII characters).
&lt;/p&gt;
&lt;p&gt;
A name clash in Joseki couldn't be resolved.  Fortunately, it was with some old
intermediate binaries so simply deleting from CVS (the joy of CVS using the filesystem
layout) was simplest.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;3. Dump the repositories&lt;/b&gt;
&lt;/p&gt;
&lt;p&gt;
Use "svnadmin dump" and gzip the files.  They are
going to be uploaded to an Apache machine and they are quite large - 3.1G to
upload over my home cable connection (1.5Mbit up).
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;4. Import to subversion&lt;/b&gt;
&lt;/p&gt;
&lt;p&gt;
This step has been done by the Apache Infrastructure team as it requires svnadmin access to the repository.
See &lt;a href="https://issues.apache.org/jira/browse/INFRA-3628"&gt;INFRA-3628&lt;/a&gt; for the details.
&lt;/p&gt;
&lt;p&gt;
It's good to check it's going to do the right thing first.
We now have the files for three repositories.  We want the imported svn to look like:
&lt;/p&gt;
&lt;pre&gt;
   .../Import/Jena-CVS/...
   .../Import/Jena-SVN/...
   .../Import/Joseki-CVS/...
&lt;/pre&gt;
&lt;p&gt;
so we have a permanent record of the code state at the start of the Apache
svn.  After import, active projects can be "svn copy"ed out to give the working
versions going forward.
&lt;/p&gt;
&lt;p&gt;
To test that it's going to work when the Apache infrastructure team do the actual
import, I built a local repo in the same layout.
&lt;/p&gt;
&lt;pre&gt;
# ---- Create the layout in Apache repository
mkdir -p Layout/incubator/jena/Import/Joseki-CVS
mkdir -p Layout/incubator/jena/Import/Jena-CVS
mkdir -p Layout/incubator/jena/Import/Jena-SVN
svnadmin create ApacheRepo
svn import Layout/ file://$PWD/ApacheRepo -m "Set layout"
rm -rf Layout

# Then it's just a matter of inserting the code in the right place:

# --- Imports
REPO=ApacheRepo

# Joseki-CVS
gzip -d &amp;lt; Imports/ASF-Joseki-CVS.svn.gz | \
     svnadmin load --parent-dir incubator/jena/Import/Joseki-CVS $REPO

# Jena-CVS
gzip -d &amp;lt; Imports/ASF-Jena-CVS.svn.gz | \
     svnadmin load --parent-dir incubator/jena/Import/Jena-CVS $REPO

# Jena-SVN
gzip -d &amp;lt; Imports/ASF-Jena-SVN.svn.gz | \
     svnadmin load --parent-dir incubator/jena/Import/Jena-SVN $REPO
&lt;/pre&gt;
&lt;p&gt;
The slow bits were cvs2svn (it's not bad but it's not instant: an hour or
so), the upload to Apache (a couple of hours) and checking the
"svnadmin load" (another couple of hours).
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;5. Extract working copies&lt;/b&gt;
&lt;/p&gt;
&lt;p&gt;
We're keeping the imports unchanged as a record of the starting point at Apache (revision 1124118).
&lt;/p&gt;
&lt;p&gt;
The whole process has been done now - 

&lt;a href="https://svn.apache.org/repos/asf/incubator/jena/"&gt;Jena code at Apache&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-1214722059942542031?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/1214722059942542031/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=1214722059942542031' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/1214722059942542031'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/1214722059942542031'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2011/05/importing-sourceforge-code-into-apache.html' title='Importing SourceForge code into Apache for Jena'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-402986767525325932</id><published>2011-03-02T19:41:00.001Z</published><updated>2011-03-02T20:01:04.183Z</updated><title type='text'>Updating RDF Lists with SPARQL</title><content type='html'>&lt;p&gt;
Something the &lt;a href="http://www.w3.org/2009/sparql/wiki/Main_Page"&gt;SPARQL Working Group&lt;/a&gt; has been thinking about recently is updates to RDF lists.
&lt;/p&gt;
&lt;p&gt;
RDF lists are hard to deal with because they are not &lt;a href="http://en.wikipedia.org/wiki/First-class_object"&gt;first class objects&lt;/a&gt; in the RDF data model.  Instead they are "encoded" in triples.  The encoding uses a &lt;a href="http://en.wikipedia.org/wiki/Cons"&gt;cons cell&lt;/a&gt;-like structure in which each element of the list is a blank node (not necessarily a blank node, but it nearly always is).
&lt;/p&gt;
&lt;p&gt;
RDF lists are correctly called "&lt;a href="http://www.w3.org/TR/rdf-primer/#collections"&gt;RDF collections&lt;/a&gt;" but as it's the list-nature (elements in order) that matters, I'll call them lists in this blog.
&lt;/p&gt;
&lt;p&gt;
Turtle and SPARQL have syntax for lists, but it's only surface syntax, and there are really triples in the RDF graph:
&lt;/p&gt;
&lt;pre&gt;
@prefix : &amp;lt;http://example/&gt; .
:x :p (1 2 3) .
&lt;/pre&gt;
&lt;p&gt;is the RDF:&lt;/p&gt;
&lt;pre&gt;
:x    :p         _:b0 .
_:b0  rdf:first  1 .
_:b0  rdf:rest   _:b1 .
_:b1  rdf:first  2 .
_:b1  rdf:rest   _:b2 .
_:b2  rdf:first  3 .
_:b2  rdf:rest   rdf:nil .
&lt;/pre&gt;

&lt;p&gt;
RDF toolkits help by presenting lists as programming language lists.  This also helps in keeping the lists well formed.  In all those triples, there is one &lt;tt&gt;rdf:rest&lt;/tt&gt; and one &lt;tt&gt;rdf:first&lt;/tt&gt; per list element - but it's legal RDF to have several uses of the properties, or none, on one subject.
&lt;/p&gt;
&lt;p&gt;
As an additional quirk, the empty list doesn't have any triples of its own, so looking for lists isn't just looking for &lt;tt&gt;rdf:rest&lt;/tt&gt; properties.
&lt;/p&gt;
&lt;p&gt;
&lt;pre&gt;
@prefix : &amp;lt;http://example/&gt; .
:x :p () .
&lt;/pre&gt;
&lt;p&gt;is the RDF:&lt;/p&gt;
&lt;pre&gt;
:x :p rdf:nil .
&lt;/pre&gt;

&lt;h3&gt;Lists, Property Paths and Update&lt;/h3&gt;
&lt;p&gt;
&lt;a href="http://www.w3.org/TR/sparql11-query/"&gt;SPARQL 1.1 Query&lt;/a&gt; adds &lt;a href="http://www.w3.org/TR/sparql11-query/#propertypaths"&gt;property paths&lt;/a&gt;, which make working with lists a bit easier, but it's not perfect.  List elements do not necessarily come out in order.

&lt;pre&gt;
{ :list rdf:rest*/rdf:first ?element }
&lt;/pre&gt;

&lt;p&gt;
But what about &lt;a href="http://www.w3.org/TR/sparql11-update/"&gt;SPARQL 1.1 Update&lt;/a&gt;?  How can we work with RDF lists?  Here are some scripts for list operations.  By using property paths they work on arbitrary length lists.
&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="#add-first"&gt;Add an element to the start of a list&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#add-last"&gt;Add an element to the end of a list&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#del-first"&gt;Delete the element at the start of a list&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#del-last"&gt;Delete an element to the end of a list&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#del-all-1"&gt;Delete the whole list&lt;/a&gt; (version 1 - common case)&lt;/li&gt;
&lt;li&gt;&lt;a href="#del-all-2"&gt;Delete the whole list&lt;/a&gt; (version 2 - more general case)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All the scripts are self-contained - they include test data.&lt;/p&gt;

&lt;p&gt;
They are examples - they aren't necessarily fully general, for example, if lists are badly formed or the property &lt;tt&gt;:p&lt;/tt&gt; is also used to relate the subject to things that aren't lists.  The last example shows a way to address that by finding and marking relevant points in the graph, doing some work and going back and tidying up.  The graph being updated is also used as a scratch pad.
&lt;/p&gt;

&lt;h3 id="add-first"&gt;Add an element to the start of a list&lt;/h3&gt;

&lt;pre&gt;
PREFIX :    &amp;lt;http://example/&gt; 
PREFIX rdf: &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt; 

INSERT DATA {
  :x0 :p () .
  :x1 :p (1) .
  :x2 :p (1 2) .
  :x3 :p (1 2 3) .
} ;

DELETE { ?x :p ?list }
INSERT { ?x :p [ rdf:first 0 ; 
                 rdf:rest ?list ]
       }
WHERE
{
  ?x :p ?list .
}
&lt;/pre&gt;

&lt;p&gt;This one is relatively easy. Find the list start &lt;tt&gt;?x :p ?list&lt;/tt&gt;, which works
whether the list is zero length or already has elements, 
delete the old triple that connected to the start of the list,
put in a new cons cell (the &lt;tt&gt;[...]&lt;/tt&gt;) at the start, and link to it.
&lt;/p&gt;

&lt;h3 id="add-last"&gt;Add an element to the end of a list&lt;/h3&gt;

&lt;pre&gt;
PREFIX :    &amp;lt;http://example/&gt; 
PREFIX rdf: &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt; 

INSERT DATA {
  :x0 :p () .
  :x1 :p (1) .
  :x2 :p (1 2) .
  :x3 :p (1 2 3) .
} ;

# The order here is important.
# Must do list &gt;= 1 first.

# List of length &gt;= 1
DELETE { ?elt rdf:rest rdf:nil }
INSERT { ?elt rdf:rest [ rdf:first 98 ; rdf:rest rdf:nil ] }
WHERE
{
  ?x :p ?list .
  # List of length &gt;= 1
  # rdf:rest* (zero or more steps) so that a one-element list is matched too.
  ?list rdf:rest* ?elt .
  ?elt rdf:rest rdf:nil .
  # ?elt is last cons cell
} ;

# List of length = 0
DELETE { ?x :p rdf:nil . }
INSERT { ?x :p [ rdf:first 99 ; rdf:rest rdf:nil ] }
WHERE
{
   ?x :p rdf:nil .
}
&lt;/pre&gt;

&lt;p&gt;
This is a bit harder - there are two cases, lists of length 0 and lists of length one or more.
The element before the insertion point needs changing and that can be a cons cell (list length &gt;= 1)
or the empty list (the triple pointing to it).
&lt;/p&gt;
&lt;p&gt;
Do the lists of length one or more first, otherwise a list of length zero that has just had an element added will be matched again by the update for lists of length one or more.
&lt;/p&gt;
&lt;p&gt;
For a list of length 1 or more: find the last element.  The &lt;tt&gt;WHERE&lt;/tt&gt; finds &lt;tt&gt;?elt&lt;/tt&gt; by finding all cons cells of the list with &lt;tt&gt;rdf:rest*&lt;/tt&gt; (zero or more steps, so the first cell is included), and checking it's the last one by looking for
&lt;tt&gt;?elt rdf:rest rdf:nil&lt;/tt&gt;.
&lt;/p&gt;
&lt;p&gt;
Then delete the &lt;tt&gt;rdf:rest&lt;/tt&gt;, and insert the new cons cell &lt;tt&gt;[ rdf:first 98 ; rdf:rest rdf:nil ]&lt;/tt&gt;.
&lt;/p&gt;
&lt;p&gt;
For a list of length 0, the style is the same but finding the triple to delete and re-insert to attach the cons cell is different.
&lt;/p&gt;

&lt;h3 id="del-first"&gt;Delete the element at the start of a list&lt;/h3&gt;

&lt;pre&gt;
PREFIX :      &amp;lt;http://example/&gt; 
PREFIX rdf:   &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt; 

INSERT DATA {
  :x3 :p (1 2 3) .
  :x2 :p (1 2) .
  :x1 :p (1) .
  :x0 :p () .
} ;

DELETE { 
   ?x :p ?list .
   ?list rdf:first ?first ;
         rdf:rest  ?rest }
INSERT { ?x :p ?rest }
WHERE
{
  ?x :p ?list .
  ?list rdf:first ?first ;
        rdf:rest ?rest .
}
&lt;/pre&gt;

&lt;p&gt;
This can be done in one step - we are not interested in lists of length 0 because they have no element to delete.  So find the pattern at the start of the list, delete it (note the &lt;tt&gt;WHERE&lt;/tt&gt; pattern and &lt;tt&gt;DELETE&lt;/tt&gt; template are the same), and insert the new triple that links the list directly
to the previous &lt;tt&gt;rdf:rest&lt;/tt&gt;.
&lt;/p&gt;

&lt;h3 id="del-last"&gt;Delete the element at the end of a list&lt;/h3&gt;

&lt;pre&gt;
PREFIX :     &amp;lt;http://example/&gt; 
PREFIX rdf:  &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt; 

INSERT DATA {
  :x3 :p (1 2 3) .
  :x2 :p (1 2) .
  :x1 :p (1) .
  :x0 :p () .
} ;

# List of length 1
# Do before other lists.

DELETE { ?x :p ?elt .
         ?elt  rdf:first ?v .
         ?elt  rdf:rest  rdf:nil .
       }
INSERT { ?x :p rdf:nil . }
WHERE
{
  ?x :p ?elt .
  ?elt rdf:first ?v ;
       rdf:rest rdf:nil .
} ;

# List of length &gt;= 2
DELETE { ?elt1 rdf:rest ?elt .
         ?elt  rdf:first ?v .
         ?elt  rdf:rest  rdf:nil .
       }
INSERT { ?elt1 rdf:rest rdf:nil }
WHERE
{
  ?x :p ?list .
  ?list rdf:rest* ?elt1 .

  # Second to end.
  ?elt1 rdf:rest ?elt .
  # End.
  ?elt rdf:first ?v ;
       rdf:rest rdf:nil .
}&lt;/pre&gt;

&lt;p&gt;
The cases to consider are lists of exactly one element and lists of two or more elements.  It's the treatment of the element before the element we're deleting that is different.
&lt;/p&gt;
&lt;p&gt;
The style is the same though - find the place before the element being deleted, and then delete that cons cell.
&lt;/p&gt;
&lt;p&gt;
For the list of length 2 or more, &lt;tt&gt;rdf:rest*&lt;/tt&gt; is used, which matches all elements including the &lt;tt&gt;?list&lt;/tt&gt; case of zero steps - then the structure beyond that is tested for being the end.  There are 2 &lt;tt&gt;rdf:rest&lt;/tt&gt; uses in the test for the end, hence list of length 2 or more.
&lt;/p&gt;

&lt;h3 id="del-all-1"&gt;Delete the whole list (common case)&lt;/h3&gt;

&lt;pre&gt;
PREFIX rdf:  &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt; 
PREFIX :     &amp;lt;http://example/&gt; 

INSERT DATA {
:x0 :p () .
:x0 :q "abc" .

:x1 :p (1) .
:x1 :q "def" .

:x2 :p (1 2) .
:x2 :q "ghi" .
} ;

# Delete the cons cells.
DELETE
    { ?z rdf:first ?head ; rdf:rest ?tail . }
WHERE { 
      [] :p ?list .
      ?list rdf:rest* ?z .
      ?z rdf:first ?head ;
         rdf:rest ?tail .
      } ;

# Delete the triples that connect the lists.
DELETE WHERE { ?x :p ?z . }

&lt;/pre&gt;

&lt;p&gt;
This version is not fully general because it assumes that &lt;tt&gt;:p&lt;/tt&gt; is a link to the list and not also to any other RDF terms (non-lists) which we would want to keep.
&lt;/p&gt;
&lt;p&gt;
The first &lt;tt&gt;DELETE&lt;/tt&gt; finds and removes all cons cells.  The second &lt;tt&gt;DELETE&lt;/tt&gt; removes the triple with &lt;tt&gt;:p&lt;/tt&gt; connecting the list to the subject.
&lt;/p&gt;

&lt;h3 id="del-all-2"&gt;Delete the whole list (general case)&lt;/h3&gt;

&lt;pre&gt;
PREFIX rdf:  &amp;lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt; 
PREFIX :     &amp;lt;http://example/&gt; 

INSERT DATA {
:x0 :p () .
:x0 :p "String 0" .
:x0 :p [] .

:x1 :p (1) .
:x1 :p "String 1" .
:x1 :p [] .

:x2 :p (1 2) .
:x2 :p "String 2" .
:x2 :p [] .

# A list not connected.
(1 2) .

# Not legal RDF.
# () .

} ;

INSERT { ?list :deleteMe true . }
WHERE {
   ?x :p ?list . 
   FILTER (?list = rdf:nil || EXISTS{?list rdf:rest ?z} )
} ;

# Delete the cons cells.
DELETE
    { ?z rdf:first ?head ; rdf:rest ?tail . }
WHERE { 
      [] :p ?list .
      ?list rdf:rest* ?z .
      ?z rdf:first ?head ;
         rdf:rest ?tail .
      } ;

# Delete the marked nodes
DELETE 
WHERE { ?x :p ?z . 
        ?z :deleteMe true . 
} ;

## ------
## Unconnected lists.

DELETE
    { ?z rdf:first ?head ; rdf:rest ?tail . }
WHERE { 
      ?list rdf:rest ?z2 .
      FILTER NOT EXISTS { ?s ?p ?list }
      ?list rdf:rest* ?z .
      ?z rdf:first ?head ;
         rdf:rest ?tail .
      } 

&lt;/pre&gt;

&lt;p&gt;
Deep breath.
&lt;/p&gt;
&lt;p&gt;
This one is quite long.
&lt;/p&gt;
&lt;p&gt;
The first step is to find and mark all the lists reached from a subject via &lt;tt&gt;:p&lt;/tt&gt;.
We will need to delete the connecting triples at the end of the process but the property might also be used for non-lists
and after the middle &lt;tt&gt;DELETE&lt;/tt&gt; step all evidence of the lists is lost.
The test:
&lt;/p&gt;
&lt;p&gt;
&lt;pre&gt;    FILTER (?list = rdf:nil || EXISTS{?list rdf:rest ?z} )&lt;/pre&gt;
&lt;p&gt;
catches both zero length lists and lists with elements.
&lt;/p&gt;
&lt;p&gt;
Second step: delete all list elements, any subjects with properties &lt;tt&gt;rdf:first&lt;/tt&gt; and &lt;tt&gt;rdf:rest&lt;/tt&gt;.
&lt;/p&gt;
&lt;p&gt;
Third step: remove the connecting triples and the markers.
&lt;/p&gt;
&lt;p&gt;
Finally, we delete any lists where the start isn't connected to anything, which is the
&lt;/p&gt;
&lt;pre&gt;    FILTER NOT EXISTS { ?s ?p ?list }&lt;/pre&gt;
&lt;p&gt;
test.
&lt;/p&gt;

&lt;h3&gt;License and Copyright&lt;/h3&gt;

&lt;p&gt;This page and the SPARQL 1.1 Update scripts are (c) &lt;a href=""&gt;Epimorphics Ltd&lt;/a&gt;
and licensed under a 
&lt;a href="http://creativecommons.org/licenses/by/3.0"&gt;Creative Commons Attribution 3.0 License&lt;/a&gt;.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-402986767525325932?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/402986767525325932/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=402986767525325932' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/402986767525325932'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/402986767525325932'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2011/03/updating-rdf-lists-with-sparql.html' title='Updating RDF Lists with SPARQL'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-5621962958219212868</id><published>2010-12-09T13:41:00.001Z</published><updated>2010-12-09T13:41:51.318Z</updated><title type='text'>Performance benchmarks for the TDB loader (version 2)</title><content type='html'>&lt;p&gt;CAVEAT&lt;/p&gt;

&lt;p&gt;There are "&lt;a href="http://en.wikipedia.org/wiki/Lies,_damned_lies,_and_statistics"&gt;Lies, damned lies, and statistics&lt;/a&gt;" but worse are probably performance measurements done by someone else.
The real test is what it means for any given application and whether performance is "fit for purpose". Database-related performance measurements are particularly murky.  The shape of the data matters, the usage made of the data matters, all in ways that can wildly affect whether a system is fit for purpose.&lt;/p&gt;

&lt;p&gt;Treat these figures with care - they are given to compare the old TDB bulk loader (up to version 0.8.7) and the new one (version 0.8.8 and later).  Even then, the new bulk loader is new, so it is subject to tweaking and tuning, but hopefully just to improve performance, not worsen it.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href="#bsbm"&gt;BSBM&lt;/a&gt; &amp;ndash; Berlin SPARQL Benchmark&lt;/li&gt;
  &lt;li&gt;&lt;a href="#coins"&gt;COINS&lt;/a&gt; &amp;ndash; &lt;a href="http://data.gov.uk/dataset/coins"&gt;Combined Online Information System&lt;/a&gt; from the UK Treasury.&lt;/li&gt;
  &lt;li&gt;&lt;a href="#lubm"&gt;LUBM&lt;/a&gt;  &amp;ndash; Lehigh University Benchmark.&lt;/li&gt;
&lt;/ul&gt;



&lt;p&gt;&lt;b&gt;See also&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="http://esw.w3.org/RdfStoreBenchmarking"&gt;http://esw.w3.org/RdfStoreBenchmarking&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;font size="+2"&gt;&lt;b&gt;&lt;a name="summary"&gt;Summary&lt;/a&gt;&lt;/b&gt;&lt;/font&gt;&lt;/p&gt;

&lt;p&gt;The new bulk loader is faster by x2 or more depending on the characteristics of the data.  
As loads can take hours, this saving is very useful.
It produces smaller databases and the databases are as good as or better in terms of performance
than the ones produced by the current bulk loader.&lt;/p&gt;

&lt;p&gt;&lt;font size="+2"&gt;&lt;b&gt;&lt;a name="setup"&gt;Setup&lt;/a&gt;&lt;/b&gt;&lt;/font&gt;&lt;/p&gt;

&lt;p&gt;The tests were run on a small local server, not tuned or provisioned for database work, just a machine that happened to be easily accessible.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;8GB RAM&lt;/li&gt;
&lt;li&gt;4 core Intel i5 760 @2.8GHz&lt;/li&gt;
&lt;li&gt;Ubuntu 10.10 - ext4 filing system&lt;/li&gt;
&lt;li&gt;Disk: WD 2 TB - SATA-300 7200 rpm and buffer size 64 MB&lt;/li&gt;
&lt;li&gt;Java version Sun/Oracle JDK 1.6.0_22 64-Bit Server VM&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;font size="+2"&gt;&lt;b&gt;&lt;a name="bsbm"&gt;BSBM&lt;/a&gt;&lt;/b&gt;&lt;/font&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/results/V5/index.html"&gt;BSBM published results from Nov 2009&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The figures here are produced using a modified version of the &lt;a href="https://sourceforge.net/projects/bsbmtools/"&gt;BSBM tools set&lt;/a&gt; used for &lt;a href="http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/results/V5/"&gt;version 2 of BSBM&lt;/a&gt;. The modifications are to run the tests on a local database, not over HTTP. The code is &lt;a href="https://github.com/afs/BSBM-Local"&gt;available from github&lt;/a&gt;. See also &lt;a href="http://seaborne.blogspot.com/2009/10/bsbm-jena.html"&gt;this article&lt;/a&gt;.
&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="#bsbm-loader"&gt;Loader Performance for BSBM&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#bsbm-sizes"&gt;Database Sizes for BSBM&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#bsbm-query"&gt;Query Performance for BSBM&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;font size="+1"&gt;&lt;b&gt;&lt;a name="bsbm-loader"&gt;BSBM - Loader performance&lt;/a&gt;&lt;/b&gt;&lt;/font&gt;&lt;/p&gt;

&lt;table style="border-collapse: collapse; border-color: #000000" border="1" cellpadding="5"&gt;
  &lt;tr&gt;
    &lt;th align="right"&gt;BSBM dataset&lt;/th&gt;
    &lt;th align="right"&gt;Triples&lt;/th&gt;
    &lt;th align="right"&gt;Loader 1&lt;/th&gt;
    &lt;th align="right"&gt;Rate&lt;/th&gt;
    &lt;th align="right"&gt;Loader 2&lt;/th&gt;
    &lt;th align="right"&gt;Rate&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td align="right"&gt;50k&lt;/td&gt;
    &lt;td align="right"&gt;50,057&lt;/td&gt;
    &lt;td align="right"&gt;3s&lt;/td&gt;
    &lt;td align="right"&gt;18,011 TPS&lt;/td&gt;
    &lt;td align="right"&gt;7s&lt;/td&gt;
    &lt;td align="right"&gt;7,151 TPS&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td align="right"&gt;250k&lt;/td&gt;
    &lt;td align="right"&gt;250,030&lt;/td&gt;
    &lt;td align="right"&gt;8s&lt;/td&gt;
    &lt;td align="right"&gt;31,702 TPS&lt;/td&gt;
    &lt;td align="right"&gt;11s&lt;/td&gt;
    &lt;td align="right"&gt;22,730 TPS&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td align="right"&gt;1m&lt;/td&gt;
    &lt;td align="right"&gt;1,000,313&lt;/td&gt;
    &lt;td align="right"&gt;26s&lt;/td&gt;
    &lt;td align="right"&gt;38,956 TPS&lt;/td&gt;
    &lt;td align="right"&gt;27s&lt;/td&gt;
    &lt;td align="right"&gt;37,049 TPS&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td align="right"&gt;5m&lt;/td&gt;
    &lt;td align="right"&gt;5,000,339&lt;/td&gt;
    &lt;td align="right"&gt;121s&lt;/td&gt;
    &lt;td align="right"&gt;41,298 TPS&lt;/td&gt;
    &lt;td align="right"&gt;112s&lt;/td&gt;
    &lt;td align="right"&gt;44,646 TPS&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td align="right"&gt;25m&lt;/td&gt;
    &lt;td align="right"&gt;25,000,250&lt;/td&gt;
    &lt;td align="right"&gt;666s&lt;/td&gt;
    &lt;td align="right"&gt;37,561 TPS&lt;/td&gt;
    &lt;td align="right"&gt;586s&lt;/td&gt;
    &lt;td align="right"&gt;42,663 TPS&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td align="right"&gt;100m&lt;/td&gt;
    &lt;td align="right"&gt;100,000,112&lt;/td&gt;
    &lt;td align="right"&gt;8,584s&lt;/td&gt;
    &lt;td align="right"&gt;11,650 TPS&lt;/td&gt;
    &lt;td align="right"&gt;3,141s&lt;/td&gt;
    &lt;td align="right"&gt;31,837 TPS&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td align="right"&gt;200m&lt;/td&gt;
    &lt;td align="right"&gt;200,031,413&lt;/td&gt;
    &lt;td align="right"&gt;30,348s&lt;/td&gt;
    &lt;td align="right"&gt;6,591 TPS&lt;/td&gt;
    &lt;td align="right"&gt;8,309s&lt;/td&gt;
    &lt;td align="right"&gt;24,074 TPS&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td align="right"&gt;350m&lt;/td&gt;
    &lt;td align="right"&gt;350,550,000&lt;/td&gt;
    &lt;td align="right"&gt;83,232s&lt;/td&gt;
    &lt;td align="right"&gt;4,212 TPS&lt;/td&gt;
    &lt;td align="right"&gt;21,146s&lt;/td&gt;
    &lt;td align="right"&gt;16,578 TPS&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
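&lt;p&gt;
The Rate columns are just triples divided by elapsed seconds, and the speed-up between the loaders is the ratio of the two load times.  A small sketch of that arithmetic for the 100m row, using awk for the division (the function names are made up for this example):
&lt;/p&gt;
&lt;pre&gt;
```shell
#!/bin/sh
# Rate in triples per second (TPS), rounded to the nearest whole number.
#   $1 = number of triples, $2 = load time in seconds
rate() {
    awk -v triples="$1" -v secs="$2" 'BEGIN { printf "%.0f\n", triples / secs }'
}

# Speed-up of the new loader: old load time divided by new load time.
#   $1 = loader 1 seconds, $2 = loader 2 seconds
speedup() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", old / new }'
}

rate 100000112 8584     # loader 1, prints 11650
rate 100000112 3141     # loader 2, prints 31837
speedup 8584 3141       # prints 2.7
```
&lt;/pre&gt;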


&lt;p&gt;&lt;font size="+1"&gt;&lt;b&gt;&lt;a name="bsbm-sizes"&gt;BSBM - Database sizes&lt;/a&gt;&lt;/b&gt;&lt;/font&gt;&lt;/p&gt;

&lt;table style="border-collapse: collapse; border-color: #000000" border="1" cellpadding="5"&gt;
&lt;tr&gt;
  &lt;th&gt;Database&lt;/th&gt;
  &lt;th&gt;Size/loader1&lt;/th&gt;
  &lt;th&gt;Size/loader2&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;50k&lt;/td&gt;
  &lt;td&gt;10MB&lt;/td&gt;
  &lt;td&gt;7.2MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;250k&lt;/td&gt;
  &lt;td&gt;49MB&lt;/td&gt;
  &lt;td&gt;35MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;1m&lt;/td&gt;
  &lt;td&gt;198MB&lt;/td&gt;
  &lt;td&gt;137MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;5m&lt;/td&gt;
  &lt;td&gt;996MB&lt;/td&gt;
  &lt;td&gt;680MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;25m&lt;/td&gt;
  &lt;td&gt;4.9GB&lt;/td&gt;
  &lt;td&gt;3.3GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;100m&lt;/td&gt;
  &lt;td&gt;20GB&lt;/td&gt;
  &lt;td&gt;13GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;200m&lt;/td&gt;
  &lt;td&gt;39GB&lt;/td&gt;
  &lt;td&gt;26GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;350m&lt;/td&gt;
  &lt;td&gt;67GB&lt;/td&gt;
  &lt;td&gt;45GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
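&lt;p&gt;
To put a number on "smaller", here is a sketch of the percentage saved by loader 2 for a given row of the sizes table (the function name is hypothetical, and both sizes must be in the same unit):
&lt;/p&gt;
&lt;pre&gt;
```shell
#!/bin/sh
# Percentage reduction in database size, rounded to a whole percent.
#   $1 = size from loader 1, $2 = size from loader 2 (same unit for both)
saving() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.0f\n", 100 * (old - new) / old }'
}

saving 20 13    # 100m row, sizes in GB: prints 35
saving 67 45    # 350m row, sizes in GB: prints 33
```
&lt;/pre&gt;
&lt;p&gt;
On these figures the loader 2 databases come out roughly a third smaller at the larger sizes.
&lt;/p&gt;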

&lt;p&gt;&lt;font size="+1"&gt;&lt;b&gt;&lt;a name="bsbm-query"&gt;BSBM - Query Performance&lt;/a&gt;&lt;/b&gt;&lt;/font&gt;&lt;/p&gt;

&lt;p&gt;Numbers are "query mix per hour"; larger numbers are better.  The BSBM performance engine was run with 100 warmups and 100 timing runs over local databases.&lt;/p&gt;

&lt;table 
style="border-collapse: collapse; border-color: #000000" border="1" cellpadding="5"&gt;
&lt;tr&gt;
&lt;th&gt;Loader used&lt;/th&gt;&lt;th&gt;50k&lt;/th&gt;&lt;th&gt;250k&lt;/th&gt;&lt;th&gt;1m&lt;/th&gt;&lt;th&gt;5m&lt;/th&gt;&lt;th&gt;25m&lt;/th&gt;&lt;th&gt;100m&lt;/th&gt;&lt;th&gt;200m&lt;/th&gt;&lt;th&gt;350m&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td width="29%"&gt;&lt;strong&gt;Loader 1&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;102389.1&lt;/td&gt;&lt;td&gt;87527.4&lt;/td&gt;&lt;td&gt;58441.6&lt;/td&gt;&lt;td&gt;5854.7&lt;/td&gt;&lt;td&gt;1798.4&lt;/td&gt;&lt;td&gt;673.0&lt;/td&gt;&lt;td&gt;410.7&lt;/td&gt;&lt;td&gt;250.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td width="29%"&gt;&lt;strong&gt;Loader 2&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;106920.1&lt;/td&gt;&lt;td&gt;86726.1&lt;/td&gt;&lt;td&gt;62240.7&lt;/td&gt;&lt;td&gt;11384.5&lt;/td&gt;&lt;td&gt;3477.9&lt;/td&gt;&lt;td&gt;797.1&lt;/td&gt;&lt;td&gt;425.8&lt;/td&gt;&lt;td&gt;259.2&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;
What this does show is that for a narrow range of database sizes, around 5m to 25m,
the databases produced by loader2 are faster.
This happens because the majority of the working set of the databases produced by loader1 didn't fit in memory, but that of the databases produced by loader2 mostly does.&lt;/p&gt;
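&lt;p&gt;To make the comparison easier to read, here is a small sketch that divides the loader2 figures by the loader1 figures from the table above. The numbers are copied from the table; the function name is just for illustration.&lt;/p&gt;

```python
# Query mixes per hour, copied from the table above.
SIZES = ["50k", "250k", "1m", "5m", "25m", "100m", "200m", "350m"]
LOADER1 = [102389.1, 87527.4, 58441.6, 5854.7, 1798.4, 673.0, 410.7, 250.0]
LOADER2 = [106920.1, 86726.1, 62240.7, 11384.5, 3477.9, 797.1, 425.8, 259.2]


def speedups():
    """Return {size: loader2 rate / loader1 rate}."""
    return {s: b / a for s, a, b in zip(SIZES, LOADER1, LOADER2)}


if __name__ == "__main__":
    for size, ratio in speedups().items():
        print(f"{size:5} {ratio:.2f}x")
```

&lt;p&gt;The ratio is a little over 1.9 at 5m and 25m and stays within about 20% of 1 at every other size.&lt;/p&gt;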


&lt;p&gt;&lt;font size="+2"&gt;&lt;b&gt;&lt;a name="coins"&gt;COINS&lt;/a&gt;&lt;/b&gt;&lt;/font&gt;&lt;/p&gt;
&lt;p&gt;COINS is the &lt;a href="http://data.gov.uk/dataset/coins"&gt;Combined Online Information System&lt;/a&gt; from the UK Treasury. 
It's a real-world database that has been converted to RDF by my colleague, Ian - see the &lt;a href="http://data.gov.uk/resources/coins"&gt;Description of the conversion to RDF&lt;/a&gt; for &lt;a href="http://data.gov.uk/"&gt;data.gov.uk&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.hm-treasury.gov.uk/psr_coins_data.htm"&gt;General information about COINS&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;COINS is all named graphs.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="#coins-loader"&gt;Loader Performance for COINS&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#coins-sizes"&gt;Database Sizes for COINS&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;font size="+1"&gt;&lt;b&gt;&lt;a name="coins-loader"&gt;COINS - Loader Performance&lt;/a&gt;&lt;/b&gt;&lt;/font&gt;&lt;/p&gt;

&lt;table style="border-collapse: collapse; border-color: #000000" border="1" cellpadding="5"&gt;
  &lt;tr&gt;
    &lt;th align="right"&gt;COINS dataset&lt;/th&gt;
    &lt;th align="right"&gt;Quads&lt;/th&gt;
    &lt;th align="right"&gt;Loader 1&lt;/th&gt;
    &lt;th align="right"&gt;Rate&lt;/th&gt;
    &lt;th align="right"&gt;Loader 2&lt;/th&gt;
    &lt;th align="right"&gt;Rate&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td align="right"&gt;&lt;/td&gt;
    &lt;td align="right"&gt;417,792,897&lt;/td&gt;
    &lt;td align="right"&gt;26,425s&lt;/td&gt;
    &lt;td align="right"&gt;15,811 TPS&lt;/td&gt;
    &lt;td align="right"&gt;17,057s&lt;/td&gt;
    &lt;td align="right"&gt;24,494 TPS&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
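&lt;p&gt;The "Rate" columns are simply the number of quads (or triples) divided by the elapsed time. A minimal sketch of that arithmetic, using the COINS figures from the table above (the function name is made up for this example):&lt;/p&gt;

```python
def triples_per_second(count, seconds):
    """Average load rate: items loaded divided by elapsed seconds."""
    return count / seconds


if __name__ == "__main__":
    quads = 417_792_897
    for name, secs in (("loader1", 26_425), ("loader2", 17_057)):
        print(name, round(triples_per_second(quads, secs)), "TPS")
```

&lt;p&gt;The published times are whole seconds, so a recomputed rate can differ from the table in the last digit.&lt;/p&gt;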

&lt;p&gt;&lt;b&gt;&lt;a name="coins-sizes"&gt;COINS - Database sizes&lt;/a&gt;&lt;/b&gt;&lt;/p&gt;

&lt;table style="border-collapse: collapse; border-color: #000000" border="1" cellpadding="5"&gt;
&lt;tr&gt;
  &lt;th&gt;Size/loader1&lt;/th&gt;
  &lt;th&gt;Size/loader2&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;152GB&lt;/td&gt;
  &lt;td&gt;77GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;&lt;font size="+2"&gt;&lt;b&gt;&lt;a name="lubm"&gt;LUBM&lt;/a&gt;&lt;/b&gt;&lt;/font&gt;&lt;/p&gt;

&lt;p&gt;
&lt;a href="http://swat.cse.lehigh.edu/projects/lubm/"&gt;LUBM information&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;LUBM isn't a very representative benchmark for RDF and linked data applications - it is designed more
for testing inference.  But there are some details of various systems published using this benchmark.
To check the new loader on this data, I ran loads for a couple of the larger generated datasets.  These are
the 1000 and 5000 datasets, with inference applied during data creation. The 5000 dataset, just under
a billion triples, was only run through the new loader.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="#lubm-loader"&gt;Loader Performance for LUBM&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#lubm-sizes"&gt;Database Sizes for LUBM&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;font size="+1"&gt;&lt;b&gt;&lt;a name="lubm-loader"&gt;LUBM - Loader Performance&lt;/a&gt;&lt;/b&gt;&lt;/font&gt;&lt;/p&gt;

&lt;table style="border-collapse: collapse; border-color: #000000" border="1" cellpadding="5"&gt;
  &lt;tr&gt;
    &lt;th align="right"&gt;LUBM dataset&lt;/th&gt;
    &lt;th align="right"&gt;Triples&lt;/th&gt;
    &lt;th align="right"&gt;Loader 1&lt;/th&gt;
    &lt;th align="right"&gt;Rate&lt;/th&gt;
    &lt;th align="right"&gt;Loader 2&lt;/th&gt;
    &lt;th align="right"&gt;Rate&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td align="right"&gt;1000-inf&lt;/td&gt;
    &lt;td align="right"&gt;190,792,744&lt;/td&gt;
    &lt;td align="right"&gt;7,106s&lt;/td&gt;
    &lt;td align="right"&gt;26,849 TPS&lt;/td&gt;
    &lt;td align="right"&gt;3,965s&lt;/td&gt;
    &lt;td align="right"&gt;48,119 TPS&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td align="right"&gt;5000-inf&lt;/td&gt;
    &lt;td align="right"&gt;953,287,749&lt;/td&gt;
    &lt;td align="right"&gt;N/A&lt;/td&gt;
    &lt;td align="right"&gt;N/A&lt;/td&gt;
    &lt;td align="right"&gt;86,644s&lt;/td&gt;
    &lt;td align="right"&gt;11,002 TPS&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;&lt;b&gt;&lt;a name="lubm-sizes"&gt;LUBM - Database sizes&lt;/a&gt;&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Database sizes:&lt;/p&gt;
&lt;table style="border-collapse: collapse; border-color: #000000" border="1" cellpadding="5"&gt;
&lt;tr&gt;
  &lt;th&gt;Dataset&lt;/th&gt;
  &lt;th&gt;Size/loader1&lt;/th&gt;
  &lt;th&gt;Size/loader2&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;1000-inf&lt;/td&gt;
  &lt;td&gt;25GB&lt;/td&gt;
  &lt;td&gt;16GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td&gt;5000-inf&lt;/td&gt;
  &lt;td&gt;N/A&lt;/td&gt;
  &lt;td&gt;80GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-5621962958219212868?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/5621962958219212868/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=5621962958219212868' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/5621962958219212868'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/5621962958219212868'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2010/12/performance-benchmarks-for-tdb-loader.html' title='Performance benchmarks for the TDB loader (version 2)'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-8579801307519683733</id><published>2010-12-09T13:41:00.000Z</published><updated>2010-12-09T13:41:36.329Z</updated><title type='text'>TDB bulk loader - version 2</title><content type='html'>&lt;p&gt;This article could be subtitled "Good I/O and Bad I/O".

By arranging to use good I/O, the new TDB loader achieves faster
loading rates despite writing more data to disk.  "Good I/O"
means file operations that occur in a buffered and streaming fashion.
"Bad I/O" means file operations that cause the disk heads to jump
about randomly or that work in small units of disk transfer.

&lt;/p&gt;



&lt;p&gt;

The new &lt;a href="http://openjena.org/TDB"&gt;TDB&lt;/a&gt;

loader "loader2" is a standalone program that bulk loads

data into a new TDB database. It does not support incremental loading,

and may destroy existing data. It has only been tested on Linux; it 

should run on Windows with Cygwin but what the performance will

be is hard to tell.

&lt;/p&gt;



&lt;p&gt;

Figures demonstrating the loader in action for various large datasets

are in a separate blog entry.  It is faster than the current loader

for datasets over about 1 million triples and comes into its own

above 100 million triples.

&lt;/p&gt;



&lt;p&gt;

Like the current bulk loader ("loader1"), loader2 can load triple and quad RDF formats, 

and from gzipped files.  It runs fastest from N-Triples or N-Quads
because the parser is fastest, with the lowest overhead, for these formats.

&lt;/p&gt;



&lt;p&gt;

The loader is a shell script that coordinates the various phases.

It's available in the TDB development code repository in 

&lt;tt&gt;bin/tdbloader2&lt;/tt&gt; and the current 0.8.8 snapshot build.

&lt;/p&gt;



&lt;p&gt;

Loader2 is based on the observation that the speed of loader1 can

drop sharply as the memory mapped files fill up RAM (the "can" is

because it does not always happen, which is slightly weird).  This fall-off
is more than one would expect simply from having to use some disk, and
sometimes the rate of loader1 becomes erratic.  This could be

due to the OS and the management of memory mapped files

but the effect is that the secondary index creation can

become rather slow.  loader1 tends to do "bad I/O" - as the

caches fill up, blocks are written back in what to the disk looks

like a random order causing the disk heads to jump round.

&lt;/p&gt;



&lt;p&gt;

Copying from the primary index to a secondary index involves a sort 

because TDB uses

&lt;a href="http://en.wikipedia.org/wiki/B%2Btree"&gt;B+trees&lt;/a&gt;

for its triple and quad indexing.  A B+Tree keeps its
records in sorted order and each index uses a different order.

&lt;/p&gt;
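&lt;p&gt;As an illustration of why a sort is needed, here is a minimal sketch that reorders the columns of a few triples for each index and sorts them. The NodeId values are made up, and the index names are the SPO, POS and OSP orders used by TDB for triples.&lt;/p&gt;

```python
# Hypothetical NodeIds for four triples, written as (S, P, O).
TRIPLES = [(1, 10, 100), (1, 11, 101), (2, 10, 100), (3, 10, 102)]

# Column positions to read, per index name.
ORDERS = {"SPO": (0, 1, 2), "POS": (1, 2, 0), "OSP": (2, 0, 1)}


def index_records(triples, order):
    """Reorder each triple's columns for one index and sort the result."""
    cols = ORDERS[order]
    return sorted(tuple(t[c] for c in cols) for t in triples)


if __name__ == "__main__":
    for name in ORDERS:
        print(name, index_records(TRIPLES, name))
```

&lt;p&gt;Data that arrives in SPO order is generally not in POS or OSP order once the columns are rearranged, so each secondary index needs its own sort.&lt;/p&gt;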



&lt;p&gt;Loader1 is much faster than simply loading all indexes at once because
in that case so much RAM is being used for caching parts of all the
indexes. It is better to build one index at a time, using the RAM to cache only that index.

&lt;/p&gt;



&lt;p&gt;Loader2 similarly has a data loading phase and an index creation phase.&lt;/p&gt;



&lt;p&gt;

The first phase is to build the node table and write out the data for index building.

Loader2 takes the stream of triples and quads

from the parser and writes out the RDF terms (IRI, Literal, blank node)

into the internal node table.

It also writes out text files of tuples of NodeIds (the internal 64 bit
number used to identify each RDF term). This is "good I/O" - the writes
of the tuples files are buffered up and the files are written append-only.

This phase is a Java program, which exits after the node table and working files have been written.
&lt;/p&gt;
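&lt;p&gt;A rough sketch of the shape of this first phase, not the actual loader code. It assumes NodeIds are handed out sequentially from an in-memory dictionary, which is a simplification of the real node table, and it writes each tuple as fixed width hex text in append-only fashion.&lt;/p&gt;

```python
import io


def load_phase(triples, out):
    """Assign a NodeId per distinct term; append hex tuples to `out`."""
    node_table = {}  # stand-in for the on-disk node table (assumption)
    for triple in triples:
        ids = []
        for term in triple:
            # setdefault gives the next sequential id to an unseen term
            ids.append(node_table.setdefault(term, len(node_table)))
        # 16 hex digits per 64 bit NodeId, one tuple per line
        out.write(" ".join(f"{i:016X}" for i in ids) + "\n")
    return node_table


if __name__ == "__main__":
    buf = io.StringIO()
    data = [("ex:a", "ex:p", "ex:b"), ("ex:b", "ex:p", "ex:c")]
    load_phase(data, buf)
    print(buf.getvalue(), end="")
```

&lt;p&gt;The output is only ever appended to, so the writes can be buffered and streamed to disk.&lt;/p&gt;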



&lt;p&gt;

The next phase is to produce the indexes, including the primary index.  Unlike

loader1, loader2 does not write the primary index during node loading.

Experimentation showed it was quicker to do it separately despite needing more I/O.  
This is slightly strange.

&lt;/p&gt;



&lt;p&gt;

To build indexes, loader2 uses the

&lt;a href="http://seaborne.blogspot.com/2010/12/repacking-btrees.html"&gt;B+Tree rebuilder&lt;/a&gt; 

and that requires the data in index-sorted order.  Index rebuilding is a sort followed

by B+tree building.  The sort is done by &lt;a href="http://en.wikipedia.org/wiki/Sort_%28Unix%29"&gt;Unix sort&lt;/a&gt;.

Unix sort is very easy to use and it smoothly scales from a few lines to gigabytes of data.

Having written the tuple data out as text files in the first phase (and fixed width hex numbers at that - quite wasteful),
Unix sort can do a text sort on the files.  Despite that meaning lots of I/O, it's good I/O
and the sort program really knows how best to manage temporary files.

&lt;/p&gt;
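&lt;p&gt;The reason a plain text sort is enough is that fixed width hex strings sort in the same order as the numbers they represent. Here is a small check of that property; it uses Python's string ordering, which compares by code point, so it corresponds to a sort using byte-order collation (an assumption about how the sort is run).&lt;/p&gt;

```python
import random


def hex_line(ids):
    """Format a tuple of NodeIds as fixed-width (16 digit) hex text."""
    return " ".join(f"{i:016X}" for i in ids)


def text_sort_matches_numeric(tuples):
    """True if sorting the text lines gives the same order as sorting numbers."""
    by_text = sorted(hex_line(t) for t in tuples)
    by_number = [hex_line(t) for t in sorted(tuples)]
    return by_text == by_number


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [tuple(rng.randrange(2**64) for _ in range(3)) for _ in range(1000)]
    print(text_sort_matches_numeric(sample))
```

&lt;p&gt;Without the fixed width padding this would not hold: "9" sorts after "10" as text.&lt;/p&gt;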



&lt;p&gt;

For each index, a Unix sort is done to get a temporary file of tuple data in the right sort order.

The B+Tree rebuilder is called with this file as the stream of sorted data it needs to

create an index.

&lt;/p&gt;



&lt;p&gt;

There are still opportunities to tune the new loader, for example to see whether piping the output of the sorts
directly into the rebuilder is better or worse than the two step approach
using temporary files that is used at the moment.  Using different disks for different temporary files
should also help.

&lt;/p&gt;



&lt;p&gt;

The index building phase is parallelisable. Because I/O and memory usage are the bottlenecks, not CPU cycles,
the crossover point for this to become effective might be quite high.

&lt;/p&gt;



&lt;p&gt;

To find out whether loader2 is better than loader1, I've run a number of tests. 

These are load and query tests with the &lt;a href="http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/results/V5/index.html"&gt;Berlin SPARQL Benchmark (2009 version)&lt;/a&gt;, a load test on the RDF version of
&lt;a href="http://data.gov.uk/dataset/coins"&gt;COINS&lt;/a&gt; (UK Treasury Combined Online Information System - about 420 million quads of real data) and a load test using the &lt;a href="http://swat.cse.lehigh.edu/projects/lubm/"&gt;Lehigh University Benchmark&lt;/a&gt; with some inferencing.  Details, figures and tables are in the next article.

&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-8579801307519683733?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/8579801307519683733/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=8579801307519683733' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/8579801307519683733'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/8579801307519683733'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2010/12/tdb-bulk-loader-version-2.html' title='TDB bulk loader - version 2'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-5992332643212350594</id><published>2010-12-03T19:29:00.000Z</published><updated>2010-12-03T19:29:19.213Z</updated><title type='text'>Repacking B+Trees</title><content type='html'>&lt;p&gt;
&lt;a href="http://openjena.org/TDB"&gt;TDB&lt;/a&gt; uses
&lt;a href="http://en.wikipedia.org/wiki/B%2Btree"&gt;B+trees&lt;/a&gt;
for its triple and quad indexing. 
&lt;/p&gt;

&lt;p&gt;
The indexes hold 3 or 4 NodeIds, where a NodeId is a fixed length 64 bit unique
number for each RDF term in the database. Numbers, dates and times are encoded directly
into the 64 bits where possible; otherwise the NodeId refers to the location in a separate NodeId to RDF term table, as for all other types, including IRIs.
&lt;/p&gt;

&lt;p&gt;
The B+Trees have a number of leaf blocks, each of which holds only records (key, value pairs, except there's no "value" part in a triple index - just the key of S,P and O in various orders).
TDB threads these blocks together so that a scan does not need to touch the
rest of the tree - scans happen when you look up, say S?? for known subject and unknown property and object.
The scan returns all the triples with a particular S. Counting all the triples only touches the leaves of the B+Tree, not the branches.
&lt;/p&gt;
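&lt;p&gt;A minimal sketch of a scan over threaded leaf blocks. The &lt;tt&gt;Leaf&lt;/tt&gt; class and function name are invented for this example, and for simplicity the scan starts at the first leaf instead of descending from the root to find the starting block.&lt;/p&gt;

```python
import bisect


class Leaf:
    def __init__(self, records):
        self.records = records   # sorted (S, P, O) tuples
        self.next = None         # link to the next leaf in key order


def scan_subject(first_leaf, s):
    """Yield all triples with subject s by walking the leaf chain."""
    leaf = first_leaf
    while leaf is not None:
        # (s,) sorts before every (s, p, o), so this finds the first match
        start = bisect.bisect_left(leaf.records, (s,))
        for rec in leaf.records[start:]:
            if rec[0] != s:
                return           # moved past subject s: the scan is done
            yield rec
        leaf = leaf.next         # follow the thread, never the branches


if __name__ == "__main__":
    a = Leaf([(1, 1, 1), (2, 1, 1)])
    b = Leaf([(2, 2, 2), (3, 1, 1)])
    a.next = b
    print(list(scan_subject(a, 2)))
```

&lt;p&gt;Once the first matching record is found, the rest of the scan only follows the leaf-to-leaf links.&lt;/p&gt;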

&lt;p&gt;
B+Trees provide performant indexing over a wide range of memory situations,
ranging from very little caching of disk structures in memory, through to 
being able to cache substantial portions of the tree.
&lt;/p&gt;

&lt;p&gt;
The TDB B+Trees have a number of
&lt;a href="http://en.wikipedia.org/wiki/Block_%28data_storage%29"&gt;block storage layers&lt;/a&gt;;
an in-JVM block cache for use on 32 bit JVMs, memory mapped files
for 64 bit JVMs, and an in-memory RAM-disk for testing. The in-memory RAM disk is
not efficient but it is a very good simulation of a disk - it 
really does copy the blocks used by a client when written to another area
so there is no possibility of updating blocks through references held by the
client after the block has been written to "disk".
&lt;/p&gt;

&lt;p&gt;
However, one disadvantage can be that the index isn't very well packed. The B+Trees
guarantee that each block is at least 50% full. In practice, the blocks are 60-70% full for indexes POS and OSP.
But a worse case can arise when inserting into the SPO index because data typically arrives with all the triples for one subject, then all the triples for another subject, meaning the data is nearly sorted. While this makes the processing faster, it leaves the resulting B+Tree only about 50%-60% packed.
&lt;/p&gt;

&lt;p&gt;
Packing density matters because it influences how much of the tree is cached in a fixed amount of computer memory. If it's 50% packed, then it's only 50% efficient in the cache.
&lt;/p&gt;
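&lt;p&gt;A back-of-the-envelope sketch of the effect. The record count and block capacity below are made-up round numbers, not TDB's actual block parameters.&lt;/p&gt;

```python
import math


def blocks_needed(records, per_block, density):
    """Leaf blocks required when each block is `density` full on average."""
    return math.ceil(records / (per_block * density))


if __name__ == "__main__":
    # Hypothetical figures: 1,000,000 records, 100 records per full block.
    for density in (0.5, 0.6, 1.0):
        print(density, blocks_needed(1_000_000, 100, density))
```

&lt;p&gt;At 50% packing the same records occupy twice as many blocks as at 100%, so a cache of fixed size holds half as much useful data.&lt;/p&gt;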

&lt;p&gt;
There are various ways to improve on this (compress blocks,
&lt;a href="http://en.wikipedia.org/wiki/B_sharp_tree"&gt;B#Trees&lt;/a&gt;,
and many more besides - B-tree variations are very extensively studied data-structures).
&lt;/p&gt;

&lt;p&gt;
I have been working on a B+Tree repacking program that takes an existing B+Tree and produces a maximally packed B+Tree. The database is then smaller on disk and the in-memory caches are more efficiently used. The trees produced are legal B+Trees, and have a packing density of close to 100%. Rebuilding indexes is fast and scales linearly.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;The Algorithm&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;
Normally, B+Trees grow at the root. A B+tree is the same depth everywhere in the tree and the tree
only gets deeper if the root node of the tree is split and a new root is created pointing down to the two blocks formed by splitting the old root. This algorithm, building a
tree from a stream of records, grows the tree from the leaves towards the root.
While the algorithm is running there isn't a legal tree - it's only when the algorithm
finishes that a legal B+Tree emerges.
&lt;/p&gt;

&lt;p&gt;
All the data of a B+tree resides in the leaves - the branches above tell you
which leaf block to look in (this is the key difference between B-Trees and B+Trees).
The first stage of repacking takes a stream of records (key and value) from the initial tree. 
This stream will be in sorted order because it's being read out of a B+Tree and 
for a TDB B+tree, it's a scan tracing the threading of the leaf blocks together.
In other words, it's not memory intensive.
&lt;/p&gt;

&lt;p&gt;
In the first stage, new leaf blocks are produced, one at a time. A block is filled
completely, a new block allocated, the threading pointers completed and the full
block written out. In addition, the block number and highest key in the block are emitted.
The leaf block is not touched again.
&lt;/p&gt;

&lt;p&gt;The exception is the last two blocks of the leaf layer. A B+Tree must have blocks
at least 50% full to be a legal tree. Although the TDB B+Tree code can cope with blocks
that are smaller than the B+tree guarantee, it's neater to rebalance the last two blocks in the case the last block is below the minimum size. Because the second-to-last block is
completely full, it's always possible to rebalance in just two blocks.
&lt;/p&gt;
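&lt;p&gt;The leaf stage can be sketched as follows. This is an in-memory illustration under simplifying assumptions (lists stand in for disk blocks, block numbers are just list positions, and the threading pointers are left out), not the TDB code.&lt;/p&gt;

```python
def pack_leaves(records, capacity):
    """Split sorted records into full leaves, rebalancing the last two."""
    leaves = [records[i:i + capacity] for i in range(0, len(records), capacity)]
    minimum = (capacity + 1) // 2            # at least half full
    if len(leaves) >= 2 and minimum > len(leaves[-1]):
        # The second-to-last leaf is full, so sharing always works.
        both = leaves[-2] + leaves[-1]
        half = (len(both) + 1) // 2
        leaves[-2:] = [both[:half], both[half:]]
    # (block number, highest key) pairs feed the next phase
    summary = [(n, leaf[-1]) for n, leaf in enumerate(leaves)]
    return leaves, summary


if __name__ == "__main__":
    leaves, summary = pack_leaves(list(range(9)), 4)
    print(leaves)
    print(summary)
```

&lt;p&gt;With nine records and a capacity of four, the last leaf would hold a single record, so the final two leaves are shared out as three and two records.&lt;/p&gt;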

&lt;p&gt;
Phase two takes as input the stream of block number and highest key from the level below
and builds branch nodes for the B+Tree pointing, by block number, to the blocks produced
in the phase before. When a block is finished, the block can be written out
and a block number and split key emitted. This split key isn't the highest key
in the block - it's the highest key of the entire sub-tree at that point
but this is the key passed up. A B+tree branch node has N block pointers and N-1
keys and the split key is the last key from making the full block, and is
the Nth key from below.
&lt;/p&gt;

&lt;p&gt;
Once again, the last two blocks are rebalanced to maintain the B+Tree invariant of all blocks
being at least half full. For large trees, there are quite a few blocks, so the rebalance of 
just two of them is insignificant. For small trees, it's not really worth repacking the tree - 
block caching at runtime hides any advantages there might be.
&lt;/p&gt;

&lt;p&gt;
The second phase is repeatedly applied to the block number and split key stream from the layer below
until a layer in the tree is only one block (it can't be zero blocks). This single block
is the new root block. The third phase is to write out the B+Tree details to disk
and put the root block somewhere where it can be found when the B+Tree is reopened.
&lt;/p&gt;
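&lt;p&gt;A sketch of the layering step only: it groups the (block number, key) stream into branch blocks and repeats until a single root remains. It deliberately omits the rebalancing of the last two blocks in each layer, the block numbering is invented for the example, and the names are not from the TDB code.&lt;/p&gt;

```python
def build_levels(summary, fanout):
    """Group (block, high key) pairs into branch layers until one root remains."""
    if not summary or 2 > fanout:
        raise ValueError("need at least one block and a fanout of 2 or more")
    levels = []
    next_block = len(summary)              # hypothetical block numbering
    while len(summary) != 1:
        layer, emitted = [], []
        for i in range(0, len(summary), fanout):
            group = summary[i:i + fanout]
            pointers = [blk for blk, _ in group]
            keys = [key for _, key in group[:-1]]       # N pointers, N-1 keys
            layer.append((next_block, pointers, keys))
            emitted.append((next_block, group[-1][1]))  # split key passed up
            next_block += 1
        levels.append(layer)
        summary = emitted
    return levels


if __name__ == "__main__":
    for layer in build_levels([(0, 3), (1, 6), (2, 8)], fanout=2):
        print(layer)
```

&lt;p&gt;Each layer is built from a single pass over the stream from the layer below, which is why the whole build grows linearly with the number of records.&lt;/p&gt;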

&lt;p&gt;&lt;b&gt;Consequences&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;
The repacking algorithm produces B+Trees that are approaching half the size of the
original trees. For a large dataset, that's several gigabytes.
&lt;/p&gt;

&lt;p&gt;
The repacked trees perform much the same as trees formed by normal use except
in one case where they are faster. If the tree is small, so that the majority fits in the RAM caches,
then repacking means less RAM is used but the speed is much the same (in fact a few percent slower,
hard to measure but less than 5%, presumably because there is a different ratio of tree descent and in-block binary search being done by the CPU.
This may be no more than a RAM cache hierarchy effect).
&lt;/p&gt;

&lt;p&gt;
However, if the tree was large and, once repacked, now fits mostly in memory, the repacked trees are faster.
As the indexes for an RDF dataset grow much larger than the cacheable space, this effect
slowly declines. Some figures to show this are in preparation.
&lt;/p&gt;

&lt;p&gt;
The biggest benefit however, is not directly the speed of access or the reduced disk space.
It's the fact that there is a fast way, with linear growth, to build a B+Tree from 
a stream of sorted records. It's much faster than simply using the
regular insertion into the B+Tree.
&lt;/p&gt;

&lt;p&gt;
This is part of the new bulk loader for TDB. It uses external sorting to
produce the input to index creation using this B+Tree repacking algorithm.
The new bulk loader can save hours on large data loads.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-5992332643212350594?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/5992332643212350594/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=5992332643212350594' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/5992332643212350594'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/5992332643212350594'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2010/12/repacking-btrees.html' title='Repacking B+Trees'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-7422183453004462231</id><published>2010-08-28T22:29:00.001+01:00</published><updated>2010-08-28T22:31:01.347+01:00</updated><title type='text'>Migrating from the SPARQL Update submission language to the emerging SPARQL 1.1. Update  standard</title><content type='html'>&lt;p&gt;
&lt;a href="http://www.w3.org/TR/sparql11-update/"&gt;SPARQL 1.1 Update&lt;/a&gt; is work-in-progress by
the &lt;a href="http://www.w3.org/2001/sw/DataAccess/"&gt;SPARQL Working Group&lt;/a&gt; but the general
design and language is reasonably stable. There is also the W3C submission
&lt;a href="http://www.w3.org/Submission/SPARQL-Update/"&gt;SPARQL Update&lt;/a&gt; from July 2008.
The languages are similar in style but the details of the grammars differ.
So how to migrate from the syntax used in the submission to the upcoming SPARQL recommendation
for a SPARQL Update language?
&lt;/p&gt;

&lt;p&gt;
One way is to provide both languages behind a common API, with the application indicating which language
to use.  This maximises compatibility because if the submission is the chosen language, the parser for
the submission language will be used.  But the application has to be changed to move between the languages and conversion of update scripts has to be done for each script, so probably it's a "big bang" change over. The two languages are very close - is it possible to have a single language that covers both languages?  
Then the application can mix usages and when an update request is printed it can be printed in the soon-to-be standard language, helping people see how the language has changed.
&lt;/p&gt;

&lt;p&gt;It turns out that most, but not all, of the submission language can be incorporated into 
the grammar for the emerging standard. The cases not covered don't seem to be ones likely
to be widely used although it would be good to know if they are.
&lt;/p&gt;

&lt;ul&gt;

&lt;li&gt;&lt;tt&gt;CREATE&lt;/tt&gt;, &lt;tt&gt;CLEAR&lt;/tt&gt;, &lt;tt&gt;LOAD&lt;/tt&gt;, &lt;tt&gt;DROP&lt;/tt&gt; are covered.&lt;/li&gt;

&lt;li&gt;&lt;tt&gt;INSERT DATA&lt;/tt&gt;, &lt;tt&gt;DELETE DATA&lt;/tt&gt; working on the default graph or on one named graph
are covered, but not on more than one graph at once.&lt;/li&gt;

&lt;li&gt;An extra grammar rule for &lt;tt&gt;MODIFY&lt;/tt&gt; is supported, again working on the default graph or one named graph, but with only a single, optional &lt;tt&gt;GRAPH &amp;lt;uri&amp;gt;&lt;/tt&gt;.&lt;/li&gt;

&lt;li&gt;The old style &lt;tt&gt;INSERT { :s :p :o }&lt;/tt&gt;, &lt;tt&gt;DELETE { :s :p :o }&lt;/tt&gt;, that is, insert or delete some data using just the &lt;tt&gt;INSERT&lt;/tt&gt; or &lt;tt&gt;DELETE&lt;/tt&gt; keyword, without &lt;tt&gt;DATA&lt;/tt&gt;, leads to ambiguity in the combined grammar.  These forms are not supported in the combined language. In fact, these forms pre-date the &lt;tt&gt;DATA&lt;/tt&gt; forms in the submission language.&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;
The ability to work on only one named graph needs a little explanation. In the
combined grammar, the &lt;tt&gt;INTO&lt;/tt&gt; or &lt;tt&gt;FROM&lt;/tt&gt;
is used to set the &lt;tt&gt;WITH&lt;/tt&gt; part of an update. 
There can be at most one &lt;tt&gt;WITH&lt;/tt&gt;. In the submission,
&lt;/p&gt;
&lt;pre&gt;
INSERT INTO &amp;lt;g1&amp;gt; &amp;lt;g2&amp;gt; &amp;lt;g3&amp;gt; { ... } WHERE { ... }
&lt;/pre&gt;
&lt;p&gt;is legal.  In terms of language, this could be incorporated into the extended language but it introduces a capability not present in the upcoming working group language and it can't be written out again without repeating the operation, once for each named graph.  Operating on a single named graph, or the default graph, is covered by the standard.&lt;/p&gt;
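&lt;p&gt;To illustrate "repeating the operation, once for each named graph", here is a small sketch that generates one operation per target graph, wrapping the template in a &lt;tt&gt;GRAPH&lt;/tt&gt; block. This is only string generation for illustration, not what ARQ does internally; the function name and the prefixed graph names are made up, and the caller is assumed to pass graph names already written in valid SPARQL syntax.&lt;/p&gt;

```python
def expand_insert(graphs, template, where):
    """Return one INSERT operation per target graph."""
    return [
        f"INSERT {{ GRAPH {g} {{ {template} }} }} WHERE {{ {where} }}"
        for g in graphs
    ]


if __name__ == "__main__":
    ops = expand_insert([":g1", ":g2", ":g3"], "?s :p ?o", "?s :q ?o")
    print(" ;\n".join(ops))
```

&lt;p&gt;Running the operations one after another is not necessarily identical to a single multi-graph update if the &lt;tt&gt;WHERE&lt;/tt&gt; pattern reads from a graph that an earlier operation has changed.&lt;/p&gt;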

&lt;p&gt;For old style &lt;tt&gt;INSERT&lt;/tt&gt; or &lt;tt&gt;DELETE&lt;/tt&gt; of data, conversion can be done by adding the word DATA to the operation or adding WHERE {} to the update operation.  Both these conversions yield something that is legal and means the same under the submission language, so the conversion can be done while retaining the use of old software.&lt;/p&gt;

&lt;p&gt;In summary, the accepted forms of the submission language are:&lt;/p&gt;
&lt;pre&gt;
  INSERT [INTO &amp;lt;uri&amp;gt;] {...} WHERE {...}
  DELETE [FROM &amp;lt;uri&amp;gt;] {...} WHERE {...}
  INSERT DATA [INTO &amp;lt;uri&amp;gt;] {...}
  DELETE DATA [FROM &amp;lt;uri&amp;gt;] {...}
&lt;/pre&gt;

&lt;p&gt;By using an extended grammar, the application can even mix syntax of the submission on SPARQL Update and SPARQL 1.1 Update in a single request or, indeed, single operation.  When printed the output can be in the equivalent SPARQL 1.1 Syntax.&lt;/p&gt;

&lt;p&gt;ARQ (currently, the development snapshot) includes a command line SPARQL 1.1 Update extended parser, "arq.uparse". arq.uparse reads the extended syntax and prints the equivalent strict SPARQL 1.1 Update form.  It can be used to translate from the submission language to W3C standards language. More on practical details: &lt;a href="http://tech.groups.yahoo.com/group/jena-dev/message/45040"&gt;jena-dev/message/45040&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Key points from the extended grammar follow. The working group is not planning on including this in the published SPARQL 1.1 Update grammar.&lt;/p&gt;

&lt;pre&gt;
UpdateUnit  :=  Prologue Update &amp;lt;EOF&amp;gt;

Update  :=  ( Update1 )+

# As for SPARQL 1.1 Update with addition of "ModifyOld"
Update1 :=  ( Load | Clear | Drop | Create |
              InsertData | DeleteData | DeleteWhere |
              Modify | ModifyOld )
            ( &amp;lt;SEMICOLON&amp;gt; )?

Load    :=  &amp;lt;LOAD&amp;gt; IRIref ( &amp;lt;INTO&amp;gt; ( &amp;lt;GRAPH&amp;gt; )? IRIref )?

Clear   :=  &amp;lt;CLEAR&amp;gt; ( &amp;lt;SILENT&amp;gt; )? GraphRefAll

Drop    :=  &amp;lt;DROP&amp;gt; ( &amp;lt;SILENT&amp;gt; )? GraphRefAll

Create  :=  &amp;lt;CREATE&amp;gt; ( &amp;lt;SILENT&amp;gt; )? GraphRef

InsertData  :=  &amp;lt;INSERT_DATA&amp;gt; OptionalIntoTarget QuadPattern

DeleteData  :=  &amp;lt;DELETE_DATA&amp;gt; OptionalFromTarget QuadData

DeleteWhere :=  &amp;lt;DELETE_WHERE&amp;gt; QuadPattern

Modify  :=  ( &amp;lt;WITH&amp;gt; IRIref )?
            ( DeleteClause ( InsertClause )? | InsertClause )
            ( UsingClause )*
            &amp;lt;WHERE&amp;gt; GroupGraphPattern

# The MODIFY form from the submission
ModifyOld   :=  &amp;lt;MODIFY&amp;gt; ( IRIref )?
                ( DeleteClause )?
                ( InsertClause )?
                &amp;lt;WHERE&amp;gt; GroupGraphPattern

DeleteClause    :=  &amp;lt;DELETE&amp;gt; OptionalFromTarget QuadPattern

InsertClause    :=  &amp;lt;INSERT&amp;gt; OptionalIntoTarget QuadPattern

# Optional INTO: wraps the QuadPattern with a GRAPH
OptionalIntoTarget  :=  ( ( &amp;lt;INTO&amp;gt; )? IRIref )?

# Optional FROM; wraps the QuadPattern with a GRAPH
OptionalFromTarget  :=  ( ( &amp;lt;FROM&amp;gt; )? IRIref )?

UsingClause :=  &amp;lt;USING&amp;gt; ( IRIref | &amp;lt;NAMED&amp;gt; IRIref )
&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-7422183453004462231?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/7422183453004462231/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=7422183453004462231' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/7422183453004462231'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/7422183453004462231'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2010/08/migrating-from-sparql-update-submission.html' title='Migrating from the SPARQL Update submission language to the emerging SPARQL 1.1. Update  standard'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-6242907008544496537</id><published>2010-07-30T10:42:00.003+01:00</published><updated>2010-07-30T12:01:33.751+01:00</updated><title type='text'>Moving to Epimorphics</title><content type='html'>&lt;p&gt;I'm moving to &lt;a href="http://www.epimorphics.com/"&gt;Epimorphics&lt;/a&gt;, starting there early next month. Epimorphics is now located in &lt;a href="http://en.wikipedia.org/wiki/Portishead,_Somerset"&gt;Portishead&lt;/a&gt; (as of &lt;a href="http://reference.data.gov.uk/id/day/2010-07-27"&gt;last Tuesday&lt;/a&gt;).&lt;/p&gt;


&lt;p&gt;As before, I will still be able to work on &lt;a href="http://openjena.org/"&gt;Jena&lt;/a&gt;, &lt;a href="http://openjena.org/ARQ"&gt;ARQ&lt;/a&gt; and &lt;a href="http://openjena.org/TDB"&gt;TDB&lt;/a&gt; and I also get to continue participating in the &lt;a href="http://www.w3.org/2009/sparql/wiki/Main_Page"&gt;W3C SPARQL working group&lt;/a&gt;, now as an Invited Expert.  The working group is making good progress on its chosen &lt;a href="http://www.w3.org/TR/sparql-features/"&gt;list of features&lt;/a&gt;, and now it's just a "small" matter of doing the core work and getting out the Last Call documents to the community.
&lt;/p&gt;
&lt;p&gt;
More exciting times.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-6242907008544496537?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://www.epimorphics.com/' title='Moving to Epimorphics'/><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/6242907008544496537/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=6242907008544496537' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/6242907008544496537'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/6242907008544496537'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2010/07/moving-to-epimorphics.html' title='Moving to Epimorphics'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-3639590503256753015</id><published>2010-07-17T20:50:00.000+01:00</published><updated>2010-07-17T20:50:31.996+01:00</updated><title type='text'>Ubuntu on a Samsung N210</title><content type='html'>&lt;p&gt;I have Ubuntu 10.04 working on a Samsung N210, running Thunderbird, Firefox as well as all my Java development systems. It may not be a fast machine but it's very convenient. The process is now easy, easier than some older material (for 9.10 and very early 10.04) on the web might suggest.&lt;p&gt;

&lt;p&gt;When first turned on, the machine installed Windows 7 Starter.  I let this finish even though I didn't want it so I could install Ubuntu 10.04 alongside Windows in case it didn't work.  Once I was happy it would work, I repartitioned the disk (with gparted) to create a single partition, deleting Windows and the restore partition, then reinstalled.&lt;/p&gt;

&lt;p&gt;First, build a USB drive with the installer on it.  To get the machine to boot from this I had to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;As the machine boots, keep F2 pressed to go into the BIOS.&lt;/li&gt;
&lt;li&gt;Make sure the machine will boot from a USB pendrive.&lt;/li&gt;
&lt;li&gt;Reboot with USB and install Ubuntu Netbook Remix&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You have to press F2 very early to get into the BIOS configuration screens. The boot through the BIOS is very fast so don't wait for the machine to put the Samsung splash screen up.&lt;/p&gt;

&lt;p&gt;You can reset the BIOS to not boot from USB if you want to at this stage, or later.&lt;/p&gt;

&lt;p&gt;At this point the wireless does not work.  &lt;b&gt;Don't panic&lt;/b&gt;; plug in an Ethernet cable and update the system.&lt;/p&gt;

&lt;blockquote&gt;&lt;pre&gt;sudo apt-get update
sudo apt-get upgrade
sudo reboot
&lt;/pre&gt;&lt;/blockquote&gt;

&lt;p&gt;and now the wireless works.  There's quite a lot of advice on the web about this but it now seems that there is no need for any custom software - looks like the main Ubuntu repositories have a working version of the system.&lt;/p&gt;

&lt;p&gt;To get the function keys working I followed the advice in &lt;a href="https://bugs.launchpad.net/ubuntu/+bug/574250"&gt;https://bugs.launchpad.net/ubuntu/+bug/574250&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;&lt;pre&gt;The missing function keys are due to the fact that Samsung N150/N210/N220 are missing from the udev rules:

/lib/udev/rules.d/95-keymap.rules
/lib/udev/rules.d/95-keyboard-force-release.rules

adding "|*N150/N210/N220*" to the product part of the rules for Samsung in BOTH files, will enable the Fn-up and Fn-down keys. The new product section will look like:

ENV{DMI_VENDOR}=="[sS][aA][mM][sS][uU][nN][gG]*", ATTR{[dmi/id]product_name}=="*NC10*|*NC20*|*N130*|*SP55S*|*SQ45S70S*|*SX60P*|*SX22S*|*SX30S*|*R59P/R60P/R61P*|*SR70S/SR71S*|*Q210*|*Q310*|*X05*|*P560*|*R560*|*N150/N210/N220*"

Now, you can map these keys to any program setting the backlight
&lt;/pre&gt;&lt;/blockquote&gt;

&lt;p&gt;and then install some Samsung tools - you need to add the repository to the package manager which you can do graphically or as:&lt;/p&gt;

&lt;blockquote&gt;&lt;pre&gt;sudo add-apt-repository ppa:voria/ppa
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install samsung-tools samsung-backlight
sudo reboot&lt;/pre&gt;&lt;/blockquote&gt;

&lt;p&gt;at which point the N210 works a treat.&lt;/p&gt;

&lt;p&gt;Now - remove all the Windows stickers on the machine, front and back.&lt;/p&gt;

&lt;p&gt;If you are looking for software to try, &lt;a href="http://blog.thesilentnumber.me/2010/04/ubuntu-1004-post-install-guide-what-to.html"&gt;this blog&lt;/a&gt; is a good place to start.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-3639590503256753015?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/3639590503256753015/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=3639590503256753015' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/3639590503256753015'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/3639590503256753015'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2010/07/ubuntu-on-samsung-n210.html' title='Ubuntu on a Samsung N210'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-2552263859974412692</id><published>2010-06-02T10:57:00.003+01:00</published><updated>2010-12-31T16:51:20.150Z</updated><title type='text'>Standardising RDF Syntaxes</title><content type='html'>&lt;p&gt;
One area of interest at the &lt;a href="http://www.w3.org/2009/12/rdf-ws/"&gt;RDF Next Steps Workshop&lt;/a&gt; is 
other RDF-related syntaxes, ones that are not RDF/XML.  &lt;a href="http://www.w3.org/TR/REC-rdf-syntax/"&gt;RDF/XML&lt;/a&gt; is the standard 
syntax; &lt;a href="http://www.w3.org/TR/rdf-testcases/#ntriples"&gt;N-Triples&lt;/a&gt; is defined as part of the RDF test suite but not formally as a syntax on the same level as RDF/XML; there is &lt;a href="http://www.w3.org/TR/rdfa-syntax/"&gt;RDFa&lt;/a&gt; for embedding in XHTML.
&lt;/p&gt;
&lt;p&gt;
RDF/XML is not easy to read as RDF.  &lt;a href="http://www.w3.org/TeamSubmission/turtle/"&gt;Turtle&lt;/a&gt; appeals because it more clearly shows the triple structure of the data.  &lt;a href="http://sw.deri.org/2008/07/n-quads/"&gt;N-Quads&lt;/a&gt; is a proposal to extend RDF file format to named graphs and &lt;a href="http://www4.wiwiss.fu-berlin.de/bizer/TriG/"&gt;TriG&lt;/a&gt; is a Turtle-inspired named graph syntax.  There is &lt;a href="http://www.hpl.hp.com/techreports/2004/HPL-2004-56.html"&gt;TriX&lt;/a&gt; but I've never come across that in the wild.
&lt;/p&gt;
&lt;p&gt;
Using XML had several advantages, such as comprehensive character set support, neutrality of format 
and reuse of parsers.  However, it's complicated in its entirety, even after using an XML parser, and it is quite expensive to parse, making parsing large (and not so large) files a significant cost.  Because it can't, practically, be processed by XSLT there are 
nowadays few advantages.
&lt;/p&gt;
&lt;p&gt;
All the non-XML formats, which are much easier to read and process, would be good to standardise but they are not without the need for sorting out some details.
Details matter when you're dealing with anything over a trivial amount of data 
and when it's millions of triples, it's just a friction point to get the data 
cleaned up if there is disagreement between information publisher and 
information consumer.&lt;/p&gt;
&lt;p&gt;
&lt;p&gt;&lt;b&gt;Turtle&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;
Turtle takes the approach of using &lt;a href="http://en.wikipedia.org/wiki/UTF-8"&gt;UTF-8&lt;/a&gt; as the character set, rather than relying on character set control like XML.  Given that nowadays UTF-8 support is well understood and widely available, the internationalization issues of different scripts are best dealt with that way.  Parsers are both simple to write and fast. 
(The tricks needed to get Java to parse fast would be a subject for a separate discussion.)&lt;/p&gt;
&lt;p&gt;
As Turtle is the more mature of the possible syntaxes, it is also the best 
worked out. One issue I see is the migration from a one-standard-syntax world to a two-standard-syntax world 
and it's not without its practical problems.  What if system A speaks RDF/XML only, and system B speaks only Turtle?  How long will it take for use of content negotiation to catch up? Going from 
V-nothing to V1 of a system (which is where we are now) is usually quicker than 
going from V1 to V2 as the need to upgrade is much less.  If it ain't broke why change? 
&lt;/p&gt;
&lt;p&gt;
Turtle can write graphs that RDF/XML can't encode.  If the property can't be split into namespace and local name, then RDF/XML can't represent it.  An XML qname must have a local part of at least one alphabetic character. This isn't common but these details arise and cause problems (that is, costs) when exchanging data at scale.
&lt;/p&gt;
&lt;p&gt;
What would be useful would be a set of language tokens to build all sorts of languages, like rule languages, but at the moment there are some unnecessary restrictions in Turtle on prefixed names (Turtle calls them qnames but they are not exactly XML &lt;a href="http://www.w3.org/TR/REC-xml-names/#ns-qualnames"&gt;qnames&lt;/a&gt;).
&lt;/p&gt;
&lt;p&gt;
Turtle disallows:
&lt;/p&gt;
&lt;pre&gt;employee:1234&lt;/pre&gt;
&lt;p&gt;
because the local part starts with a digit.  In data converted from existing (non-RDF) data this is a nuisance, and one that caused SPARQL to allow it, based on community feedback.
&lt;/p&gt;
&lt;p&gt;
But there are other forms that can be useful that are not allowed (and aren't in SPARQL):
&lt;/p&gt;
&lt;pre&gt;ex:xyz#abc&lt;/pre&gt;
&lt;pre&gt;ex:xyz/abc&lt;/pre&gt;
&lt;pre&gt;ex:xyz?parm=value&lt;/pre&gt;
&lt;p&gt;
The last one might be a bit extreme but the first two are just using the prefix 
to tidy up long IRIs. Partial alignment with XML qnames makes no sense in Turtle.  Extending the range of characters to include &lt;code&gt;/&lt;/code&gt;, &lt;code&gt;#&lt;/code&gt; and maybe a few others makes prefixed names more useful.  Issues just like this led to the &lt;a href="http://www.w3.org/TR/curie/"&gt;CURIE syntax&lt;/a&gt;.
&lt;/p&gt;
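&lt;p&gt;
To make the point concrete, here is a minimal Python sketch of prefixed-name expansion as plain concatenation. It illustrates the relaxed rule argued for here, not what Turtle parsers accept; the function name &lt;code&gt;expand&lt;/code&gt; and the example.org namespaces are made up for the example.
&lt;/p&gt;

```python
def expand(prefixed_name, prefixes):
    """Expand prefix:local by plain concatenation.

    Splits at the first colon only, so the local part may itself
    contain '/', '#', '?' or start with a digit.
    """
    prefix, local = prefixed_name.split(":", 1)
    return prefixes[prefix] + local


# Hypothetical namespaces, for illustration only.
prefixes = {
    "ex": "http://example.org/",
    "employee": "http://example.org/employee/",
}

for name in ("employee:1234", "ex:xyz#abc", "ex:xyz/abc", "ex:xyz?parm=value"):
    print(name, "->", expand(name, prefixes))
```

&lt;p&gt;
Nothing in the expansion step needs the XML-style restrictions on the local part; they are purely a matter of what the grammar chooses to tokenize.
&lt;/p&gt;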
&lt;p&gt;
While these URIs can be written in Turtle, it needs the long form, with &lt;code&gt;&amp;lt;...&amp;gt;&lt;/code&gt;, and the only way to abbreviate is via the base 
IRI, but you can only have one base URI. It's a workaround really that gets ugly 
when the advantage of Turtle is that it is readable. Extending the range of 
characters in the local part does not invalidate old data; it does create 
friction in interoperability so we have one last chance to sort this out if 
Turtle is to be standardised.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;N-Quads&lt;/b&gt;&lt;/p&gt;
&lt;pre&gt;&amp;lt;s&amp;gt; &amp;lt;p&amp;gt; &amp;lt;o&amp;gt; .&lt;/pre&gt;
&lt;pre&gt;&amp;lt;s&amp;gt; &amp;lt;p&amp;gt; &amp;lt;o&amp;gt; &amp;lt;g&amp;gt; .&lt;/pre&gt;

&lt;p&gt;
What could be simpler? N-Quads is N-Triples with an optional 4th field to give the graph name (or context - it wasn't designed specifically for named graphs, but let's just consider 
IRIs in the 4th field, not blank nodes or literals which the syntax allows).
&lt;/p&gt;
&lt;p&gt;
But TriG puts the graph name before the triples, while N-Quads puts it after.  Maybe N-Quads should be like TriG so that TriG can make N-Quads a subset.  Parsing this modified N-Quads only takes buffering of the tokens on the line and counting to 3 or 4 to determine if it's a triple or a quad.  Making TriG more flexible, at the cost of the slightly less intuitive graph name first, in what is basically a dump format, seems to me to be a good trade-off.
&lt;/p&gt;
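&lt;p&gt;
As a rough illustration of the counting approach, here is a minimal Python sketch. It assumes the graph-name-first layout suggested above, which is not standard N-Quads, and it assumes terms contain no whitespace so that splitting on spaces is enough; real literals can contain spaces and need a proper tokenizer. The function name &lt;code&gt;parse_line&lt;/code&gt; is illustrative, and bare words stand in for IRIs.
&lt;/p&gt;

```python
def parse_line(line):
    """Classify one line of the hypothetical graph-first N-Quads variant.

    Returns (graph, s, p, o); graph is None for a plain triple.
    Assumes terms contain no whitespace (not true of real literals).
    """
    tokens = line.split()
    # Every statement must end with a "." token.
    if not tokens or tokens[-1] != ".":
        raise ValueError("statement must end with '.'")
    terms = tokens[:-1]
    if len(terms) == 3:
        s, p, o = terms
        return (None, s, p, o)
    if len(terms) == 4:
        g, s, p, o = terms
        return (g, s, p, o)
    raise ValueError("expected 3 or 4 terms, got %d" % len(terms))


# Bare words stand in for IRIs to keep the example short.
print(parse_line("s p o ."))    # a triple
print(parse_line("g s p o ."))  # a quad, graph name first
```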
&lt;p&gt;
Blank node labels need to be clarified - is the scope the graph or the document? Both are workable.  I'd choose scope-to-the-document, if only to avoid the confusion of two identical labels referring to different bnodes, and it's occasionally useful to say that a bnode 
in one graph really is the same as another when using it as a transfer syntax 
(for example, when one graph is a subgraph of another). TriG has the same issue but the use of nested forms for graphs makes scoped-graph more reasonable (except that graphs can be split over different &lt;code&gt;{}&lt;/code&gt; blocks).  Doing the same in N-Quads and TriG is important, and my preference is document-scoped labels.
&lt;/p&gt;
&lt;p&gt;&lt;b&gt;TriG&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;
TriG is a Turtle-like syntax for named graphs.  It is useful for writing down &lt;a href="http://www.w3.org/TR/rdf-sparql-query/#rdfDataset"&gt;RDF datasets&lt;/a&gt;.
&lt;/p&gt;
&lt;p&gt;
It has some quirks though.  Turtle is not a subset of TriG because the default graph needs to be wrapped in &lt;code&gt;{}&lt;/code&gt; but the prefixes need to 
be outside the &lt;code&gt;{}&lt;/code&gt;.  The default graph needs to be given in a single block, but named graphs can be fragmented (that was just an oversight in the spec).   It would be helpful to allow the unnamed graph to be specified as Turtle and similarly if an N-Quads file were legal TriG.
&lt;/p&gt;
&lt;p&gt;
TriG allows the N3-ish form:
&lt;/p&gt;
&lt;pre&gt;&amp;lt;g&amp;gt; = { ... } .&lt;/pre&gt;
&lt;p&gt;
I've seen some confusion about this form in the &lt;a href="http://data.gov.uk/"&gt;data.gov.uk&lt;/a&gt; data.  The additional "=" and ".", which are optional, cause confusion and at least one parser does not accept them as they weren't expected.
&lt;/p&gt;
&lt;p&gt;
In N3, &lt;code&gt;=&lt;/code&gt; is a synonym for owl:sameAs but the relationship isn't likely to be owl:sameAs; read as N3, it's more likely to be &lt;a href="http://www.w3.org/2000/10/swap/doc/Reach"&gt;log:semantics&lt;/a&gt;.  Now I like the uniformity of the N3 data model, with graph literals (formulae), because of the simplicity and completeness it introduces, but it's not RDF; it's an extension and it breaks all RDF-only systems.
&lt;/p&gt;
&lt;p&gt;
If &amp;lt;g&amp;gt; is the IRI of a graph document, it would be more like the N3:
&lt;/p&gt;
&lt;pre&gt; &amp;lt;g&amp;gt; log:semantics { ... } .&lt;/pre&gt;
&lt;p&gt;
or 
&lt;/p&gt;
&lt;pre&gt;
&amp;lt;g&amp;gt; log:semantics ?v .
?v owl:sameAs { ... } .
&lt;/pre&gt;
&lt;p&gt;
Avoiding the variability of syntax, which brings no benefit, is better. Drop the 
optional adornment.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Summary&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;
None of these issues are roadblocks; they are just details that need sorting out to move from the current 
&lt;i&gt;de facto&lt;/i&gt; formats to specifications. When exchanging data between systems 
that are not built together, details matter.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-2552263859974412692?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/2552263859974412692/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=2552263859974412692' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/2552263859974412692'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/2552263859974412692'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2010/06/standardising-rdf-syntaxes.html' title='Standardising RDF Syntaxes'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-1766675741312576170</id><published>2009-12-14T15:29:00.002Z</published><updated>2009-12-14T15:31:51.336Z</updated><title type='text'>Running TDB on a cloud storage system</title><content type='html'>&lt;p&gt;
&lt;a href="http://project-voldemort.com/"&gt;Project Voldemort&lt;/a&gt; is an open-source (Apache2 license) distributed, scalable, fault-tolerant,  key-value storage system for large scale data.  Being a key-value store the only operations it provides are:
&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;get(key) -&amp;gt; value&lt;/li&gt;
&lt;li&gt;put(key, value)&lt;/li&gt;
&lt;li&gt;delete(key)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Key and value can be various custom types but at the lowest level, they are arrays of bytes.  Serialization schemes on top of byte arrays give structure but access is only via the key (so no filters or joins as part of the store).  It's built for scale and speed, and fault tolerance.&lt;/p&gt;

&lt;p&gt;TDB has internal APIs so that different indexing schemes or different storage technologies can be plugged in. A key-value store can be used as the storage layer for TDB.  
&lt;/p&gt;

&lt;p&gt;There are two areas of storage needed: the node table (a 2-way mapping between the data making up the RDF terms and the associated, fixed size NodeId) and the indexes, which provide the matching of triple patterns. See the &lt;a href="http://openjena.org/wiki/TDB/Architecture"&gt;TDB Design Notes&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But a key-value store isn't an ideal backend. The node table is a pair of key-value stores because all that is needed is lookup between RDF term and the NodeId.  The issue that arises is the granularity of access.  TDB heavily caches the mapping in the query engine. 
&lt;/p&gt;

&lt;p&gt;
The indexes don't naturally map to key-value access because looking up a triple pattern results in all matches.  There are (at least) two ways of doing this.  Either store something like all PO pairs and use S as a key (a bit like Jena's memory model), or use the key-value store to hold part of a datastructure and access it like a disk.
&lt;/p&gt;
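&lt;p&gt;
The first of these schemes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: &lt;code&gt;KeyValueStore&lt;/code&gt; is an in-memory stand-in with the three operations listed earlier, not the Project Voldemort client API, and the helper names &lt;code&gt;add_triple&lt;/code&gt; and &lt;code&gt;match_subject&lt;/code&gt; are made up. Values are kept as Python lists here rather than serialized byte arrays.
&lt;/p&gt;

```python
class KeyValueStore:
    """In-memory stand-in for a remote store with only get/put/delete."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


def add_triple(store, s, p, o):
    # Read-modify-write: the whole PO list for S is a single value.
    pairs = store.get(s) or []
    if (p, o) not in pairs:
        store.put(s, pairs + [(p, o)])


def match_subject(store, s):
    """All triples matching the pattern (s, ?p, ?o), in one lookup."""
    return [(s, p, o) for (p, o) in (store.get(s) or [])]


store = KeyValueStore()
add_triple(store, "ex:a", "ex:knows", "ex:b")
add_triple(store, "ex:a", "ex:name", "Alice")
print(match_subject(store, "ex:a"))
```

&lt;p&gt;
A pattern with a known subject is one round trip, but a pattern keyed on P or O would need further tables keyed the other way, and every insert rewrites a whole value.
&lt;/p&gt;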

&lt;p&gt;
TDB uses threaded B+Trees with a pluggable disk block layer (this is used to switch between 32 and 64 bit modes) so using the key-value store as block storage is a simple fit.  Because B+Trees store the entries in sorted order, caching means that a block probably contains all the PO for a given S if a look up is by S, so these two schemes end up being similar even though at the design level they are quite different.
&lt;/p&gt;

&lt;p&gt;Both aspects rely on the query engine doing caching to work sensibly, to compensate for the mismatch in requirements (triple match for joins) and interface granularity (for node access).&lt;/p&gt;

&lt;p&gt;See also:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Taylor Cowan has explored the S-&amp;gt;PO in his work putting 
&lt;a href="http://tech.groups.yahoo.com/group/jena-dev/message/39575"&gt;Jena
on Google App engine&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;The &lt;a href="http://ig.semanticsupport.org/"&gt;Infinite Graph for Jena&lt;/a&gt; uses the Jena memory model and pages data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;
Does Project Voldemort work as storage for TDB? Yes, and with only a small amount of code.  Not surprisingly, the performance is limited in this experimental system (e.g. storing individual RDF terms in the node table needs better management to avoid latency and overhead in the round trip to the remote table).  Truncating to only the used space then compressing would be useful on the indexes (see
&lt;a href="http://www.vldb.org/pvldb/1/1453927.pdf "&gt;RDF-3X&lt;/a&gt; for an interesting compression scheme).  But it's a workable scheme and the style of using a key-value store shows TDB can be ported to a wide variety of environments because key-value stores are currently a very active area - Project Voldemort provides a cloud-centric storage fabric.&lt;/p&gt;

&lt;p&gt;
I've  started putting experimental systems on github.  This experiment is available in the &lt;a href="http://github.com/afs/TDB-V"&gt;TDB/V repository&lt;/a&gt;.  These are not released, supported systems; they are the source code and development setup (for Eclipse usually). I used Project Voldemort release v0.57. 
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-1766675741312576170?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://github.com/afs/TDB-V' title='Running TDB on a cloud storage system'/><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/1766675741312576170/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=1766675741312576170' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/1766675741312576170'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/1766675741312576170'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2009/12/running-tdb-on-cloud-storage-system.html' title='Running TDB on a cloud storage system'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-3509662869351338066</id><published>2009-10-01T14:47:00.003+01:00</published><updated>2009-12-28T19:49:57.066Z</updated><title type='text'>BSBM-Jena</title><content type='html'>&lt;p&gt;This a version of the
&lt;a href="http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/"&gt;Berlin 
SPARQL Benchmark&lt;/a&gt; tools, modified to allow the query benchmark driver to work 
with a local database, rather than a SPARQL endpoint. I've changed &lt;code&gt;
benchmark.testdriver.TestDriver&lt;/code&gt; to accept a Jena assembler description.&lt;/p&gt;
&lt;pre&gt; TestDriver -runs ... -w ... -idir ... -o ... local:assembler.ttl&lt;/pre&gt;
&lt;p&gt;Git clone
&lt;a href="git://github.com/afs/BSBM-Jena.git"&gt;
git://github.com/afs/BSBM-Jena.git&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In the repository, there are versions of ARQ and TDB with significant performance improvements. It will run the benchmark in a reasonable time now. There are some shell scripts to help run the benchmark 
as well.&lt;/p&gt;

&lt;p&gt;Update:&lt;/p&gt;

&lt;p&gt;The GIT repository has been renamed as "BSBM-Local".  The Jena pseudo URI scheme is now "jena:". The project now includes support for running tests directly on a Sesame native repository using the pseudo URI scheme "sesame:&lt;i&gt;directory&lt;/i&gt;" - this could be easily extended to any Sesame repository implementation.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-3509662869351338066?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://github.com/afs/BSBM-Jena/' title='BSBM-Jena'/><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/3509662869351338066/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=3509662869351338066' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/3509662869351338066'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/3509662869351338066'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2009/10/bsbm-jena.html' title='BSBM-Jena'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-5212287068235135329</id><published>2009-09-14T15:00:00.001+01:00</published><updated>2009-09-14T15:00:59.628+01:00</updated><title type='text'>Moving to Talis</title><content type='html'>&lt;p&gt;I'll be leaving HP soon when the semantic web group winds down.  
I'll be joining the &lt;a href="http://www.talis.com/platform/"&gt;platform division&lt;/a&gt; at Talis, still living in Bristol, working from home much of the time and travelling to Birmingham as needed.  One of the reasons why Talis is attractive as a place to work is because they understand such working arrangements.  I'll get to continue support and development of my contributions to Jena.
&lt;/p&gt;
&lt;p&gt;
And the first question I have been getting is about what will happen to Jena.
&lt;/p&gt;
&lt;p&gt;
Jena has a BSD-style licence so there is no block to continuing any use by any users nor continued development by the Jena developers.  But we plan to go further and become an open source project with no one commercial backer.  After discussions with HP, our current plan is to transfer the ownership of the copyright to a commercially neutral body.  In immediate terms, there is no change to people/companies using Jena but this change will make it easier to continue and expand the core developer community and indeed it enables us to accept contributions more easily.  It has been a (cultural, not legal) barrier that HP was seen to have controlling interest despite the fact we have always acted openly.
&lt;/p&gt;
&lt;p&gt;
I also get to continue participating in the &lt;a href="http://www.w3.org/2009/sparql/wiki/Main_Page"&gt;W3C SPARQL working group&lt;/a&gt;. The list of things the WG has decided it has the time and resources to work on is &lt;a href="http://www.w3.org/2009/sparql/wiki/index.php?title=FeatureProposal"&gt;here&lt;/a&gt;.  The charter states that no queries from the spec of 2008 will change.  The new thing for the working group &lt;a href="http://www.w3.org/2009/sparql/track/issues/17"&gt;to address&lt;/a&gt; is update, both as &lt;a href="http://www.w3.org/Submission/SPARQL-Update/"&gt;a language&lt;/a&gt; but also some RESTful operations.
&lt;/p&gt;
&lt;p&gt;
Exciting times.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-5212287068235135329?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/5212287068235135329/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=5212287068235135329' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/5212287068235135329'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/5212287068235135329'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2009/09/moving-to-talis.html' title='Moving to Talis'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-2115284086667883569</id><published>2008-12-27T16:39:00.003Z</published><updated>2008-12-27T16:43:51.691Z</updated><title type='text'>A small mystery about deletion in T-Trees</title><content type='html'>&lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/T-tree"&gt;T-Trees&lt;/a&gt; are a generalisation of &lt;a href="http://en.wikipedia.org/wiki/AVL_tree"&gt;AVL trees&lt;/a&gt;.  They are useful for in-memory databases because they have better packing densities than AVL trees and need less rotations.  They provide a sorted index (so they are not good as the only index structure in an in-memory RDF store). This posting is only loosely triple store or SPARQL related.&lt;/p&gt;

&lt;p&gt;The paper &amp;quot;&lt;a href="http://www.vldb.org/conf/1986/P294.PDF"&gt;A Study of Index Structures for Main Memory Database Management Systems&lt;/a&gt;&amp;quot; (Tobin J. Lehman and Michael J. Carey, VLDB 1986) has the details.&lt;/p&gt;

&lt;p&gt;T-Trees keep an array of items per tree node (usually a short array) and have 3 pointers and 2 integers per tree node stored, as opposed to 3 pointers and 1 number per single item stored for AVL. (Both can be done with 2 pointers, with no parent pointer, but it&amp;#39;s more complicated and the code has to run its own stack to record its path through the tree.)  Make the array a few entries long, and a T-Tree is a bit more compact; rotations only happen when the tree structure changes after a leaf array fills up.&lt;/p&gt;

&lt;p&gt;I understand these algorithms in depth if I code them up.  My implementation of T-Trees includes consistency checking because I find it easier to write data structure algorithms this way - lots of internal checking, lots of test cases and then a move to large scale randomized insertion and deletion patterns, because I don&amp;#39;t trust myself to enumerate all possibilities in a hand-written test suite.  Run the randomized tests for a few million iterations, checking the structure for internal consistency constraints on every insertion and deletion.  Then disable (but don&amp;#39;t remove!) the checking code, and rely on the fact that &amp;quot;if (false)&amp;quot; compiles to nothing in Java and statics tend to get in-lined by the JIT.&lt;/p&gt;
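&lt;p&gt;The testing style described above can be sketched as follows, in Python rather than Java and with a plain sorted list standing in for the T-Tree. The names &lt;code&gt;SortedSet&lt;/code&gt;, &lt;code&gt;random_test&lt;/code&gt; and &lt;code&gt;CHECKING&lt;/code&gt; are hypothetical. The flag is an ordinary runtime test here; the point about &amp;quot;if (false)&amp;quot; compiling to nothing applies to Java, not to this sketch.&lt;/p&gt;

```python
import bisect
import random

CHECKING = True  # switch off (but do not delete) once the code is trusted


class SortedSet:
    """Stand-in for the index under test: a sorted list, no duplicates."""

    def __init__(self):
        self.items = []

    def insert(self, x):
        i = bisect.bisect_left(self.items, x)
        if i == len(self.items) or self.items[i] != x:
            self.items.insert(i, x)

    def delete(self, x):
        i = bisect.bisect_left(self.items, x)
        if i != len(self.items) and self.items[i] == x:
            del self.items[i]

    def check(self):
        # Internal consistency: strictly increasing order.
        for a, b in zip(self.items, self.items[1:]):
            assert b > a, "ordering violated"


def random_test(iterations, seed):
    """Random inserts and deletes, compared against a simple model."""
    rng = random.Random(seed)
    index, model = SortedSet(), set()
    for _ in range(iterations):
        x = rng.randrange(100)
        if rng.random() > 0.5:
            index.insert(x)
            model.add(x)
        else:
            index.delete(x)
            model.discard(x)
        if CHECKING:
            index.check()
            assert index.items == sorted(model)
    return index


result = random_test(10000, seed=42)
print(len(result.items))
```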

&lt;p&gt;But one area is puzzling me.  The mystery, to me, is in the delete algorithm.  One feature of a T-Tree is that the internal nodes (internal nodes have both a left and right subtree) should always be larger than some minimum amount.&lt;/p&gt;

&lt;p&gt;The delete algorithm, from the paper, is (section 3.2.1):&lt;/p&gt;

&lt;BLOCKQUOTE style="font-family: 'times new roman'"&gt;
&lt;p&gt;3) Delete Algorithm

&lt;/p&gt;

&lt;p&gt;The deletion algorithm is similar to the insertion algorithm in the sense that the element to be deleted is searched for, the operation is performed, and then re-balancing is done if necessary. The algorithm works as follows:

&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Search for the node that bounds the delete value. Search for the delete value within this node, reporting an error and stopping if it is not found.&lt;/li&gt;

&lt;li&gt;If the delete will not cause an underflow (i.e. if the node has more than the minimum allowable number of entries prior to the delete), then simply delete the value and stop; else, if this is an internal node, then delete the value and borrow the greatest lower bound of this node from a leaf or half-leaf to bring this node’s element count back up to the minimum; else, this is a leaf or a half-leaf, so just delete the element. (Leaves are permitted to underflow, and half-leaves are handled in step 3.)&lt;/li&gt;

&lt;li&gt;If the node is a half-leaf and can be merged with a leaf, coalesce the two nodes into one node (a leaf) and discard the other node. Proceed to step 5.&lt;/li&gt;

&lt;li&gt;If the current node (a leaf) is not empty, then stop; else, free the node and proceed to step 5 to re-balance the tree.&lt;/li&gt; 

&lt;li&gt;For every node along the path from the leaf up to the root, if the two subtrees of the node differ in height by more than one, perform a rotation operation. Since a rotation at one node may create an imbalance for a node higher up in the tree, balance-checking for deletion must examine all of the nodes on the search path until a node of even balance is discovered.&lt;/li&gt;
&lt;/ol&gt;
&lt;/BLOCKQUOTE&gt;

&lt;p&gt;But what if a half-leaf, a node with just one sub-node that is a leaf, becomes smaller than the minimum size of a node, yet cannot be merged with its leaf?  The T-Tree is still valid - the size constraint is on internal nodes only.  But a half-leaf can become an internal node by a rotation of the tree.  So our less-than-min-sized half-leaf can become an invalid internal node and the constraint on internal nodes is violated.

&lt;/p&gt;

&lt;p&gt;Several possible solutions occur to me (and there are probably more than just these):

&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; Just don&amp;#39;t worry (works for me because I&amp;#39;m expecting insertion to be much more important in any usage I might make of the T-Tree implementation).  The internal node that was too small may become larger due to a later insertion.  But my internal consistency checking code is now weakened because there is a condition on internal nodes that is no longer always true.&lt;/li&gt;

&lt;li&gt;Treat a half-leaf more like an internal node with respect to deletion and pull up an entry from its leaf to keep the half-leaf at the minimum size. If the half-leaf has a left-leaf, this is the same as the rule for deletion in an internal node.  If the half-leaf has a right-leaf, the lowest element of the leaf is required, which means shifting down all the other elements in the leaf, so it is less than ideal.&lt;/li&gt;

&lt;li&gt;Have special code in the rotation operations to move elements around when rotating a half-leaf into an internal node position.  This is like possibility 2, except delaying it until it is known that it will cause an invalid internal node.  T-Trees already have a special case rotation on insertion anyway.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I choose the second way for now - fix up during deletion - because the checking code can now check half-leaves as well as internal nodes for size constraints, so catching problems earlier in some insert/delete sequence.

&lt;/p&gt;

&lt;p&gt;A search of the web does not find any mention of this - most web pages are either a copy of the Wikipedia page or reference the original paper.

&lt;/p&gt;

&lt;p&gt;If anyone can help me out with this, then please leave a comment or get in touch.  I&amp;#39;d be surprised if it isn&amp;#39;t that I&amp;#39;m missing something obvious (T-Trees are not new) but the situation of a small half-leaf becoming an internal node does occur as I found from the randomized testing.

&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-2115284086667883569?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/2115284086667883569/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=2115284086667883569' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/2115284086667883569'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/2115284086667883569'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2008/12/small-mystery-about-deletion-in-t-trees.html' title='A small mystery about deletion in T-Trees'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-621648220385793980</id><published>2008-10-29T20:29:00.004Z</published><updated>2008-10-29T20:33:25.619Z</updated><title type='text'>Walking the Web</title><content type='html'>It's nice to see Freebase providing an RDF interface:&amp;nbsp; &lt;a href="http://rdf.freebase.com/"&gt;http://rdf.freebase.com/. &lt;/a&gt;
The example they give is &amp;lt;&lt;a href="http://rdf.freebase.com/ns/en.blade_runner"&gt;http://rdf.freebase.com/ns/en.blade_runner&lt;/a&gt;&amp;gt; 
so let's see what is actually there and how we might use the information. 
&lt;p&gt;
Each graph describing something contains Freebase URLs to be explored.&amp;nbsp; What we want is the ability to load data into our local store while 
some query 
is running, enabling the dataset to be enlarged as the query makes choices about 
how to proceed.&lt;/p&gt;
&lt;p&gt;
This is similar to &lt;a href="http://www.w3.org/2000/10/swap/doc/Processing"&gt;cwm&lt;/a&gt;'s log:semantics.
&lt;a href="http://www.w3.org/2000/10/swap/doc/Reach"&gt;
http://www.w3.org/2000/10/swap/doc/Reach&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In &lt;a href="http://www.w3.org/TR/rdf-sparql-query/"&gt;SPARQL&lt;/a&gt;, the dataset 
is fixed. No good if you want to write a graph-walking process without some glue 
in your favourite programming language. In one way, it's scripting for the web 
but in a special way.&amp;nbsp; It's not a sequence of queries and updates; it's changing the collection of graphs, expanding the
&lt;a href="http://www.w3.org/TR/rdf-sparql-query/#rdfDataset"&gt;RDF dataset&lt;/a&gt; 
known to the application.&lt;/p&gt;
&lt;p&gt;Query 1 : See what's in the graph&lt;/p&gt;
&lt;p&gt;Let's first look at what's available at the example URL.&amp;nbsp; That does not 
require anything special: it's just a FROM clause (which in ARQ will 
content-negotiate for RDF; if you use a web browser you will see an HTML page):&lt;/p&gt;
&lt;pre class="box"&gt;PREFIX fb: &amp;lt;http://rdf.freebase.com/ns/&amp;gt;
SELECT *
FROM fb:en.blade_runner
{ ?s ?p ?o }&lt;/pre&gt;
&lt;p&gt;Hmm - 294 triples.&lt;/p&gt;
&lt;p&gt;Query 2 : Look for interesting properties&lt;/p&gt;
&lt;pre class="box"&gt;PREFIX fb: &amp;lt;http://rdf.freebase.com/ns/&amp;gt;
SELECT DISTINCT ?p
FROM fb:en.blade_runner
{
  ?s ?p ?o
}&lt;/pre&gt;
&lt;p&gt;62 distinct properties used.&amp;nbsp; fb:film.film.starring looks interesting.&lt;/p&gt;
&lt;p&gt;Query 3 : Follow the links&lt;/p&gt;
&lt;p&gt;As an experimental feature, consider a new SPARQL keyword &amp;quot;&lt;code&gt;FETCH&lt;/code&gt;&amp;quot; which takes a URL, or 
a variable bound to a URL by the time that part of the query is reached, and 
fetches the graph at that location.&lt;/p&gt;
&lt;p&gt;Now we fetch the documents at each of the URLs that are objects of the &lt;i&gt;
blade runner, film.film.starring&lt;/i&gt; triples.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;FETCH&lt;/code&gt; loads the graph and places it in the dataset as a named graph, the name 
being the URL it was fetched from. We use &lt;code&gt;GRAPH&lt;/code&gt; to access the loaded graph. Done 
this way, triples from different sources are kept separate, which might be 
important in deciding what sources to believe.&lt;/p&gt;
&lt;p&gt;This also shows a critical limitation: just placing the data in a named graph is a 
basic requirement for deciding what to believe, but really there ought to be a 
lot more metadata about the graph, including when it was read, possibly why it 
was read (how we got here in the query) and so on. But we are not an agent system 
so we will note this and move on.&lt;/p&gt;
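&lt;p&gt;Since &lt;code&gt;FETCH&lt;/code&gt; is described as also taking a fixed URL, Query 1 could presumably be rewritten to load the graph by &lt;code&gt;FETCH&lt;/code&gt; rather than &lt;code&gt;FROM&lt;/code&gt;. This is only a sketch, assuming the experimental keyword behaves as described here:&lt;/p&gt;
&lt;pre class="box"&gt;PREFIX fb: &amp;lt;http://rdf.freebase.com/ns/&amp;gt;
SELECT *
{
  # Load the graph at a fixed URL into the dataset ...
  FETCH fb:en.blade_runner
  # ... then read it back as the named graph with that URL as its name.
  GRAPH fb:en.blade_runner
    { ?s ?p ?o }
}&lt;/pre&gt;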
&lt;p&gt;By poking around with &lt;code&gt;GRAPH ?personUUID { ?s ?p ?o}&lt;/code&gt; (60 triples) the property film.performance.actor looks hopeful.&lt;/p&gt;
&lt;pre class="box"&gt;PREFIX fb: &amp;lt;http://rdf.freebase.com/ns/&amp;gt;
SELECT ?actor
FROM fb:en.blade_runner
{
  fb:en.blade_runner fb:film.film.starring ?personUUID
  FETCH ?personUUID
  GRAPH ?personUUID
    { ?personUUID fb:film.performance.actor ?actor }
}&lt;/pre&gt;
&lt;p&gt;12 results.&lt;/p&gt;
&lt;pre class="box"&gt;--------------------------------------------
| actor                                    |
============================================
| fb:en.james_hong                         |
| fb:en.brion_james                        |
| fb:en.edward_james_olmos                 |
| fb:en.joanna_cassidy                     |
| fb:en.william_sanderson                  |
| fb:en.rutger_hauer                       |
| fb:authority.netflix.role.20000077       |
| fb:guid.9202a8c04000641f80000000054cbccc |
| fb:en.sean_young                         |
| fb:en.joe_turkel                         |
| fb:en.harrison_ford                      |
| fb:en.daryl_hannah                       |
--------------------------------------------&lt;/pre&gt;
&lt;p&gt;and more URLs to follow.&lt;/p&gt;
&lt;p&gt;Looking in the next graph, there is &lt;code&gt;fb:type.object.name&lt;/code&gt; so let's 
guess and use that.&amp;nbsp; But each time we have chosen a property, we didn't 
have to guess: we can follow the property URL itself:&lt;/p&gt;
&lt;pre class="box"&gt;PREFIX fb: &amp;lt;http://rdf.freebase.com/ns/&amp;gt;
SELECT *
FROM &lt;b&gt;fb:type.object.name&lt;/b&gt;
{
  ?s ?p ?o
}&lt;/pre&gt;
&lt;p&gt;but it's easier to &lt;a href="http://rdf.freebase.com/ns/type.object.name"&gt;read 
the description in HTML&lt;/a&gt; (and Freebase is following links internally to build 
the page).&lt;/p&gt;
&lt;p&gt;Query 4 : The names of actors in &lt;i&gt;Blade Runner&lt;/i&gt;&lt;/p&gt;
&lt;p&gt;So a query to find the names of actors in &amp;quot;Blade Runner&amp;quot; is:&lt;/p&gt;
&lt;pre class="box"&gt;PREFIX fb: &amp;lt;http://rdf.freebase.com/ns/&amp;gt;
SELECT ?actor ?name
FROM fb:en.blade_runner
{
  fb:en.blade_runner fb:film.film.starring ?personUUID
  FETCH ?personUUID
  GRAPH ?personUUID
    { ?personUUID fb:film.performance.actor ?actor }
  FETCH ?actor
  GRAPH ?actor
    { ?actor fb:type.object.name ?name }
}
ORDER BY ?actor&lt;/pre&gt;
&lt;p&gt;which gives:&lt;/p&gt;
&lt;pre class="box"&gt;
-------------------------------------------------------------------
| actor                                    | name                 |
===================================================================
| fb:authority.netflix.role.20000077       | "M. Emmet Walsh"     |
| fb:authority.netflix.role.20000077       | "M&amp;#12539;&amp;#12456;&amp;#12513;&amp;#12483;&amp;#12488;&amp;#12539;&amp;#12454;&amp;#12457;&amp;#12523;&amp;#12471;&amp;#12517;"    |
| fb:en.brion_james                        | "Brion James"        |
| fb:en.daryl_hannah                       | "Daryl Hannah"       |
| fb:en.daryl_hannah                       | "&amp;#1061;&amp;#1072;&amp;#1085;&amp;#1085;&amp;#1072;, &amp;#1044;&amp;#1101;&amp;#1088;&amp;#1080;&amp;#1083;"       |
| fb:en.daryl_hannah                       | "&amp;#1491;&amp;#1512;&amp;#1497;&amp;#1500; &amp;#1492;&amp;#1488;&amp;#1504;&amp;#1492;"          |
| fb:en.daryl_hannah                       | "&amp;#12480;&amp;#12522;&amp;#12523;&amp;#12539;&amp;#12495;&amp;#12531;&amp;#12490;"         |
| fb:en.edward_james_olmos                 | "Edward James Olmos" |
| fb:en.harrison_ford                      | "Harrison Ford"      |
| fb:en.harrison_ford                      | "&amp;#1060;&amp;#1086;&amp;#1088;&amp;#1076; &amp;#1043;&amp;#1072;&amp;#1088;&amp;#1088;&amp;#1110;&amp;#1089;&amp;#1086;&amp;#1085;"      |
| fb:en.harrison_ford                      | "&amp;#1060;&amp;#1086;&amp;#1088;&amp;#1076;, &amp;#1061;&amp;#1072;&amp;#1088;&amp;#1088;&amp;#1080;&amp;#1089;&amp;#1086;&amp;#1085;"     |
| fb:en.harrison_ford                      | "&amp;#1061;&amp;#1072;&amp;#1088;&amp;#1080;&amp;#1089;&amp;#1086;&amp;#1085; &amp;#1060;&amp;#1086;&amp;#1088;&amp;#1076;"       |
| fb:en.harrison_ford                      | "&amp;#1061;&amp;#1072;&amp;#1088;&amp;#1080;&amp;#1089;&amp;#1098;&amp;#1085; &amp;#1060;&amp;#1086;&amp;#1088;&amp;#1076;"       |
| fb:en.harrison_ford                      | "&amp;#1492;&amp;#1488;&amp;#1512;&amp;#1497;&amp;#1505;&amp;#1493;&amp;#1503; &amp;#1508;&amp;#1493;&amp;#1512;&amp;#1491;"       |
| fb:en.harrison_ford                      | "&amp;#12495;&amp;#12522;&amp;#12477;&amp;#12531;&amp;#12539;&amp;#12501;&amp;#12457;&amp;#12540;&amp;#12489;"       |
| fb:en.harrison_ford                      | "&amp;#21704;&amp;#37324;&amp;#26862;·&amp;#31119;&amp;#29305;"         |
| fb:en.harrison_ford                      | "&amp;#54644;&amp;#47532;&amp;#49832; &amp;#54252;&amp;#46300;"          |
| fb:en.james_hong                         | "James Hong"         |
| fb:en.joanna_cassidy                     | "Joanna Cassidy"     |
| fb:en.joe_turkel                         | "Joe Turkel"         |
| fb:en.rutger_hauer                       | "Rutger Hauer"       |
| fb:en.rutger_hauer                       | "&amp;#1061;&amp;#1072;&amp;#1091;&amp;#1101;&amp;#1088;, &amp;#1056;&amp;#1091;&amp;#1090;&amp;#1075;&amp;#1077;&amp;#1088;"      |
| fb:en.rutger_hauer                       | "&amp;#12523;&amp;#12488;&amp;#12460;&amp;#12540;&amp;#12539;&amp;#12495;&amp;#12454;&amp;#12450;&amp;#12540;"      |
| fb:en.rutger_hauer                       | "&amp;#39791;&amp;#26684;·&amp;#35946;&amp;#29246;"           |
| fb:en.sean_young                         | "Sean Young"         |
| fb:en.sean_young                         | "&amp;#1064;&amp;#1086;&amp;#1085; &amp;#1049;&amp;#1098;&amp;#1085;&amp;#1075;"           |
| fb:en.sean_young                         | "&amp;#1071;&amp;#1085;&amp;#1075;, &amp;#1064;&amp;#1086;&amp;#1085;"           |
| fb:en.sean_young                         | "&amp;#12471;&amp;#12519;&amp;#12540;&amp;#12531;&amp;#12539;&amp;#12516;&amp;#12531;&amp;#12464;"        |
| fb:en.william_sanderson                  | "William Sanderson"  |
| fb:guid.9202a8c04000641f80000000054cbccc | "Morgan Paull"       |
-------------------------------------------------------------------&lt;/pre&gt;

&lt;p&gt;
We are left with a question: why use (extended) SPARQL? If you're doing it once, 
then a web browser is easier. After all, I used one to choose the properties to 
follow.&lt;/p&gt;
&lt;p&gt;
But with a query you can send it to someone else for them to reuse your 
knowledge, you can rerun it to look for changes, you can generalise and let the 
computer do some brute force search to find things that would take you, the 
human, a long time.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-621648220385793980?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/621648220385793980/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=621648220385793980' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/621648220385793980'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/621648220385793980'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2008/10/walking-web.html' title='Walking the Web'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-3726792826047677220</id><published>2008-07-12T20:00:00.003+01:00</published><updated>2008-07-12T20:04:03.888+01:00</updated><title type='text'>ARQ Property Paths</title><content type='html'>&lt;p&gt;SPARQL basic graph patterns only allow fixed length routes through the graph 
being matched. Sometimes, the application wants a more general path, so ARQ has 
acquired syntax and built-in evaluation for a path language as part of ARQ's 
extensions to SPARQL. The path language is like string regular expressions, 
except it's over predicates, not string characters.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://jena.sourceforge.net/ARQ/property_paths.html"&gt;Property path 
documentation&lt;/a&gt; for ARQ.&lt;/p&gt;
&lt;h3&gt;Simple Paths&lt;/h3&gt;
&lt;p&gt;The first operator for simple paths is &amp;quot;/&amp;quot;, which is path concatenation, or 
following property links between nodes. The other simple path operator is &amp;quot;^&amp;quot;, 
which is like &amp;quot;/&amp;quot; except the graph connection is traversed in the reverse direction (it's the inverse 
property).&lt;/p&gt;
&lt;pre class="box"&gt;# Find the names of people 2 &amp;quot;&lt;code&gt;foaf:knows&lt;/code&gt;&amp;quot; links away.
PREFIX foaf: &amp;lt;http://xmlns.com/foaf/0.1/&amp;gt;
SELECT ?name
{ ?x foaf:mbox &amp;lt;mailto:alice@example&amp;gt; .
  ?x foaf:knows/foaf:knows/foaf:name ?name .
}&lt;/pre&gt;
&lt;p&gt;This is the same as the strict SPARQL query:&lt;/p&gt;
&lt;pre class="box"&gt;{
  ?x  foaf:mbox &amp;lt;mailto:alice@example&amp;gt; .
  ?x  foaf:knows [ foaf:knows [ foaf:name ?name ]]. 
}&lt;/pre&gt;
&lt;p&gt;or, with explicit variables:&lt;/p&gt;
&lt;pre class="box"&gt;{
  ?x  foaf:mbox &amp;lt;mailto:alice@example&amp;gt; .
  ?x  foaf:knows ?a1 .
  ?a1 foaf:knows ?a2 .
  ?a2 foaf:name ?name .
}&lt;/pre&gt;
&lt;p&gt;And these two are the same:&lt;/p&gt;
&lt;pre class="box"&gt; ?x foaf:knows/foaf:knows/foaf:name ?name . &lt;/pre&gt;
&lt;pre class="box"&gt; ?name ^foaf:name^foaf:knows^foaf:knows ?x .&lt;/pre&gt;
&lt;h3&gt;Complex Paths&lt;/h3&gt;
&lt;p&gt;The simple paths don't change the expressivity; they are a shorthand for part 
of a basic graph pattern and ARQ compiles simple paths by generating the 
equivalent basic graph patterns then merging adjacent ones together.&lt;/p&gt;
&lt;p&gt;Alternation, the &amp;quot;|&amp;quot; operator, does not change the expressivity either - the 
same thing could be done with a SPARQL UNION.&lt;/p&gt;
&lt;pre class="box"&gt;# Use with Dublin core 1.0 or Dublin Core 1.1 &amp;quot;title&amp;quot;
 :book (dc10:title|dc11:title) ?title&lt;/pre&gt;
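&lt;p&gt;For comparison, the same match can be sketched with &lt;code&gt;UNION&lt;/code&gt; (the prefixes &lt;code&gt;dc10:&lt;/code&gt; and &lt;code&gt;dc11:&lt;/code&gt; are assumed to be declared as in the example above):&lt;/p&gt;
&lt;pre class="box"&gt;# Either title property binds ?title
 {
   { :book dc10:title ?title }
   UNION
   { :book dc11:title ?title }
 }&lt;/pre&gt;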
&lt;p&gt;Some complex paths do change the expressivity of the language; the query can 
match things that can't be matched by strictly fixed-length paths, because they 
allow arbitrary-length paths through the use of &amp;quot;*&amp;quot; (zero or more), &amp;quot;+&amp;quot; (one or 
more), &amp;quot;?&amp;quot; (zero or one) as well as the form &amp;quot;{N,}&amp;quot; (N or more).&lt;/p&gt;
&lt;p&gt;Two very useful cases are:&lt;/p&gt;
&lt;pre class="box"&gt; # All the types, chasing the subclass hierarchy
 &amp;lt;http://example/&amp;gt; rdf:type/rdfs:subClassOf* ?type&lt;/pre&gt;
&lt;p&gt;and:&lt;/p&gt;
&lt;pre class="box"&gt; # Members of a list
 ?someList rdf:rest*/rdf:first ?member .&lt;/pre&gt;
&lt;p&gt;because &amp;quot;*&amp;quot; includes the case of a zero length path - all nodes are 
&amp;quot;connected&amp;quot; to themselves by a zero-length path.&lt;/p&gt;
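&lt;p&gt;By contrast, &amp;quot;+&amp;quot; needs at least one step. As a sketch only, reusing the FOAF data shape from the simple path example, this asks for the names of everyone reachable from Alice over one or more &lt;code&gt;foaf:knows&lt;/code&gt; links. &lt;code&gt;DISTINCT&lt;/code&gt; is there on the assumption that the same person may be reached by more than one route:&lt;/p&gt;
&lt;pre class="box"&gt;PREFIX foaf: &amp;lt;http://xmlns.com/foaf/0.1/&amp;gt;
SELECT DISTINCT ?name
{ ?x foaf:mbox &amp;lt;mailto:alice@example&amp;gt; .
  # One or more foaf:knows links, then the name at the end of the path
  ?x foaf:knows+/foaf:name ?name .
}&lt;/pre&gt;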
&lt;h3&gt;Strict SPARQL&lt;/h3&gt;
&lt;p&gt;The &lt;a href="http://jena.sourceforge.net/ARQ/property_paths.html"&gt;Property 
path documentation&lt;/a&gt; shows how to install paths and name them with a URI so 
you can use a path in strict SPARQL syntax.&lt;/p&gt;
&lt;h3&gt;Other&lt;/h3&gt;
&lt;p&gt;There have been some other path-related extensions to SPARQL:&lt;/p&gt;
&lt;ul&gt;
 &lt;li&gt;&lt;a href="http://sig.biostr.washington.edu/projects/ontviews/gleen/index.html"&gt;
 GLEEN&lt;/a&gt; is a library that provides path-functionality in graph matching 
 via property functions.&amp;nbsp; It also provides subgraph extraction based on 
 pattern.&lt;/li&gt;
 &lt;li&gt;&lt;a href="http://psparql.inrialpes.fr/"&gt;PSPARQL&lt;/a&gt; allows variables in 
 paths&lt;/li&gt;
 &lt;li&gt;&lt;a href="http://www.eswc2007.org/pdf/eswc07-kochut.pdf"&gt;SPARQLeR&lt;/a&gt;, 
 which has a path value type&lt;/li&gt;
&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-3726792826047677220?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://jena.sourceforge.net/ARQ/property_paths.html' title='ARQ Property Paths'/><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/3726792826047677220/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=3726792826047677220' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/3726792826047677220'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/3726792826047677220'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2008/07/arq-property-paths.html' title='ARQ Property Paths'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-8678944746641761320</id><published>2008-06-08T20:59:00.005+01:00</published><updated>2008-06-08T21:04:13.322+01:00</updated><title type='text'>TDB: Loading UniProt</title><content type='html'>&lt;p&gt;&lt;a href="http://jena.sf.net/TDB/"&gt;TDB&lt;/a&gt; passed a milestone this week - a 
load of the complete &lt;a href="http://dev.isb-sib.ch/projects/uniprot-rdf/"&gt;UniProt&lt;/a&gt; V13.3 dataset.&lt;/p&gt;

&lt;p&gt;UniProt V13.3 is 1,755,773,303 triples (1.7 billion) of which 1,516,036,125 
are unique after duplicate suppression.&lt;/p&gt;

&lt;p&gt;This dataset is interesting in a 
variety of ways. Firstly, it's quite large. Secondly, it is the composite of a 
small number of different, related databases and has some large literals (complete protein 
sequences - some over 70k characters in a single literal) as well as the full text of many abstracts. 
(&lt;a href="http://swat.cse.lehigh.edu/projects/lubm/"&gt;LUBM&lt;/a&gt; doesn't have 
literals at all. Testing using both synthetic data and real-world data is 
necessary.)&lt;/p&gt;

&lt;p&gt;UniProt comes as a number of RDF/XML files.&amp;nbsp; These had already been 
checked before the loading, by parsing to give
&lt;a href="http://www.w3.org/2001/sw/RDFCore/ntriples/"&gt;N-Triples&lt;/a&gt;, using 
it as a sort of dump format. The Jena RDF/XML parser does extensive checking, and the 
data had some bad URIs. I find that most large datasets do throw up some 
warnings on URIs.&lt;/p&gt;

&lt;p&gt;TDB also does value-based storage for the
&lt;a href="http://www.w3.org/TR/xmlschema-2/"&gt;XSD datatypes&lt;/a&gt; decimal, integer, 
date and dateTime. Except there aren't very many such values in this data. For example, there 
are just 18 occurrences of the value &amp;quot;1&amp;quot;, in any form, in the entire dataset. 
They are just xsd:ints in some cardinality constraints. I was a bit 
surprised by this. Given the size of the dataset, I expected none or lots 
of uses of the value 1, so I grepped the input data to check - it's much, much quicker 
to use SPARQL than run grep on 1.7 billion triples in gzip'ed files but working 
with N-triples makes it easy to produce small tools you can be sure that work. And indeed they are the only 1 values.  Trust the SPARQL query next time.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-8678944746641761320?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://jena.sf.net/TDB' title='TDB: Loading UniProt'/><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/8678944746641761320/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=8678944746641761320' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/8678944746641761320'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/8678944746641761320'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2008/06/tdb-loading-uniprot.html' title='TDB: Loading UniProt'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-6549625354185506284</id><published>2008-03-25T18:47:00.002Z</published><updated>2008-03-25T18:51:57.734Z</updated><title type='text'>Two more ARQ extensions</title><content type='html'>&lt;p&gt;I've implemented two new extensions for ARQ:&lt;/p&gt;
&lt;ul&gt;
 &lt;li&gt;Assignment&lt;/li&gt;
 &lt;li&gt;Sub-queries&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Both these expose facilities that are already in the query algebra.&amp;nbsp; 
Sub-queries are done by simply allowing query algebra operators to appear 
anywhere in the query, not requiring solution modifiers to only be at the 
outer level of the query, so extensions like counting can 
be inside the query and available to the rest of the pattern matching. An 
assignment operator already existed as an algebra extension for optimization and to 
support ARQ &lt;code&gt;&lt;a href="http://jena.sourceforge.net/ARQ/select_expr.html"&gt;SELECT&lt;/a&gt;&lt;/code&gt;&lt;a href="http://jena.sourceforge.net/ARQ/select_expr.html"&gt; 
expressions&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Both are syntactic extensions and 
are available if the query is parsed 
with language &lt;code&gt;Syntax.syntaxARQ&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Currently available in
&lt;a href="http://jena.sourceforge.net/ARQ/download.html"&gt;ARQ SVN&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;Assignment&lt;/h3&gt;
&lt;p&gt;This assigns a computed value to a variable in the middle of a pattern. &lt;/p&gt;
&lt;pre&gt;LET (?x := ?y + 5 )&lt;/pre&gt;
&lt;p&gt;The assignment operator is &amp;quot;&lt;code&gt;:=&lt;/code&gt;&amp;quot;. A single &amp;quot;&lt;code&gt;=&lt;/code&gt;&amp;quot; is 
already the test for equals in SPARQL.&lt;/p&gt;
&lt;p&gt;This means that a computed value can be used in other pattern matching:&lt;/p&gt;
&lt;pre class="box"&gt; SELECT ?y ?area
 {
    ?x rdf:type :Rectangle ;
       :height ?h ;
       :width ?w .
    LET (?area := ?h*?w )
    GRAPH &amp;lt;otherShapes&amp;gt;
    {
      ?y :area ?area . # Shapes with the same area
    }
 }&lt;/pre&gt;
&lt;p&gt;Application writers can provide their own functions, maybe to do a little data 
munging to map between different formats:&lt;/p&gt;
&lt;pre class="box"&gt;   ?x  foaf:name  ?name .          # &amp;quot;John Smith&amp;quot;
   # Convert to a different style: &amp;quot;Smith, John&amp;quot; for example.
   LET (?vcardName := my:convertName(?name) )
   ?y vCard:FN ?vcardName .&lt;/pre&gt;
&lt;p&gt;There are some rules for the assignment:&lt;/p&gt;
&lt;ul&gt;
 &lt;li&gt;if the expression does not evaluate (e.g. unbound variable in the 
 expression), no assignment occurs and the query continues.&lt;/li&gt;
 &lt;li&gt;if the variable is unbound, and the expression evaluates, the variable 
 is bound to the value.&lt;/li&gt;
 &lt;li&gt;if the variable is already bound to the same value as the expression evaluates to, 
 nothing happens and the query continues.&lt;/li&gt;
 &lt;li&gt;if the variable is bound to a different value from the one the expression 
 evaluates to, an error occurs and the current solution will be excluded from 
 the results.&lt;/li&gt;
&lt;/ul&gt;
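&lt;p&gt;One consequence of the last two rules is that assigning to an already-bound variable acts like an equality test. A sketch, reusing the hypothetical rectangle vocabulary from the example above and assuming the data also stores an &lt;code&gt;:area&lt;/code&gt; for each shape:&lt;/p&gt;
&lt;pre class="box"&gt; {
    ?x :height ?h ;
       :width  ?w ;
       :area   ?area .      # ?area is already bound by the pattern
    # Same value: the solution is kept. Different value: it is excluded.
    LET (?area := ?h*?w )
 }&lt;/pre&gt;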
&lt;p&gt;ARQ already has
&lt;a href="http://jena.sourceforge.net/ARQ/select_expr.html"&gt;&lt;code&gt;SELECT&lt;/code&gt; 
expressions&lt;/a&gt;, so a combination of sub-query and expression can achieve the 
same effect, but it's unnatural and verbose and sometimes requires parts of the 
pattern matching to be written twice, inside and outside the sub-query.&lt;/p&gt;
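&lt;p&gt;As a rough sketch of that more verbose form, the rectangle query above might be written with a sub-query and a &lt;code&gt;SELECT&lt;/code&gt; expression. It is untested, and assumes the &lt;code&gt;(expression AS ?var)&lt;/code&gt; form used for &lt;code&gt;COUNT&lt;/code&gt; in the sub-query examples also accepts arithmetic:&lt;/p&gt;
&lt;pre class="box"&gt; SELECT ?y ?area
 {
    # Compute ?area in a sub-query instead of with LET
    { SELECT ((?h*?w) AS ?area)
      { ?x rdf:type :Rectangle ;
           :height ?h ;
           :width ?w .
      }
    }
    # Joined with the sub-query on ?area
    GRAPH &amp;lt;otherShapes&amp;gt;
    {
      ?y :area ?area .
    }
 }&lt;/pre&gt;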
&lt;p&gt;One place where LET might be useful is in a &lt;code&gt;CONSTRUCT&lt;/code&gt; query. In 
strict SPARQL, only terms found in the original data can be used for variables 
in the construct template but with LET-assignment:&lt;/p&gt;
&lt;pre class="box"&gt;   CONSTRUCT { ?x :lengthInInches ?inch }
   WHERE
   { ?x :lengthInCM ?cm
     LET (?inch := ?cm/2.54 )
   }&lt;/pre&gt;
&lt;p&gt;This isn't a new idea - see for example:
&lt;a href="http://www.uni-koblenz.de/%7Esschenk/publications/2007/KI2007SparqlSemantics.pdf" class="external text" title="http://www.uni-koblenz.de/~sschenk/publications/2007/KI2007SparqlSemantics.pdf" rel="nofollow"&gt;"A SPARQL Semantics based on Datalog"&lt;/a&gt; 
- although the syntax in ARQ is designed to group the terms better.&lt;/p&gt;
&lt;h3&gt;Sub-queries&lt;/h3&gt;
&lt;p&gt;A sub-query can be used to apply some solution modifier to a sub-pattern.&amp;nbsp; 
Useful examples include aggregation, especially
&lt;a href="http://jena.sourceforge.net/ARQ/group-by.html"&gt;grouping and counting&lt;/a&gt;, and &lt;code&gt;LIMIT&lt;/code&gt; 
with &lt;code&gt;ORDER BY&lt;/code&gt; to get only some of the results of a pattern match.&lt;/p&gt;
&lt;pre class="box"&gt; { SELECT (COUNT(*) AS ?c) { ?s ?p ?o } }&lt;/pre&gt;
&lt;p&gt;A sub-query is enclosed by &lt;code&gt;{}&lt;/code&gt; and must be the only thing inside 
those braces, the same style as
&lt;a href="http://www.openlinksw.com/weblog/oerling/?id=1296"&gt;Virtuoso Subqueries&lt;/a&gt;. 
The sub-query will be combined, with
&lt;a href="http://www.w3.org/TR/rdf-sparql-query/#defn_algJoin"&gt;SPARQL join&lt;/a&gt;, 
with other patterns in the same group.&lt;/p&gt;
&lt;p&gt;For example, for each person with two or more phones, find how many people they &lt;code&gt;foaf:knows&lt;/code&gt;:&lt;/p&gt;
&lt;pre class="box"&gt; PREFIX foaf: &amp;lt;http://xmlns.com/foaf/0.1/&amp;gt;

 SELECT ?person ?knowsCount
 {
   # ?person who have 2 or more phones
   { SELECT ?person
     WHERE { ?person foaf:phone ?phone } 
     GROUP BY ?person 
     HAVING (COUNT(?phone) &amp;gt;= 2) 
   }
   # Join on ?person with how many people they foaf:knows
   { SELECT ?person (COUNT(?x) AS ?knowsCount)
     WHERE { ?person foaf:knows ?x .}
     GROUP BY ?person
   }
}&lt;/pre&gt;
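&lt;p&gt;As a rough picture of how the two sub-queries combine, here is a small Python sketch (again, not ARQ code). The data and the function names are invented for this example; each sub-query is evaluated on its own and the two solution lists are then joined on the shared variable &lt;code&gt;?person&lt;/code&gt;.&lt;/p&gt;
&lt;pre class="box"&gt;
```python
import operator
from collections import Counter

# Hypothetical (subject, object) pairs standing in for foaf:phone and foaf:knows triples.
phones = [("alice", "tel:1"), ("alice", "tel:2"), ("bob", "tel:3")]
knows = [("alice", "bob"), ("alice", "carol"), ("bob", "alice")]


def people_with_phones(phone_pairs, minimum=2):
    """First sub-query: GROUP BY ?person, keep groups with at least `minimum` phones."""
    counts = Counter(person for person, _ in phone_pairs)
    return [{"person": p} for p, n in counts.items() if operator.ge(n, minimum)]


def knows_counts(knows_pairs):
    """Second sub-query: GROUP BY ?person with COUNT(?x) AS ?knowsCount."""
    counts = Counter(person for person, _ in knows_pairs)
    return [{"person": p, "knowsCount": n} for p, n in counts.items()]


def join(left, right):
    """Join two solution lists: merge every pair that agrees on the shared variables."""
    joined = []
    for left_row in left:
        for right_row in right:
            shared = set(left_row).intersection(right_row)
            if all(left_row[v] == right_row[v] for v in shared):
                joined.append({**left_row, **right_row})
    return joined


result = join(people_with_phones(phones), knows_counts(knows))
print(result)
```
&lt;/pre&gt;
&lt;p&gt;Only people who appear in both solution lists survive the join, which is why it helps to check each sub-query separately first.&lt;/p&gt;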
&lt;p&gt;Queries with sub-queries can become complicated quite quickly so I usually 
write each part separately and then combine them.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-6549625354185506284?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/6549625354185506284/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=6549625354185506284' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/6549625354185506284'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/6549625354185506284'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2008/03/two-more-arq-extensions.html' title='Two more ARQ extensions'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-847156145423014824</id><published>2008-02-19T12:13:00.003Z</published><updated>2008-02-19T12:21:11.176Z</updated><title type='text'>First time out for TDB (pt 2)</title><content type='html'>Follow-on from &lt;a href="http://seaborne.blogspot.com/2008/02/first-time-out-for-tdb.html"&gt;previous testing&lt;/a&gt;: a larger load of 100 million triples from &lt;a href="http://dev.isb-sib.ch/projects/uniprot-rdf/"&gt;UniProt&lt;/a&gt; 7.0a performed as follows on the SDB1 machine:
&lt;ul&gt;&lt;li&gt;115,517,840 triples&lt;/li&gt;&lt;li&gt;3903s (which is 29594 triples/s) -- or about 65 minutes&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-847156145423014824?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/847156145423014824/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=847156145423014824' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/847156145423014824'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/847156145423014824'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2008/02/first-time-out-for-tdb-pt-2.html' title='First time out for TDB (pt 2)'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-5202510086940563523</id><published>2008-02-13T10:53:00.006Z</published><updated>2008-02-13T15:53:38.256Z</updated><title type='text'>First time out for TDB</title><content type='html'>&lt;p&gt;In other work, I've needed storage and indexing components.&amp;nbsp; To 
test them out, I've built a persistent &lt;a href="http://jena.sf.net"&gt;
Jena&lt;/a&gt; graph that behaves like an &amp;quot;RDF VM&amp;quot; whereby an application 
can handle more triples than memory alone, and it flexes to use and release its 
cache space based on the other applications running on the machine.&amp;nbsp; 
Working name: TDB. Early days, having only just finished writing the 
core code, but the core is now working to the point where it can load and query 
reliably.&lt;/p&gt;
&lt;p&gt;The RDF VM uses indexing code (currently, classical 
&lt;a href="http://en.wikipedia.org/wiki/B-tree"&gt;B-Trees&lt;/a&gt;) but in a way 
that matches the model of implementation of RDF.&amp;nbsp; There is no translation 
between the indexing and the disk idea of data. To check that made sense, I 
also tried with the B-Trees replaced by
&lt;a href="http://www.oracle.com/database/berkeley-db/je/index.html"&gt;Berkeley DB 
Java Edition&lt;/a&gt;. The BDB version behaves similarly with a constant slowdown. Of 
course, BDB-JE is more sophisticated with variable sized data items and 
duplicates (and transactions but I wasn't using them) so some overhead isn't 
surprising.&lt;/p&gt;
&lt;p&gt;I have also tried some other indexing structures but B-Trees have proved to 
scale better, from situations where there isn't much free memory to 64-bit 
machines where there is. &lt;/p&gt;
&lt;h3&gt;Node Loads&lt;/h3&gt;
&lt;p&gt;The main area of difference between the custom and BDB-backed implementations is in loading 
speed.&amp;nbsp; They handle RDF node representations differently. 
Storing them in a BDB database, or &lt;a href="http://jdbm.sf.net."&gt;JDBM&lt;/a&gt; htable, 
was adequate, giving a load rate of around 12K triples/s, but it generates too many disk 
writes in an asynchronous pattern. Changing to streaming writes in TDB 
fixed that.&amp;nbsp; Because all the implementations fit the same framework, this technique 
can be rolled back into the BDB-implemented code. And BDB supports 
transactions.&amp;nbsp; The node technique may also help with a SQL database backed 
system like &lt;a href="http://jena.sourceforge.net/SDB/"&gt;SDB&lt;/a&gt; as well.&lt;/p&gt;
&lt;p&gt;I did try &lt;a href="http://lucene.apache.org/"&gt;Lucene&lt;/a&gt; - not a good idea.&amp;nbsp; 
Loading is too slow, but 
then that's not what Lucene is designed for.&lt;/p&gt;
&lt;h3&gt;Testing&lt;/h3&gt;
&lt;p&gt;For testing, I used the Jena test suite for functional tests and the &lt;a href="http://www4.wiwiss.fu-berlin.de/benchmarks-200801/"&gt;RDF Store Benchmarks with DBpedia&lt;/a&gt; 
dataset for performance.&lt;/p&gt;
&lt;p&gt;TDB works and gives the right results for the queries. 
(It would be 
good to have the results published as well as described in the
&lt;a href="http://www.w3.org/2001/sw/DataAccess/tests/r2"&gt;DAWG test suite&lt;/a&gt; so 
testing can be done.)&lt;/p&gt;
&lt;p&gt;Query 2 benefits hugely from caching.&amp;nbsp; If run 
completely cold, after a reboot, it can take up to 30s. Running cold is also a 
lot more variable on machine sdb1 because other projects use the disk array.&lt;/p&gt;
&lt;p&gt;There is still room for improvement 
though. The new index code doesn't quite pack the leaf nodes optimally yet and some more profiling may 
show up hotspots, but for a first pass just getting the benchmark to run is fine.&amp;nbsp; 
Rewriting queries, as an optimizer should, lowers the execution time for queries 
3 and 5 to 0.48s and 1.46s respectively.&lt;/p&gt;
&lt;p&gt;The results for query 4 show one possible hotspot.&amp;nbsp; This query churns 
nodes executing the filters but the node retrieval code does not benefit from 
co-locality of disk access.&amp;nbsp; Fortunately alternative code for the node 
table does make co-locality possible and still runs almost as fast. Time to get 
out the profiler.&lt;/p&gt;
&lt;p&gt;To illustrate the &amp;quot;RDF VM&amp;quot; effect: when run with Eclipse, Firefox etc. all 
consuming memory, my home PC is 5-10% slower than when run without them 
hogging bytes, even on a dataset as small as 16 million triples.&lt;/p&gt;
&lt;h3&gt;First Results for TDB&lt;/h3&gt;
&lt;table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse; border: 1px solid black; padding: 1px" id="table2"&gt;
  &lt;tr style="font-size: 11.0pt; font-style: normal; text-decoration: none; font-family: Calibri, Verdana, sans-serif; text-align: right; vertical-align: bottom; margin: 2ex"&gt;
    &lt;th style="text-align: center; border: 1px solid black; padding: 1px; background: #DBE5F1"&gt;
    Machine&lt;/th&gt;
    &lt;th style="text-align: center; border: 1px solid black; padding: 1px; background: #DBE5F1"&gt;
    sdb1&lt;/th&gt;
    &lt;th style="text-align: center; border: 1px solid black; padding: 1px; background: #DBE5F1"&gt;
    Home PC&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr style="font-size: 11.0pt; font-style: normal; text-decoration: none; font-family: Calibri, Verdana, sans-serif; text-align: right; vertical-align: bottom; margin: 2ex"&gt;
    &lt;th style="text-align: center; border: 1px solid black; padding: 1px; background: #DBE5F1"&gt;
    Date&lt;/th&gt;
    &lt;td style="border: 1px solid black; padding: 1px"&gt;11/02/2008&lt;/td&gt;
    &lt;td style="border: 1px solid black; padding: 1px"&gt;11/02/2008&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr style="font-size: 11.0pt; font-style: normal; text-decoration: none; font-family: Calibri, Verdana, sans-serif; text-align: right; vertical-align: bottom; margin: 2ex"&gt;
    &lt;th style="text-align: center; border: 1px solid black; padding: 1px; background: #DBE5F1"&gt;
    &amp;nbsp;&lt;/th&gt;
    &lt;td style="border: 1px solid black; padding: 1px"&gt;&amp;nbsp;&lt;/td&gt;
    &lt;td style="border: 1px solid black; padding: 1px"&gt;&amp;nbsp;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr style="font-size: 11.0pt; font-style: normal; text-decoration: none; font-family: Calibri, Verdana, sans-serif; text-align: right; vertical-align: bottom; margin: 2ex"&gt;
    &lt;th style="text-align: center; border: 1px solid black; padding: 1px; background: #DBE5F1"&gt;
    Load (seconds)&lt;/th&gt;
    &lt;td style="border: 1px solid black; padding: 1px"&gt;686.582&lt;/td&gt;
    &lt;td style="border: 1px solid black; padding: 1px"&gt;726.1&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr style="font-size: 11.0pt; font-style: normal; text-decoration: none; font-family: Calibri, Verdana, sans-serif; text-align: right; vertical-align: bottom; margin: 2ex"&gt;
    &lt;th style="text-align: center; border: 1px solid black; padding: 1px; background: #DBE5F1"&gt;
    Load (triples/s)&lt;/th&gt;
    &lt;td style="border: 1px solid black; padding: 1px"&gt;23,478&lt;/td&gt;
    &lt;td style="border: 1px solid black; padding: 1px"&gt;22,961&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr style="font-size: 11.0pt; font-style: normal; text-decoration: none; font-family: Calibri, Verdana, sans-serif; text-align: right; vertical-align: bottom; margin: 2ex"&gt;
    &lt;th style="text-align: center; border: 1px solid black; padding: 1px; background: #DBE5F1"&gt;
    &amp;nbsp;&lt;/th&gt;
    &lt;td style="border: 1px solid black; padding: 1px"&gt;&amp;nbsp;&lt;/td&gt;
    &lt;td style="border: 1px solid black; padding: 1px"&gt;&amp;nbsp;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr style="font-size: 11.0pt; font-style: normal; text-decoration: none; font-family: Calibri, Verdana, sans-serif; text-align: right; vertical-align: bottom; margin: 2ex"&gt;
    &lt;th style="text-align: center; border: 1px solid black; padding: 1px; background: #DBE5F1"&gt;
    Query 1 (seconds)&lt;/th&gt;
    &lt;td style="border: 1px solid black; padding: 1px"&gt;0.05&lt;/td&gt;
    &lt;td style="border: 1px solid black; padding: 1px"&gt;0.03&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr style="font-size: 11.0pt; font-style: normal; text-decoration: none; font-family: Calibri, Verdana, sans-serif; text-align: right; vertical-align: bottom; margin: 2ex"&gt;
    &lt;th style="text-align: center; border: 1px solid black; padding: 1px; background: #DBE5F1"&gt;
    Query 2 (seconds)&lt;/th&gt;
    &lt;td style="border: 1px solid black; padding: 1px"&gt;1.30&lt;/td&gt;
    &lt;td style="border: 1px solid black; padding: 1px"&gt;0.73&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr style="font-size: 11.0pt; font-style: normal; text-decoration: none; font-family: Calibri, Verdana, sans-serif; text-align: right; vertical-align: bottom; margin: 2ex"&gt;
    &lt;th style="text-align: center; border: 1px solid black; padding: 1px; background: #DBE5F1"&gt;
    Query 3 (seconds)&lt;/th&gt;
    &lt;td style="border: 1px solid black; padding: 1px"&gt;9.87&lt;/td&gt;
    &lt;td style="border: 1px solid black; padding: 1px"&gt;9.50&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr style="font-size: 11.0pt; font-style: normal; text-decoration: none; font-family: Calibri, Verdana, sans-serif; text-align: right; vertical-align: bottom; margin: 2ex"&gt;
    &lt;th style="text-align: center; border: 1px solid black; padding: 1px; background: #DBE5F1"&gt;
    Query 4 (seconds)&lt;/th&gt;
    &lt;td style="border: 1px solid black; padding: 1px"&gt;30.99&lt;/td&gt;
    &lt;td style="border: 1px solid black; padding: 1px"&gt;35.32&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr style="font-size: 11.0pt; font-style: normal; text-decoration: none; font-family: Calibri, Verdana, sans-serif; text-align: right; vertical-align: bottom; margin: 2ex"&gt;
    &lt;th style="text-align: center; border: 1px solid black; padding: 1px; background: #DBE5F1"&gt;
    Query 5 (seconds)&lt;/th&gt;
    &lt;td style="border: 1px solid black; padding: 1px"&gt;29.87&lt;/td&gt;
    &lt;td style="border: 1px solid black; padding: 1px"&gt;34.24&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;Breakdown of the sdb1 load:&lt;/p&gt;
&lt;table border="0" cellpadding="0" cellspacing="0" width="394" style="border:1px solid black; padding:1px; border-collapse: collapse; width: 294pt" id="table3"&gt;
  &lt;tr height="40" style="margin:2ex; height: 30.0pt; font-size:11.0pt; font-style:normal; text-decoration:none; font-family:Calibri, Verdana, sans-serif; text-align:right; vertical-align:bottom"&gt;
    &lt;th style="border:1px solid black; padding:1px; background:#DBE5F1; vertical-align: top; margin-top: 0; text-align:center"&gt;
    Loading&lt;/th&gt;
    &lt;th width="75" style="border:1px solid black; padding:1px; background:#DBE5F1; vertical-align: top; text-align:center"&gt;
    Triples&lt;/th&gt;
    &lt;th style="border:1px solid black; padding:1px; background:#DBE5F1; vertical-align: top; text-align:center"&gt;
    Load time (seconds)&lt;/th&gt;
    &lt;th style="border:1px solid black; padding:1px; background:#DBE5F1; vertical-align: top; text-align:center"&gt;
    Load rate (triples/s)&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr height="20" style="margin:2ex; height: 15.0pt; font-size:11.0pt; font-style:normal; text-decoration:none; font-family:Calibri, Verdana, sans-serif; text-align:right; vertical-align:bottom"&gt;
    &lt;th style="border:1px solid black; padding:1px; background:#DBE5F1; vertical-align: top; text-align:center"&gt;
    Overall&lt;/th&gt;
    &lt;td align="right" style="border:1px solid black; padding:1px; vertical-align: top" width="75"&gt;
    16,120,177&lt;/td&gt;
    &lt;td align="right" style="border:1px solid black; padding:1px; vertical-align: top"&gt;
    686.582&lt;/td&gt;
    &lt;td align="right" style="border:1px solid black; padding:1px; vertical-align: top"&gt;
    23,478&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr height="20" style="margin:2ex; height: 15.0pt; font-size:11.0pt; font-style:normal; text-decoration:none; font-family:Calibri, Verdana, sans-serif; text-align:right; vertical-align:bottom"&gt;
    &lt;th style="border:1px solid black; padding:1px; background:#DBE5F1; vertical-align: top; text-align:center"&gt;
    infoboxes&lt;/th&gt;
    &lt;td align="right" style="border:1px solid black; padding:1px; vertical-align: top" width="75"&gt;
    15,472,624&lt;/td&gt;
    &lt;td align="right" style="border:1px solid black; padding:1px; vertical-align: top"&gt;
    651.543&lt;/td&gt;
    &lt;td align="right" style="border:1px solid black; padding:1px; vertical-align: top"&gt;
    23,747&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr height="20" style="margin:2ex; height: 15.0pt; font-size:11.0pt; font-style:normal; text-decoration:none; font-family:Calibri, Verdana, sans-serif; text-align:right; vertical-align:bottom"&gt;
    &lt;th style="border:1px solid black; padding:1px; background:#DBE5F1; vertical-align: top; text-align:center"&gt;
    geocoordinates&lt;/th&gt;
    &lt;td align="right" style="border:1px solid black; padding:1px; vertical-align: top" width="75"&gt;
    447,517&lt;/td&gt;
    &lt;td align="right" style="border:1px solid black; padding:1px; vertical-align: top"&gt;
    24.084&lt;/td&gt;
    &lt;td align="right" style="border:1px solid black; padding:1px; vertical-align: top"&gt;
    18,581&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr height="20" style="margin:2ex; height: 15.0pt; font-size:11.0pt; font-style:normal; text-decoration:none; font-family:Calibri, Verdana, sans-serif; text-align:right; vertical-align:bottom"&gt;
    &lt;th style="border:1px solid black; padding:1px; background:#DBE5F1; vertical-align: top; text-align:center"&gt;
    homepages&lt;/th&gt;
    &lt;td align="right" style="border:1px solid black; padding:1px; vertical-align: top" width="75"&gt;
    200,036&lt;/td&gt;
    &lt;td align="right" style="border:1px solid black; padding:1px; vertical-align: top"&gt;
    10.955&lt;/td&gt;
    &lt;td align="right" style="border:1px solid black; padding:1px; vertical-align: top"&gt;
    18,259&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
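&lt;p&gt;The load rates can be cross-checked from the triple counts and load times. The sketch below is in Python and takes its figures from the breakdown table; the assumption that a rate is triples divided by seconds with the fraction dropped is mine, chosen because it matches the overall figure.&lt;/p&gt;
&lt;pre class="box"&gt;
```python
# (dataset, triples, seconds) for the three files in the breakdown table.
loads = [
    ("infoboxes", 15472624, 651.543),
    ("geocoordinates", 447517, 24.084),
    ("homepages", 200036, 10.955),
]


def rate(triples, seconds):
    """Load rate in whole triples per second (fraction dropped, an assumption)."""
    return int(triples / seconds)


# The overall row should be the sum of the per-file rows.
total_triples = sum(triples for _, triples, _ in loads)
total_seconds = sum(seconds for _, _, seconds in loads)

for name, triples, seconds in loads:
    print(name, rate(triples, seconds))
print("overall", total_triples, round(total_seconds, 3), rate(total_triples, total_seconds))
```
&lt;/pre&gt;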


&lt;h3&gt;Setup&lt;/h3&gt;
&lt;p&gt;My home PC is a media centre - quad core, 3Gbyte RAM, consumer grade disks, 
running Vista and Norton Internet Security anti-virus. I 
guess it's quicker on the short queries because there is less latency to getting 
to the disk - even if the disks are slower - but falls behind when the query 
requires some crunching or a lot of data drawn from the disk.&lt;/p&gt;
&lt;p&gt;sdb1 is a machine in a blade rack in the data centre - details below.&lt;/p&gt;
&lt;p&gt;(My work desktop machine, running Windows XP, has various Symantec antivirus and 
anti-intrusion software components and is slower for database work generally.)&lt;/p&gt;
&lt;p&gt;Disk: Data centre disk array over Fibre Channel.&lt;/p&gt;
&lt;pre class="box"&gt;/proc/cpuinfo (abbrev):

processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 37
model name : AMD Opteron(tm) Processor 252
stepping : 1
cpu MHz : 1804.121
cache size : 1024 KB
fpu : yes
fpu_exception : yes
cpuid level : 1
TLB size : 1088 4K pages
address sizes : 40 bits physical, 48 bits virtual

processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 37
model name : AMD Opteron(tm) Processor 252
stepping : 1
cpu MHz : 1804.121
cache size : 1024 KB
fpu : yes
fpu_exception : yes
TLB size : 1088 4K pages
clflush size : 64
address sizes : 40 bits physical, 48 bits virtual&lt;/pre&gt;
&lt;pre class="box"&gt;/proc/meminfo (abbrev):

MemTotal: 8005276 kB
MemFree: 435836 kB
Buffers: 40772 kB
Cached: 7099840 kB
SwapCached: 0 kB
Active: 1165348 kB
Inactive: 6141392 kB
SwapTotal: 2048276 kB
SwapFree: 2048116 kB
Mapped: 202868 kB&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-5202510086940563523?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/5202510086940563523/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=5202510086940563523' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/5202510086940563523'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/5202510086940563523'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2008/02/first-time-out-for-tdb.html' title='First time out for TDB'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-2853417911497794313</id><published>2008-01-05T20:20:00.001Z</published><updated>2008-01-05T20:21:52.390Z</updated><title type='text'>Jena-Mulgara : example of implementing a Jena graph</title><content type='html'>&lt;p&gt;In Jena, 
&lt;a href="http://jena.sourceforge.net/javadoc/com/hp/hpl/jena/graph/Graph.html"&gt;Graph&lt;/a&gt; is an interface. It abstracts anything that looks like RDF - 
storage options, inference, other legacy data sources.&lt;/p&gt;

&lt;p&gt;The main operations are &lt;code&gt;find(Triple)&lt;/code&gt;, &lt;code&gt;add(Triple)&lt;/code&gt; and
&lt;code&gt;delete(Triple)&lt;/code&gt;. In 
addition, there are a number of getters to access handlers of various features 
(query, statistics, reification, bulk update, event manager). 
Having handlers, rather than directly including all the operations for each 
feature, reduces the size of the interface and makes it easier to provide default 
implementations of each feature.&lt;/p&gt;
&lt;p&gt;An implementation of a graph rarely needs to implement the interface directly.&amp;nbsp; 
More usually, an implementation starts by inheriting from the class GraphBase.&amp;nbsp; 
A minimal (read-only) implementation just needs to implement &lt;code&gt;graphBaseFind&lt;/code&gt;. 
Wrapping legacy data often only makes sense as a read-only graph. To provide update operations, just implement the methods &lt;code&gt;performAdd&lt;/code&gt; and &lt;code&gt;performDelete&lt;/code&gt;, 
which are the methods called from the base implementations of &lt;code&gt;add(Triple)&lt;/code&gt; and
&lt;code&gt;delete(Triple)&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Then for testing with &lt;a href="http://www.junit.org/"&gt;JUnit&lt;/a&gt;, inherit 
from AbstractGraphTest (override tests that don't make sense in a particular circumstance) 
and provide the &lt;code&gt;getGraph&lt;/code&gt; operation to generate a graph instance to test. &lt;/p&gt;
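&lt;p&gt;The shape of this design can be sketched outside Jena. The Python below is only an illustration of the pattern, not the Jena classes: &lt;code&gt;GraphBaseSketch&lt;/code&gt; and &lt;code&gt;ListGraph&lt;/code&gt; are hypothetical names, and the hook methods merely mirror the roles of &lt;code&gt;graphBaseFind&lt;/code&gt;, &lt;code&gt;performAdd&lt;/code&gt; and &lt;code&gt;performDelete&lt;/code&gt;.&lt;/p&gt;
&lt;pre class="box"&gt;
```python
class GraphBaseSketch:
    """Hypothetical stand-in for the role GraphBase plays; not the Jena class."""

    def find(self, s=None, p=None, o=None):
        # The public read operation delegates to the one hook a subclass must supply.
        return list(self.graph_base_find(s, p, o))

    def add(self, triple):
        self.perform_add(triple)

    def delete(self, triple):
        self.perform_delete(triple)

    def graph_base_find(self, s, p, o):
        raise NotImplementedError

    def perform_add(self, triple):
        # Default behaviour: a read-only graph refuses updates.
        raise PermissionError("add not supported")

    def perform_delete(self, triple):
        raise PermissionError("delete not supported")


class ListGraph(GraphBaseSketch):
    """Updatable graph backed by a plain list; None acts as a wildcard in find."""

    def __init__(self):
        self.triples = []

    def graph_base_find(self, s, p, o):
        pattern = (s, p, o)
        for triple in self.triples:
            if all(want is None or want == got for want, got in zip(pattern, triple)):
                yield triple

    def perform_add(self, triple):
        if triple not in self.triples:
            self.triples.append(triple)

    def perform_delete(self, triple):
        if triple in self.triples:
            self.triples.remove(triple)


g = ListGraph()
g.add(("ex:a", "rdf:type", "ex:Thing"))
g.add(("ex:a", "ex:name", "A"))
print(g.find(p="rdf:type"))
```
&lt;/pre&gt;
&lt;p&gt;A read-only wrapper over legacy data would override only the find hook and inherit the refusing update hooks.&lt;/p&gt;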
&lt;h3&gt;Application APIs&lt;/h3&gt;
&lt;p&gt;Graph/Triple/Node provide the low level interface in Jena; 
Model/Statement/Resource/Literal provide the
&lt;a href="http://jena.sourceforge.net/tutorial/RDF_API/index.html"&gt;RDF API&lt;/a&gt; 
and the &lt;a href="http://jena.sourceforge.net/ontology/index.html"&gt;ontology API&lt;/a&gt; 
provides an OWL-centric view of the RDF data.&lt;/p&gt;
&lt;p&gt;Where the graph level is minimal and symmetric (e.g. literals as subjects, 
inclusion of named variables) for easy implementation, the RDF API enforces the 
RDF conditions and provides a wide variety of convenience operations so writing a 
program can be succinct, not requiring the application writer to write 
unnecessary boilerplate code sequences. The ontology API does the same for OWL.&amp;nbsp; 
If you look at the &lt;a href="http://jena.sourceforge.net/javadoc/index.html"&gt;javadoc&lt;/a&gt;, you'll see the APIs are large but the system level 
interface is small.&lt;/p&gt;
&lt;p&gt;A graph is turned into a Model by calling &lt;code&gt;
ModelFactory.createModelForGraph(Graph)&lt;/code&gt;. All the key application APIs 
are interface-based, although it's rarely necessary to do anything other than use the 
standard Model-Graph bridge.&lt;/p&gt;
&lt;p&gt;Data access to the graph all goes via &lt;i&gt;find&lt;/i&gt;. All the read operations of 
application APIs, directly or indirectly, come down to calling Graph.find or a 
graph query handler. And the default graph query handler works by calling 
Graph.find, so once &lt;i&gt;find&lt;/i&gt; is implemented everything (read-only) works, including
&lt;a href="http://jena.sourceforge.net/ARQ/app_api.html"&gt;ARQ&lt;/a&gt;'s
&lt;a href="http://jena.sourceforge.net/ARQ/app_api.html"&gt;query API&lt;/a&gt;, 
which includes a &lt;a href="http://www.w3.org/TR/rdf-sparql-query/"&gt;SPARQL&lt;/a&gt; 
implementation. It may not be the most efficient way but importantly all functionality is 
available and so the graph implementer can quickly get a first implementation up 
and running, then decide where and when to spend further development time - or 
whether that's needed at all.&lt;/p&gt;
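&lt;p&gt;To show why a working &lt;i&gt;find&lt;/i&gt; is enough for querying, here is a small Python sketch of a pattern matcher that touches the data only through a find function. It is not how Jena's query handler is written; the data, &lt;code&gt;find&lt;/code&gt; and &lt;code&gt;match_patterns&lt;/code&gt; are all invented for this illustration.&lt;/p&gt;
&lt;pre class="box"&gt;
```python
# Hypothetical in-memory data; find() is the only access path used below.
TRIPLES = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:name", "Bob"),
]


def find(s=None, p=None, o=None):
    """Return the triples matching the pattern; None is a wildcard."""
    pattern = (s, p, o)
    return [t for t in TRIPLES
            if all(want is None or want == got for want, got in zip(pattern, t))]


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def match_patterns(patterns, binding=None):
    """Solve a list of triple patterns by repeated calls to find()."""
    binding = binding or {}
    if not patterns:
        return [binding]
    first, rest = patterns[0], patterns[1:]
    # Substitute variables that are already bound; unbound ones become wildcards.
    concrete = [binding.get(t) if is_var(t) else t for t in first]
    solutions = []
    for triple in find(*concrete):
        extended = dict(binding)
        consistent = True
        for term, value in zip(first, triple):
            # A variable used twice in one pattern must match the same value.
            if is_var(term) and extended.setdefault(term, value) != value:
                consistent = False
                break
        if consistent:
            solutions.extend(match_patterns(rest, extended))
    return solutions


query = [("?a", "foaf:knows", "?b"), ("?b", "foaf:name", "?n")]
print(match_patterns(query))
```
&lt;/pre&gt;
&lt;p&gt;Every pattern costs at least one call to find, which is where a store-specific query handler could later do better.&lt;/p&gt;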
&lt;h3&gt;Jena-Mulgara&lt;/h3&gt;
&lt;p&gt;An example of this is a prototype
&lt;a href="http://jena.hpl.hp.com/wiki/JenaMulgara"&gt;Jena-Mulgara bridge&lt;/a&gt; (work 
in progress as of Jan'08). This maps the Graph API to a Mulgara session object, 
which can be a local Mulgara database or a remote Mulgara server. The prototype 
is a single class together with a set of factory operations for more convenient 
creation of a bridge graph wrapped in all Jena's APIs.&lt;/p&gt;
&lt;p&gt;Implementing graph nodes for IRIs and for literals is straightforward.&amp;nbsp; 
Mulgara uses &lt;a href="http://jrdf.sourceforge.net/"&gt;JRDF&lt;/a&gt; to represent these 
nodes and to represent triples. Mapping to and from Jena versions of the same is 
just a change in naming.&lt;/p&gt;
&lt;p&gt;Blank nodes are more interesting. A blank node in Jena has an internal label 
(which is not a URI in disguise). When working at the lowest level of &lt;i&gt;Graph&lt;/i&gt;, 
the code is manipulating things at a concrete, syntactic level. &lt;/p&gt;
&lt;p&gt;A blank node in Mulgara has an internal id but it can change. It really is 
the internal node index, as I found out by creating a blank node with id=1 and 
finding it turned into rdf:type, which was what was really at node slot 1.
&lt;a href="http://gearon.blogspot.com/"&gt;Paul&lt;/a&gt; has been (patiently!)
&lt;a href="http://mulgara.org/pipermail/mulgara-general/2008-January/000240.html"&gt;
explaining this&lt;/a&gt; to me on a Mulgara mailing list. The session interface is an 
interface onto the RDF data, not an interface to extend the graph details to the 
client. Both approaches are valid - it's just different levels of abstraction.&lt;/p&gt;
&lt;p&gt;If the Jena application is careful about blank nodes (not assuming they are 
stable across transactions, and not deleting all triples involving some blank 
node, then creating triples involving that blank node) then it all works out. 
The most important case of reading data within a transaction is safe. Bulk 
loading is better done via the native Mulgara interfaces anyway. The 
Jena-Mulgara bridge enables a Jena application to access a Mulgara server 
through the same interfaces as any other RDF data.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-2853417911497794313?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://jena.hpl.hp.com/wiki/JenaMulgara' title='Jena-Mulgara : example of implementing a Jena graph'/><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/2853417911497794313/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=2853417911497794313' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/2853417911497794313'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/2853417911497794313'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2008/01/jena-mulgara-example-of-implementing.html' title='Jena-Mulgara : example of implementing a Jena graph'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-4691605369637965896</id><published>2007-09-08T20:06:00.001+01:00</published><updated>2007-09-08T20:09:57.783+01:00</updated><title type='text'>Counting and GROUP BY</title><content type='html'>&lt;p&gt;One thing people miss from SPARQL is counting. It's a feature that the working 
group didn't have time for.&lt;/p&gt;
&lt;p&gt;There's an implementation, following the design in SQL, in ARQ SVN which will 
be in the next release (v2.1). v2.1 also introduces the
&lt;a href="http://jena.sourceforge.net/ARQ/bgp-optimization.html"&gt;cost-based optimizer&lt;/a&gt; 
for in-memory basic graph patterns by
&lt;a href="http://www.ifi.uzh.ch/ddis/people/markus-stocker/"&gt;Markus&lt;/a&gt;.  &lt;/p&gt;
&lt;p&gt;It's a syntactic extension, not strict SPARQL, so you have to tell the system 
to parse queries in the &amp;quot;ARQ&amp;quot; language by passing &lt;code&gt;Syntax.syntaxARQ&lt;/code&gt; 
to the query factory.&lt;/p&gt;
&lt;p&gt;The following queries will work:&lt;/p&gt;
&lt;pre class="box"&gt;SELECT count(*) { ... }&lt;/pre&gt;
&lt;pre class="box"&gt;SELECT (count(*) AS ?count) { ... }&lt;/pre&gt;
&lt;p&gt;This is based on having SELECT expressions as well as grouping. Using AS to 
give a named variable is better style because the results can go into the 
SPARQL XML results format; otherwise, an internal variable is allocated and it 
has an illegal SPARQL name.&lt;/p&gt;
&lt;p&gt;Other examples:&lt;/p&gt;
&lt;pre class="box"&gt;SELECT (count(*) AS ?rows)
{ ... }
GROUP BY ?x&lt;/pre&gt;
&lt;pre class="box"&gt;SELECT count(distinct *)
{ ... }
GROUP BY ?x&lt;/pre&gt;
&lt;pre class="box"&gt;SELECT count(?y)
{ ... }
GROUP BY ?x&lt;/pre&gt;
&lt;p&gt;What is being counted is solutions, in the case of &lt;code&gt;count(*)&lt;/code&gt; and 
names, in the case of &lt;code&gt;count(?var)&lt;/code&gt;. &lt;/p&gt;
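&lt;p&gt;The difference between the two forms can be sketched in Python. This is not ARQ code, and it rests on an assumption: that counting a variable means counting the solutions in which that variable is bound. The sample solutions and function names are made up.&lt;/p&gt;
&lt;pre class="box"&gt;
```python
from collections import defaultdict

# Hypothetical solutions; a missing key models an unbound variable.
solutions = [
    {"x": "a", "y": 1},
    {"x": "a"},
    {"x": "b", "y": 2},
    {"x": "b", "y": 3},
]


def group_by(rows, var):
    """GROUP BY ?var: collect the solutions sharing a value for var."""
    groups = defaultdict(list)
    for row in rows:
        groups[row.get(var)].append(row)
    return groups


def count_star(group):
    """count(*): one per solution in the group."""
    return len(group)


def count_var(group, var):
    """count(?var): one per solution in which var is bound (assumed reading)."""
    return sum(1 for row in group if var in row)


for key, group in group_by(solutions, "x").items():
    print(key, count_star(group), count_var(group, "y"))
```
&lt;/pre&gt;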
&lt;p&gt;The current list of ARQ extensions is:&lt;/p&gt;
&lt;ul&gt;
 &lt;li&gt;
 &lt;a href="http://seaborne.blogspot.com/2007/07/basic-federated-sparql-query.html"&gt;SERVICE&lt;/a&gt; - call-out from a query to a SPARQL endpoint over HTTP&lt;/li&gt;
 &lt;li&gt;SELECT expressions&lt;/li&gt;
 &lt;li&gt;GROUP BY&lt;/li&gt;
 &lt;li&gt;count()&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;So what features should be next?&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-4691605369637965896?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/4691605369637965896/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=4691605369637965896' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/4691605369637965896'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/4691605369637965896'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2007/09/counting-and-group-by.html' title='Counting and GROUP BY'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-4355336082245665237</id><published>2007-08-08T11:33:00.000+01:00</published><updated>2007-08-08T11:38:50.177+01:00</updated><title type='text'>Syntax Comparison of N3 and Turtle</title><content type='html'>&lt;p&gt;Some &lt;a href="http://jena.hpl.hp.com/wiki/Syntax_Comparison_of_N3_and_Turtle"&gt;notes on the syntax differences between N3 and Turtle&lt;/a&gt;, with a little
SPARQL thrown in.&lt;/p&gt;

&lt;p&gt;Some of the differences are low-level issues, which are enough to trip up
machine reading of RDF data in these formats across the web.  It also
makes writing reliable RDF output writers harder because of variability of
parsers.&lt;/p&gt;

&lt;p&gt;RDF/XML really does win for web exchange but it is also a barrier to
people who find N3 or Turtle much easier to understand and produce.&lt;/p&gt;

&lt;p&gt;This collection of differences is not complete and I'll add to and amend the
note. Please let me know of anything that should be added or corrected.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-4355336082245665237?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://jena.hpl.hp.com/wiki/Syntax_Comparison_of_N3_and_Turtle' title='Syntax Comparison of N3 and Turtle'/><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/4355336082245665237/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=4355336082245665237' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/4355336082245665237'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/4355336082245665237'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2007/08/syntax-comparison-of-n3-and-turtle.html' title='Syntax Comparison of N3 and Turtle'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-8438016288800315387</id><published>2007-07-27T19:06:00.000+01:00</published><updated>2007-07-27T19:34:21.350+01:00</updated><title type='text'>Basic Federated SPARQL Query</title><content type='html'>&lt;p&gt;There are already ways to access remote RDF data. The simplest is to read a 
document which is an RDF graph and query it. Another way is with the
&lt;a href="http://www.w3.org/TR/rdf-sparql-protocol/"&gt;SPARQL protocol&lt;/a&gt; 
which 
allows a query to be sent to a remote service endpoint and the results sent back 
(in RDF, or an &lt;a href="http://www.w3.org/TR/rdf-sparql-XMLres/"&gt;XML-based results 
format&lt;/a&gt; or even a &lt;a href="http://www.w3.org/TR/rdf-sparql-json-res/"&gt;JSON one&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Several people writing on
&lt;a href="http://tech.groups.yahoo.com/group/jena-dev/messages"&gt;jena-dev&lt;/a&gt; have been attempting to create federated query 
applications where part of a query needs to be sent to one or more remote services.&lt;/p&gt;
&lt;p&gt;Here's a basic building block for such federated query use cases. It adds the 
ability to make a SPARQL protocol call within a query, not just send the whole 
query to the remote service.&lt;/p&gt;
&lt;h4&gt;Syntax&lt;/h4&gt;
&lt;p&gt;A new keyword &lt;code&gt;SERVICE&lt;/code&gt; is added to the extended SPARQL query 
language in &lt;a href="http://jena.sf.net/ARQ/"&gt;ARQ&lt;/a&gt;. This keyword causes the 
sub-pattern to be sent to a named SPARQL service endpoint, and not matched 
against a local graph.&lt;/p&gt;
&lt;pre class="box"&gt;PREFIX : &amp;lt;http://example/&gt;
PREFIX  dc:     &amp;lt;http://purl.org/dc/elements/1.1/&gt;

SELECT ?a
FROM &amp;lt;mybooks.rdf&amp;gt;
{
  ?b dc:title ?title .
  &lt;b&gt;SERVICE&lt;/b&gt; &amp;lt;http://sparql.org/books&gt;
     { ?s dc:title ?title . ?s dc:creator ?a }
}&lt;/pre&gt;

&lt;h4&gt;Algebra&lt;/h4&gt;
&lt;p&gt;There is a new operator in the algebra.&lt;/p&gt;
&lt;pre class="box"&gt;(prefix ((dc: &amp;lt;http://purl.org/dc/elements/1.1/&gt;))
  (project (?a)
    (join
      (BGP [triple ?b dc:title ?title])
      (&lt;b&gt;service&lt;/b&gt; &amp;lt;http://sparql.org/books&gt;
          (BGP
            [triple ?s dc:title ?title]
            [triple ?s dc:creator ?a]
          ))
      )))&lt;/pre&gt;
&lt;h4&gt;Performance Considerations&lt;/h4&gt;
&lt;p&gt;This feature is a basic building block to allow remote access in the middle 
of a query, not a general solution to the issues in distributed query 
evaluation. The algebra operation is executed without regard to how selective the pattern 
is, so the order of the query affects the speed of execution. Because each call 
involves HTTP operations, putting the parts of the query in the right order matters a lot. 
Don't ask for the whole of a bookstore just to find a book whose title comes from 
a local RDF file - ask the bookshop a query with the title already bound from 
earlier in the query.&lt;/p&gt;
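&lt;p&gt;To make the ordering point concrete, here is a rough Python sketch. It is not ARQ code: the remote service is simulated by an in-memory list, and the names &lt;code&gt;REMOTE_BOOKS&lt;/code&gt;, &lt;code&gt;call_service&lt;/code&gt;, &lt;code&gt;remote_first&lt;/code&gt; and &lt;code&gt;local_first&lt;/code&gt; are made up for illustration. It only counts how many rows would cross the network in each order.&lt;/p&gt;
&lt;pre class="box"&gt;
```python
# Hypothetical stand-in for a remote SPARQL endpoint: (title, creator) rows.
REMOTE_BOOKS = [
    ("SPARQL Basics", "Alice"),
    ("Turtle in Depth", "Bob"),
    ("RDF Primer", "Carol"),
    ("Graph Stores", "Dave"),
]


def call_service(title=None):
    """Simulate one HTTP call and return the rows the service would send back.

    With title=None the pattern is unconstrained, so every row is returned.
    """
    return [row for row in REMOTE_BOOKS if title is None or row[0] == title]


def remote_first(local_titles):
    """Fetch everything, then filter locally. Returns (creators, rows transferred)."""
    rows = call_service()
    creators = [creator for (title, creator) in rows if title in local_titles]
    return creators, len(rows)


def local_first(local_titles):
    """Bind the title locally first, then ask the service once per binding."""
    creators, transferred = [], 0
    for title in local_titles:
        rows = call_service(title)
        transferred += len(rows)
        creators.extend(creator for (_, creator) in rows)
    return creators, transferred
```
&lt;/pre&gt;
&lt;p&gt;Both orders give the same answer here, but with a single local title &lt;code&gt;remote_first&lt;/code&gt; pulls back every row while &lt;code&gt;local_first&lt;/code&gt; pulls back one. The price in this sketch is one call per local binding.&lt;/p&gt;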
&lt;h4&gt;Proper SPARQL&lt;/h4&gt;
&lt;p&gt;On top of this access operation, it would be possible to build a query 
processor that does what &lt;a href="http://darq.sf.net/"&gt;DARQ&lt;/a&gt; does (the &lt;a href="http://darq.sf.net/"&gt;DARQ project&lt;/a&gt; 
is not active): read a SPARQL query, analyse 
it, and build a query on the extended algebra. The execution order is chosen 
based on the selectivity of the triple patterns so that it minimises network traffic.&lt;/p&gt;&lt;p&gt;
Hopefully, given the building block in ARQ, someone will add the necessary query 
execution analysis to give a query broker that accepts strict SPARQL and uses a 
number of SPARQL services to answer the query.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-8438016288800315387?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/8438016288800315387/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=8438016288800315387' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/8438016288800315387'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/8438016288800315387'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2007/07/basic-federated-sparql-query.html' title='Basic Federated SPARQL Query'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-6251326085246474191</id><published>2007-07-25T20:43:00.000+01:00</published><updated>2007-07-25T20:47:08.400+01:00</updated><title type='text'>SSE</title><content type='html'>&lt;p&gt;Following on from
&lt;a href="http://seaborne.blogspot.com/2007/04/sparql-s-expressions.html"&gt;SPARQL 
S-Expressions&lt;/a&gt; :: &lt;a href="http://jena.hpl.hp.com/wiki/SSE"&gt;a description of SSE&lt;/a&gt;, a notation for RDF-related data structures (like the SPARQL algebra).&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-6251326085246474191?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://jena.hpl.hp.com/wiki/SSE' title='SSE'/><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/6251326085246474191/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=6251326085246474191' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/6251326085246474191'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/6251326085246474191'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2007/07/sse.html' title='SSE'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-2075516693155912005</id><published>2007-06-29T11:51:00.000+01:00</published><updated>2007-06-29T11:59:19.790+01:00</updated><title type='text'>Installing OracleXE</title><content type='html'>&lt;p&gt;We have been adding support for Oracle into &lt;a href="http://jena.sf.net/SDB"&gt;SDB&lt;/a&gt;.
As part of that, I installed Oracle XE ("Oracle Database 10g Express Edition")
on my WindowsXP machine at work, which is part of a Windows domain.&lt;/p&gt;

&lt;p&gt;It didn't go smoothly.  I hope these notes help someone else trying to do the same thing.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Summary&lt;/b&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Install when logged in as administrator (NOT a domain user, even if in group Administrators).&lt;/li&gt;
&lt;li&gt;In order to use the command line SQL interface, you need to change
SQLNET.ORA from &lt;code&gt;(NTS)&lt;/code&gt; to &lt;code&gt;(none)&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;b&gt;Account&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;The Oracle installation instructions say to
&lt;a href="http://download-uk.oracle.com/docs/cd/B25329_01/doc/install.102/b25143/toc.htm#CIHHJEHF"&gt;
log in with administrative privileges and be attached to the domain&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The installation proceeded with no errors, and the 5 Windows services started 
up OK. But I found that I could not connect to the database web-based admin 
interface.&lt;/p&gt;

&lt;p&gt;An additional symptom was that I couldn't change my domain password, nor
create Windows local user accounts.  In both cases, the error was that the password
did not meet the requirements on characters and length.  HP has tighter 
guidelines than the default for passwords, but I don't understand why installing OracleXE broke a part of WindowsXP. After uninstalling OracleXE, I could change my password and create local user accounts as usual.&lt;/p&gt;

&lt;p&gt;There are lots of articles about not being able to contact the database home 
page but only a few were related to the situation I had:
&lt;a href="http://forums.oracle.com/forums/thread.jspa?messageID=1317065&amp;amp;#1317065"&gt;
this one was most useful&lt;/a&gt;
 (that link is into the OracleXE forum which is not 
publicly readable - you have to register first).&lt;/p&gt;

&lt;p&gt;A way to see if your installation has been affected is to see if there is a 
file &lt;code&gt;server\dbs\SPFILEXE.ORA&lt;/code&gt;. If not, the installation is 
probably broken.&lt;/p&gt;

&lt;p&gt;Logging in as the local administrator account, uninstalling, and reinstalling gave me 
an installation where I could get to the database home page and administer the 
database. I guess any local user account in group Administrators would work.&lt;/p&gt;

&lt;p&gt;But sqlplus.exe still didn't work. (The error is 
&amp;quot;ORA-12638: Credential retrieval failed&amp;quot;).
Following that thread again, I changed SQLNET.ORA to 
&lt;code&gt;&amp;nbsp;SQLNET.AUTHENTICATION_SERVICES = (none)&amp;nbsp;&lt;/code&gt; and could 
connect to the database from the SQL command prompt.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Afterwards ...&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;Installing on Windows XP/Home worked fine but that machine is not part of a domain.&lt;/p&gt;
&lt;p&gt;And after all that, the &lt;a href="http://jena.sf.net/SDB"&gt;SDB&lt;/a&gt; &lt;a href="http://www.w3.org/TR/rdf-sparql-query/"&gt;SPARQL&lt;/a&gt; test suite 
runs perfectly with OracleXE.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-2075516693155912005?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/2075516693155912005/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=2075516693155912005' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/2075516693155912005'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/2075516693155912005'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2007/06/installing-oraclexe.html' title='Installing OracleXE'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-6856738701372728811</id><published>2007-06-22T13:51:00.000+01:00</published><updated>2007-06-22T14:15:10.486+01:00</updated><title type='text'>Joseki 3.1</title><content type='html'>&lt;p&gt;New version of &lt;a href="http://www.joseki.org/"&gt;Joseki&lt;/a&gt; released, primarily to
package together all the updated jar files for Jena and ARQ that Joseki uses.  At the same time, other jars have been upgraded so there are jar naming changes.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://downloads.sourceforge.net/joseki/joseki-3.1.zip"&gt;Joseki 3.1 download&lt;/a&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-6856738701372728811?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://downloads.sourceforge.net/joseki/joseki-3.1.zip' title='Joseki 3.1'/><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/6856738701372728811/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=6856738701372728811' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/6856738701372728811'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/6856738701372728811'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2007/06/joseki-31.html' title='Joseki 3.1'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-3873013516216914150</id><published>2007-06-10T16:22:00.001+01:00</published><updated>2007-06-10T16:22:42.886+01:00</updated><title type='text'>SPARQL Papers at ESWC 2007</title><content type='html'>&lt;p&gt;There were 3 papers that particularly caught my SPARQL-driven attention at &lt;a href="http://www.eswc2007.org/"&gt;ESWC2007&lt;/a&gt;.
&lt;/p&gt;
&lt;hr/&gt;

&lt;p&gt;
&lt;a href="http://www.eswc2007.org/pdf/eswc07-kochut.pdf"&gt;SPARQLeR: Extended Sparql for Semantic Association Discovery&lt;/a&gt;&lt;br/&gt;
Krys J. Kochut, Maciej Janik
&lt;/p&gt;
&lt;p&gt;
This describes an extension to SPARQL for path variables.  A path is a regular expression of properties but in addition the paper describes the need for reverse properties and constraints on paths (like length).
&lt;/p&gt;
&lt;p&gt;
See also: PSPARQL: &lt;a href="http://psparql.inrialpes.fr/"&gt;
http://psparql.inrialpes.fr/&lt;/a&gt;
&lt;/p&gt;
&lt;hr/&gt;
&lt;p&gt;
&lt;a href="http://www.eswc2007.org/pdf/eswc07-munoz.pdf"&gt;
Minimal Deductive Systems for RDF&lt;/a&gt;&lt;br/&gt;
Sergio Muñoz, Jorge Pérez  and Claudio Gutierrez.
&lt;/p&gt;
&lt;p&gt;
This is a proposal for reduced RDFS with just rdfs:domain, rdfs:range, rdfs:subClassOf, rdfs:subPropertyOf and rdf:type.
&lt;/p&gt;
&lt;p&gt;
This results in (on page 8) a small set of rules that have to be applied to the data but there is no core vocabulary. The rules can be applied to a data stream, if the RDFS schema is known, because each rule refers to at most one data triple.
&lt;/p&gt;
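&lt;p&gt;
As an illustration of why one data triple at a time is enough, here is a small Python sketch. It is my simplification, not the rule set as written in the paper: the schema is passed in as plain dictionaries, and the names &lt;code&gt;closure&lt;/code&gt; and &lt;code&gt;infer&lt;/code&gt; and the prefixed-name strings are assumptions made for the example.
&lt;/p&gt;
&lt;pre class="box"&gt;
```python
RDF_TYPE = "rdf:type"


def closure(start, parents):
    """Return start plus everything reachable through the parents mapping."""
    seen, todo = set(), [start]
    while todo:
        node = todo.pop()
        if node not in seen:
            seen.add(node)
            todo.extend(parents.get(node, ()))
    return seen


def infer(triples, sub_property, sub_class, domain, range_):
    """Yield each data triple plus its entailments, one input triple at a time.

    Only the (fixed, known) schema is held in memory; no data triple is kept
    after it has been processed.
    """
    for s, p, o in triples:
        derived = set()
        # subPropertyOf, domain and range, applied to this one triple.
        for q in closure(p, sub_property):
            derived.add((s, q, o))
            for c in domain.get(q, ()):
                derived.add((s, RDF_TYPE, c))
            for c in range_.get(q, ()):
                derived.add((o, RDF_TYPE, c))
        # subClassOf, applied to every type statement found so far.
        for subj, pred, cls in list(derived):
            if pred == RDF_TYPE:
                for d in closure(cls, sub_class):
                    derived.add((subj, RDF_TYPE, d))
        yield from sorted(derived)
```
&lt;/pre&gt;
&lt;p&gt;
Because &lt;code&gt;infer&lt;/code&gt; is a generator, it can sit in the middle of a pipeline that reads triples from a file or a network connection and writes the expanded stream out again.
&lt;/p&gt;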
&lt;p&gt;
There are no containers, which may be inconvenient, but they might more usefully be covered without typing, by having a different property just to match these syntactic constructs. That stops the container vocabulary from interacting with the application vocabulary.
&lt;/p&gt;
&lt;p&gt;
A colleague here, Nipun Bhatia, has been working on streaming checking and rule application based on extending &lt;a href="http://jena.sourceforge.net/Eyeball/"&gt;Eyeball&lt;/a&gt;.  Nipun even adds cardinality validation by preprocessing the data to get
the triples in subject order.  Unix sort(1) is quite capable of sorting very large N-triples files in sensible amounts of time.
&lt;/p&gt;
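&lt;p&gt;
The sort-then-scan idea can be sketched in a few lines of Python. This is not Eyeball's API and not Nipun's code: &lt;code&gt;cardinality_violations&lt;/code&gt; is a hypothetical helper, and triples are plain tuples assumed to arrive already sorted by subject (for example, from sort(1) over an N-triples file).
&lt;/p&gt;
&lt;pre class="box"&gt;
```python
from itertools import groupby


def cardinality_violations(triples, prop, expected=1):
    """Yield subjects whose number of `prop` values differs from `expected`.

    Assumes `triples` is sorted by subject, so each subject's triples are
    adjacent and only one group is looked at at a time.
    """
    for subject, group in groupby(triples, key=lambda t: t[0]):
        count = sum(1 for (_, p, _) in group if p == prop)
        if count != expected:
            yield subject
```
&lt;/pre&gt;
&lt;p&gt;
Without the sort, the checker would have to remember a count for every subject seen so far; with it, memory use does not grow with the size of the data.
&lt;/p&gt;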
&lt;hr/&gt;
&lt;p&gt;
&lt;a href="http://www.eswc2007.org/pdf/eswc07-kiefer.pdf"&gt;
Semantic Process Retrieval with iSPARQL&lt;/a&gt;&lt;br/&gt;
Christoph Kiefer, Abraham Bernstein, Hong Joo Lee, Mark Klein and Markus Stocker.
&lt;/p&gt;
&lt;p&gt;
((Non) interest declaration: Markus is now spending a few months working with us in Bristol - this work was done before that.)
&lt;/p&gt;
&lt;p&gt;
The core of this paper is an example where statistical techniques beat logic.  There is a strong message to us all here - don't think logic and perfect organization are
necessarily the best solution to actual problems.
&lt;/p&gt;
&lt;p&gt;
As part of this work, but not the main argument of the paper, they created iSPARQL (&lt;em&gt;i&lt;/em&gt;=imprecise), which is an embedding of access to similarity metrics inside standard SPARQL without syntax changes.  They use property functions (they are using ARQ but the principle is quite general) to access the similarity engine. 
&lt;/p&gt;
&lt;p&gt;
The idea of embedding some index or other functionality that can provide bindings of variables for some expression seems like a general extension technique for SPARQL.  &lt;a href="http://jena.sourceforge.net/ARQ/lucene-arq.html"&gt;LARQ&lt;/a&gt; provides free-text matching, using &lt;a href="http://lucene.apache.org/java/"&gt;
Lucene&lt;/a&gt; to do the matching, and can include all the Lucene loose matching.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-3873013516216914150?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://www.eswc2007.org/' title='SPARQL Papers at ESWC 2007'/><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/3873013516216914150/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=3873013516216914150' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/3873013516216914150'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/3873013516216914150'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2007/06/sparql-papers-at-eswc-2007.html' title='SPARQL Papers at ESWC 2007'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-3090000320201971796</id><published>2007-04-09T19:45:00.001+01:00</published><updated>2007-04-09T19:52:27.048+01:00</updated><title type='text'>SPARQL S-expressions</title><content type='html'>&lt;p&gt;I'm interested in exposing the
&lt;a href="http://www.w3.org/TR/rdf-sparql-query/#sparqlAlgebra"&gt;SPARQL algebra&lt;/a&gt; 
support in ARQ for others (and me!) to experiment with
&lt;a href="http://www.w3.org/TR/rdf-sparql-query/"&gt;SPARQL&lt;/a&gt; and SPARQL
extensions. &lt;/p&gt;
&lt;p&gt;Having a syntax to be able to write algebra expressions is useful and makes 
writing test cases of the algebra easier. ARQ already uses
&lt;a href="http://en.wikipedia.org/wiki/S-expressions"&gt;S-expressions&lt;/a&gt; to detail 
the syntax tree so it was natural to use S-expressions for algebra expressions.&lt;/p&gt;
&lt;p&gt;I split the lowest levels of syntax out, to avoid having to write many parsers. 
The result - SSE (SPARQL S-Expressions), a vaguely lisp-ish syntax. It consists 
of lists, RDF terms (IRIs, blank nodes, prefixed names and literals) in SPARQL 
syntax, and also words which are plain symbols without a colon.&lt;/p&gt;
&lt;p&gt;Given this universal syntax, it's a matter of building code libraries to 
build the Java data structures from SSE. This is mundane work, but it is easier when it can be done without rebuilding a parser each time.&lt;/p&gt;

&lt;p&gt;Example query:&lt;/p&gt;
&lt;pre class="box"&gt; PREFIX : &amp;lt;http://example/&amp;gt; 

 SELECT ?x ?v
 { ?x :p ?v 
   OPTIONAL { ?v :q ?w }
 }&lt;/pre&gt;
&lt;p&gt;which is the algebra expression:&lt;/p&gt;
&lt;pre class="box"&gt; (project (?x ?v)
   (leftjoin
     (bgp [triple ?x &amp;lt;http://example/p&amp;gt; ?v])
     (bgp [triple ?v &amp;lt;http://example/q&amp;gt; ?w])))&lt;/pre&gt;
&lt;p&gt;The use of either () or [] for lists, where beginning and end must match, aids 
readability but has no other significance.&lt;/p&gt;
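&lt;p&gt;The bracket rule is easy to show with a toy reader. The Python below is only a sketch of the idea, not the SSE parser in ARQ: it treats every run of characters other than whitespace and brackets as one token, so string literals containing spaces are out of scope, and the names &lt;code&gt;parse&lt;/code&gt; and &lt;code&gt;_read&lt;/code&gt; are mine.&lt;/p&gt;
&lt;pre class="box"&gt;
```python
import re

# A token is a single bracket, or a run of characters that are neither
# whitespace nor brackets.
TOKEN = re.compile(r"[()\[\]]|[^\s()\[\]]+")
CLOSER = {"(": ")", "[": "]"}


def parse(text):
    """Parse one bracketed expression into nested Python lists of strings."""
    tokens = TOKEN.findall(text)
    expr, rest = _read(tokens, 0)
    if rest != len(tokens):
        raise ValueError("trailing tokens after expression")
    return expr


def _read(tokens, i):
    """Read one item starting at index i; return (item, next index)."""
    tok = tokens[i]
    if tok in CLOSER:
        items, i = [], i + 1
        while i != len(tokens) and tokens[i] not in CLOSER.values():
            item, i = _read(tokens, i)
            items.append(item)
        # The list must end with the closer that matches its opener.
        if i == len(tokens) or tokens[i] != CLOSER[tok]:
            raise ValueError("mismatched brackets")
        return items, i + 1
    if tok in CLOSER.values():
        raise ValueError("unexpected closing bracket")
    return tok, i + 1
```
&lt;/pre&gt;
&lt;p&gt;Both bracket styles produce the same nested list, so &lt;code&gt;(bgp [triple ?x :p ?v])&lt;/code&gt; and &lt;code&gt;(bgp (triple ?x :p ?v))&lt;/code&gt; read identically, while &lt;code&gt;(bgp [triple ?x :p ?v))&lt;/code&gt; is rejected.&lt;/p&gt;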
&lt;p&gt;Another example: 'prefix' defines namespaces for the enclosed body:&lt;/p&gt;
&lt;pre class="box"&gt; (prefix ((: &amp;lt;http://example/&amp;gt;))
   (project (?c)
     (filter (= ?c &amp;quot;world&amp;quot;)
       (bgp [triple ?s :p ?c]) )))&lt;/pre&gt;
&lt;p&gt;It doesn't just capture strict SPARQL: tables-as-constants mean an SSE file 
can contain data as well:&lt;/p&gt;
&lt;pre class="box"&gt;(prefix ((x: &amp;lt;http://example/&amp;gt;))
  (join
    (table
      (row [?x 1] [?y x:g])
      (row [?x 2] ))
    (table 
      (row [?y x:g])
      (row [?x 2] ))
  ))&lt;/pre&gt;
&lt;p&gt;evaluating to:&lt;/p&gt;
&lt;pre class="box"&gt; --------------------------
 | y                  | x |
 ==========================
 | &amp;lt;http://example/g&amp;gt; | 1 |
 | &amp;lt;http://example/g&amp;gt; | 2 |
 |                    | 2 |
 --------------------------&lt;/pre&gt;
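&lt;p&gt;For anyone checking the result by hand, the join can be mimicked in Python. This is a sketch of the usual join rule (merge every pair of rows that agree on their shared variables), with rows as dictionaries; &lt;code&gt;compatible&lt;/code&gt; and &lt;code&gt;join&lt;/code&gt; are illustrative names, not ARQ functions.&lt;/p&gt;
&lt;pre class="box"&gt;
```python
def compatible(a, b):
    """Two rows are compatible if every shared variable has the same value."""
    return all(a[var] == b[var] for var in set(a).intersection(b))


def join(left, right):
    """Merge every compatible pair of rows, one from each table."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]


# The two tables from the SSE example above.
left = [{"x": 1, "y": "x:g"}, {"x": 2}]
right = [{"y": "x:g"}, {"x": 2}]
result = join(left, right)
```
&lt;/pre&gt;
&lt;p&gt;The row with &lt;code&gt;?x&lt;/code&gt; = 1 cannot pair with the row with &lt;code&gt;?x&lt;/code&gt; = 2, which is why there are three result rows and not four.&lt;/p&gt;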
&lt;p&gt;It's still &amp;quot;work in progress&amp;quot; and a bit rough. The layout can be 
inconsistent, and coding styles differ from place to place, mainly because it was 
written as quick bits of hacking between doing other things. But it has already proved to be an 
efficient way to write SPARQL algebra expressions and evaluate them for testing.&lt;/p&gt;
&lt;p&gt;And doing an Emacs mode for SSE is trivial.&lt;/p&gt;
&lt;p&gt;As an aside - I did a little web-trawling for the lisp information and 
&lt;a href="http://seaborne.blogspot.com/2007/04/some-lisp-links.html"&gt;gathered my links together&lt;/a&gt;.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-3090000320201971796?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/3090000320201971796/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=3090000320201971796' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/3090000320201971796'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/3090000320201971796'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2007/04/sparql-s-expressions.html' title='SPARQL S-expressions'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-1421332953032439442</id><published>2007-04-09T19:38:00.000+01:00</published><updated>2007-04-11T09:22:02.028+01:00</updated><title type='text'>Some Lisp Links</title><content type='html'>&lt;p&gt;A partial, incomplete set of links to things about Lisp from a couple hours
of web wandering. It's a bit of a change to be linking to web pages from the
last millennium.&lt;/p&gt;
&lt;h4&gt;Lisp / General&lt;/h4&gt;
&lt;p&gt;The function/value namespace thing:: Scheme vs Common Lisp:
&lt;a href="http://www.nhplace.com/kent/Papers/Technical-Issues.html"&gt;http://www.nhplace.com/kent/Papers/Technical-Issues.html&lt;/a&gt;.
Some of the arguments look a bit dated by modern standards.&lt;/p&gt;
&lt;h4&gt;Scheme&lt;/h4&gt;
&lt;p&gt;&lt;a href="http://swiss.csail.mit.edu/projects/scheme/"&gt;
http://swiss.csail.mit.edu/projects/scheme/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;
&lt;a href="http://en.wikipedia.org/wiki/Scheme_(programming_language)"&gt;
Wikipedia - Scheme&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.r6rs.org/"&gt;http://www.r6rs.org/&lt;/a&gt; - The latest Scheme
definition. The nice thing, from a purely practical point of view, in this round
of agreement is the definition of the library system.&lt;/p&gt;
&lt;p&gt;
Online books:&lt;/p&gt;
&lt;ul&gt;
 &lt;li&gt;&lt;a href="http://mitpress.mit.edu/sicp/full-text/book/book.html"&gt;Structure and Interpretation of Computer Programs&lt;/a&gt;
 (THE book)&lt;/li&gt;
 &lt;li&gt;&lt;a href="http://www.scheme.com/tspl3/"&gt;The Scheme Programming Language&lt;/a&gt;&lt;/li&gt;
 &lt;li&gt;&lt;a href="http://www.htdp.org/2003-09-26/Book/"&gt;How to Design Programs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;CMU Scheme repository:
&lt;a href="ftp://ftp.cs.cmu.edu/user/ai/lang/scheme/0.html"&gt;
ftp://ftp.cs.cmu.edu/user/ai/lang/scheme/0.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Community: &lt;a href="http://community.schemewiki.org/"&gt;
http://community.schemewiki.org/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Object-oriented programming and Scheme&lt;/p&gt;
&lt;ul&gt;
 &lt;li&gt;&lt;a href="http://www.faqs.org/faqs/scheme-faq/part1/section-6.html"&gt;http://www.faqs.org/faqs/scheme-faq/part1/section-6.html&lt;/a&gt;&lt;/li&gt;
 &lt;li&gt;&lt;a href="http://www-spi.lip6.fr/~queinnec/WWW/Meroon.html"&gt;Meroon&lt;/a&gt;&lt;/li&gt;
 &lt;li&gt;&lt;a href="ftp://ftp.parc.xerox.com/pub/mops/tiny/"&gt;TinyCLOS&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;SLIB (A portable scheme library - &amp;quot;portable&amp;quot; seems to mean &amp;quot;it can be ported&amp;quot;).
Included in SISC. &lt;a href="http://swissnet.ai.mit.edu/~jaffer/SLIB"&gt;
http://swissnet.ai.mit.edu/~jaffer/SLIB&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Websites:&lt;/p&gt;
&lt;ul&gt;
 &lt;li&gt;&lt;a href="http://www.schemers.org/"&gt;http://www.schemers.org/&lt;/a&gt;&lt;/li&gt;
 &lt;li&gt;&lt;a href="http://srfi.schemers.org/"&gt;http://srfi.schemers.org/&lt;/a&gt;&lt;br&gt;
 SRFI - Scheme Requests for Implementation&lt;/li&gt;
 &lt;li&gt;&lt;a href="http://readscheme.org/"&gt;http://readscheme.org/&lt;/a&gt;&lt;/li&gt;
 &lt;li&gt;&lt;a href="http://library.readscheme.org/"&gt;http://library.readscheme.org/&lt;/a&gt;&lt;/li&gt;
 &lt;li&gt;Original lambda papers:
 &lt;a href="http://library.readscheme.org/page1.html"&gt;http://library.readscheme.org/page1.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Scheme / JVM Implementations&lt;/h4&gt;
&lt;p&gt;Access to &lt;a href="http://jena.sourceforge.net/ARQ/"&gt;ARQ&lt;/a&gt; for a
&lt;a href="http://www.w3.org/TR/rdf-sparql-query/"&gt;SPARQL&lt;/a&gt; engine is important.&lt;/p&gt;
&lt;ul&gt;
 &lt;li&gt;&lt;a href="http://sisc-scheme.org/"&gt;http://sisc-scheme.org/&lt;/a&gt;&amp;nbsp; (GPL 2 or
MPL 1.1)&lt;/li&gt;
 &lt;li&gt;&lt;a href="http://jscheme.sourceforge.net/"&gt;http://jscheme.sourceforge.net/&lt;/a&gt;
(Apache, zlib)&lt;/li&gt;
 &lt;li&gt;&lt;a href="http://www.gnu.org/software/kawa/"&gt;http://www.gnu.org/software/kawa/&lt;/a&gt;
(GPL 2)&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Scheme / Eclipse&lt;/h4&gt;
&lt;p&gt;One of the things that Java does have going for it is free, sophisticated IDEs.&amp;nbsp;
&lt;a href="http://www.eclipse.org/"&gt;Eclipse&lt;/a&gt; makes refactoring easy enough so
as to encourage it as the project grows. For a project like ARQ, it's near
essential to keep the naming and structure aligned to current terminology.
Writing lisp in Emacs does not count as an IDE these days.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://schemeway.sourceforge.net/"&gt;http://schemeway.sourceforge.net/&lt;/a&gt;
- not investigated yet.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://schemeway.sourceforge.net/update-site/"&gt;
http://schemeway.sourceforge.net/update-site/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Other Eclipse plug-ins? Other free, refactoring IDEs for Lisp?&lt;/p&gt;
&lt;h4&gt;Common Lisp / CLOS&lt;/h4&gt;
&lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/Common_lisp"&gt;Wikipedia - Common Lisp&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/Common_Lisp_Object_System"&gt;Wikipedia - 
CLOS&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Book: &lt;a href="http://www.cs.cmu.edu/Groups/AI/html/cltl/cltl2.html"&gt;Common Lisp the Language, 2nd Edition&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="A%20Brief%20Guide%20to%20CLOS"&gt;A Brief Guide to CLOS&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://jatha.sourceforge.net/"&gt;http://jatha.sourceforge.net/&lt;/a&gt;
Common LISP library in Java (LGPL)&lt;/p&gt;
&lt;p&gt;Common Lisp Wiki - &lt;a href="http://www.cliki.net/index"&gt;http://www.cliki.net/&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;Lisp and .net&lt;/h4&gt;
&lt;p&gt; &lt;a href="http://jena.sourceforge.net/ARQ/"&gt;ARQ&lt;/a&gt; runs fine on .Net, so a CLR (.net and mono) lisp implementation is also
interesting.&lt;/p&gt;
&lt;p&gt;I didn't have time for much of a look around but did find
Common Larceny:
&lt;a href="http://www.ccs.neu.edu/home/will/Larceny/CommonLarceny/download.html"&gt;
http://www.ccs.neu.edu/home/will/Larceny/CommonLarceny/download.html&lt;/a&gt; &lt;/p&gt;
&lt;p&gt;Neither Scheme nor Common Lisp:
&lt;a href="http://dotlisp.sourceforge.net/dotlisp.htm"&gt;http://dotlisp.sourceforge.net/dotlisp.htm&lt;/a&gt; (BSD)
but last release: July 9, 2003. Patches from &lt;a href="http://members.ozemail.com.au/~markhurd/"&gt;the author's home page.&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;Other&lt;/h4&gt;
&lt;p&gt;&lt;a href="http://www-sop.inria.fr/mimosa/fp/Bigloo/"&gt;Bigloo&lt;/a&gt;: can call Java.
Compiles scheme to the JVM (?? and CLR), can link in Java classes but
I couldn't find a clear statement as to how to use in a mixed
environment.&lt;/p&gt;
&lt;p&gt;Download link to Bigloo 2.9a for the JVM broken (2007-04-08)&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www-sop.inria.fr/mimosa/fp/Bigloo/"&gt;http://www-sop.inria.fr/mimosa/fp/Bigloo/&lt;/a&gt;
(GPL/LGPL)&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-1421332953032439442?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/1421332953032439442/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=1421332953032439442' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/1421332953032439442'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/1421332953032439442'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2007/04/some-lisp-links.html' title='Some Lisp Links'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-6833470415062129745</id><published>2007-03-09T11:55:00.000Z</published><updated>2007-03-09T12:02:51.295Z</updated><title type='text'>ARQ 2.0 beta</title><content type='html'>Just released the first ARQ 2.0 beta.  This version is a complete implementation of the SPARQL query language, and also includes the ARQ features of custom property functions and &lt;a href="http://seaborne.blogspot.com/2006/11/larq-lucene-arq.html"&gt;free-text search&lt;/a&gt;.  The plan is that this is the only beta.

The big change is that it uses the SPARQL algebra for query execution, and there is a query optimizer to choose an execution strategy that makes this better than simple evaluation of the algebra operators.

This makes ARQ up to date with where the working group is going - that includes some changes of results to some queries, but, hopefully, these are restricted to queries that don't appear in the wild.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-6833470415062129745?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://jena.sourceforge.net/ARQ' title='ARQ 2.0 beta'/><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/6833470415062129745/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=6833470415062129745' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/6833470415062129745'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/6833470415062129745'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2007/03/arq-20-beta.html' title='ARQ 2.0 beta'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-6607168567587576974</id><published>2007-03-08T21:49:00.000Z</published><updated>2007-03-09T13:24:32.181Z</updated><title type='text'>SPARQL/Update</title><content type='html'>Modern applications are not single programs. Whether web or enterprise
systems, they are a number of components running on different computers.
What they do have in common is that they cooperate to deliver the
application.

SPARQL makes publishing RDF data on the web possible, but what about
those applications that maintain and update that RDF data? There
needs to be a way for the application components to update their data.

SPARQL/Update (also known as SPARUL, pronounced a bit like "spiral") is
a language that takes the SPARQL style, and much of the grammar, and
provides both graph update and graph management operations.
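
For a flavour of what "graph update" and "graph management" cover, here is a sketch written in the syntax that was later standardised as SPARQL 1.1 Update. The draft linked below may spell these operations differently; the graph name and data are made up, and the ":" prefix is assumed to be declared.

&lt;pre class="box"&gt;# Graph management: create a named graph
CREATE GRAPH :g1 ;

# Graph update: add a triple
INSERT DATA { GRAPH :g1 { :a :p 1 } } ;

# Graph update: change a value, using a SPARQL-style pattern
DELETE { GRAPH :g1 { ?s :p 1 } }
INSERT { GRAPH :g1 { ?s :p 2 } }
WHERE  { GRAPH :g1 { ?s :p 1 } } ;

# Graph management: remove the graph again
DROP GRAPH :g1&lt;/pre&gt;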

It means those application components can talk a common language between
themselves or to a database.  Having a common update language then means
the application developer can choose one RDF toolkit for one component,
and another RDF system for another component, rather than having to
choose one programming language and find one system that does
everything required.

This first draft of &lt;a href="http://jena.hpl.hp.com/~afs/SPARQL-Update.html"
&gt;SPARQL/Update&lt;/a&gt; is published for comments, comment away!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-6607168567587576974?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://jena.hpl.hp.com/~afs/SPARQL-Update.html' title='SPARQL/Update'/><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/6607168567587576974/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=6607168567587576974' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/6607168567587576974'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/6607168567587576974'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2007/03/sparqlupdate.html' title='SPARQL/Update'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-334946577582047690</id><published>2007-02-12T15:17:00.000Z</published><updated>2007-02-12T15:18:28.679Z</updated><title type='text'>Jena SDB</title><content type='html'>&lt;p&gt;SDB was &lt;a href="http://tech.groups.yahoo.com/group/jena-dev/message/27512"&gt;released
for the first time&lt;/a&gt; last week. While this is the first alpha release, it's 
actually at least the second internal query architecture and second loader 
architecture (Damian did the work to get the loader working fast).&lt;/p&gt;
&lt;p&gt;SPARQL now has a
&lt;a href="http://seaborne.blogspot.com/2006/11/algebra-for-sparql.html"&gt;formal 
algebra&lt;/a&gt;. &lt;a href="http://jena.sf.net/ARQ"&gt;ARQ&lt;/a&gt; is used to turn the SPARQL 
query syntax into an algebra expression; SDB takes over and compiles it first to 
a relational algebra(ish) structure then generates SQL syntax. Now that there is a 
SPARQL algebra, this is all quite a bit simpler for SDB, which is why this is the second 
design for query generation; much of the work now comes naturally from ARQ.&lt;/p&gt;
&lt;h4&gt;Patterns&lt;/h4&gt;
&lt;p&gt;At the moment, SDB only translates basic graph patterns, join and leftjoin 
expressions to SQL and leaves anything else to ARQ to stitch back together. For 
the current patterns, there are two tricky cases to consider:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;SPARQL join relationships aren't exactly SQL's because they may involve 
  unbound variables.&lt;/li&gt;
  &lt;li&gt;Multiple OPTIONALs may need to be COALESCEd.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For the first, sometimes the join needs more than plain equality, 
with conditions like &amp;quot;&lt;code&gt;if col1 = null or col1 = col2&lt;/code&gt;&amp;quot;, which takes a bit 
of scope tracking. For the second, if a variable can be bound in two or more 
OPTIONALs, you have to take the first binding. The scope tracking is needed 
anyway.&lt;/p&gt;
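&lt;p&gt;A small sketch of a query that hits both cases (the predicate names are made up 
for illustration and the &lt;code&gt;:&lt;/code&gt; prefix is assumed to be declared):&lt;/p&gt;
&lt;pre class="box"&gt;SELECT * {
  ?x :p ?a .
  OPTIONAL { ?x :q ?v }
  OPTIONAL { ?x :r ?v }
}&lt;/pre&gt;
&lt;p&gt;If the first OPTIONAL does not match, &lt;code&gt;?v&lt;/code&gt; is still unbound when the second 
one is joined, so the SQL condition for the second OPTIONAL has to accept either a 
null or an equal value. The value reported for &lt;code&gt;?v&lt;/code&gt; is then whichever of the two 
columns was bound first, which is where COALESCE comes in.&lt;/p&gt;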
&lt;p&gt;Over time, more and more of the SPARQL algebra expression will be translated 
to SQL.&lt;/p&gt;
&lt;h4&gt;Layouts&lt;/h4&gt;SDB supports a number of database layouts in two main 
classes:&lt;ul&gt;
  &lt;li&gt;Type 1 layouts have a single triple table, with all information encoded 
  into the columns of the table.&lt;/li&gt;
  &lt;li&gt;Type 2 layouts use a triple table with a separate RDF terms table.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Type 1 layouts are good at supporting fine-grained API calls where the need 
to join to get the actual RDF terms is removed because they are encoded into the 
triple table's columns. Jena's existing database layer, RDB, is an example of 
this. When the move was made from Jena1 to Jena2, the
&lt;a href="http://www.hpl.hp.com/techreports/2003/HPL-2003-266.html"&gt;DB layout 
changed&lt;/a&gt; to a type 1 layout and it went faster. The two type 1 layouts 
supported are Jena's existing RDB layout and a debug version which encodes RDF 
terms in SPARQL syntax directly into the triples table so you can simply read 
the table with an SQL query browser.&lt;/p&gt;
&lt;p&gt;Type 2 layouts, where the triples table has pointers to a nodes table, are 
better as SPARQL queries get larger. Databases prefer small, fixed-width columns to 
variable-length string comparisons. SDB has two variations, one using 4 byte integer 
indexes and one using 8 byte hashes. The hash form means that the hashes of query 
constants can be calculated directly and don't have to be looked up in the SQL.&lt;/p&gt;
&lt;p&gt;It seemed that the hash form would be better all round.&amp;nbsp; But it isn't 
- loading was 2x slower (sometimes worse) despite the fact that RDF terms don't 
have to be inserted into the nodes table first to get their auto-allocated 
sequence id. Databases we have tried are significantly slower indexing 8 byte 
quantities than 4 byte quantities and this dominates the load performance.&lt;/p&gt;
&lt;h4&gt;Next&lt;/h4&gt;
&lt;p&gt;There are three directions to go:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Inference support&lt;/li&gt;
  &lt;li&gt;Application-specific layout control&lt;/li&gt;
  &lt;li&gt;Filters support&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;(1) and (2) are linked by the fact that both involve looking at a query and deciding, for 
certain known predicates and part-patterns, that different database tables 
should be used instead.&amp;nbsp; See
&lt;a href="http://www.hpl.hp.com/techreports/2006/HPL-2006-140.html"&gt;Kevin's work 
on property tables&lt;/a&gt; which uses the approach to put some higher level 
understanding of the data back into the RDF store. (3) is &amp;quot;just&amp;quot; a matter of 
doing it.&lt;/p&gt;
&lt;p&gt;And if you have some feature request, do suggest it.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-334946577582047690?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/334946577582047690/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=334946577582047690' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/334946577582047690'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/334946577582047690'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2007/02/jena-sdb.html' title='Jena SDB'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-5633815229444545604</id><published>2007-02-02T15:45:00.000Z</published><updated>2007-02-02T15:51:12.773Z</updated><title type='text'>ARQ : what next?</title><content type='html'>&lt;p&gt;In January, I got round to releasing a new version of &lt;a href="http://jena.sf.net/ARQ"&gt;
ARQ&lt;/a&gt; and also doing a first full release of the SPARQL version of
&lt;a href="http://www.joseki.org/"&gt;Joseki&lt;/a&gt;. ARQ version 1.5 had a problem with 
a performance hit for certain database usages (boo, hiss) but this is fixed in 
version 1.5.1.&lt;/p&gt;

&lt;p&gt;A key feature of this version of ARQ is support for
&lt;a href="http://jena.sourceforge.net/ARQ/lucene-arq.html"&gt;free text search&lt;/a&gt;. The core indexing is done by &lt;a href="http://lucene.apache.org/java/"&gt;Lucene&lt;/a&gt; 
2. Searches bind matching resources to a variable and are done using a
&lt;a href="http://jena.sourceforge.net/ARQ/extension.html#propertyFunctions"&gt;
property function&lt;/a&gt;. This means searches can be fast because they directly 
exploit the indexing capabilities of Lucene.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;pre class="box"&gt;PREFIX pf: &amp;lt;java:com.hp.hpl.jena.query.pfunction.library.&amp;gt;
  SELECT ?doc {
    ?lit pf:textMatch '+text' .
    ?doc ?p ?lit
  }&lt;/pre&gt;

&lt;p&gt;What next? This release matches the published DAWG working draft but the 
working group is considering a
&lt;a href="http://seaborne.blogspot.com/2006/11/algebra-for-sparql.html"&gt;SPARQL 
algebra&lt;/a&gt;, a formalization of the semantics of SPARQL based on something like 
the &lt;a href="http://en.wikipedia.org/wiki/Relational_algebra"&gt;relational algebra&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This version of ARQ has a number of query engines:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;code&gt;QueryEngine&lt;/code&gt; is the one used normally.&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;QueryEngineHTTP&lt;/code&gt; is a
  &lt;a href="http://www.w3.org/TR/rdf-sparql-protocol/"&gt;SPARQL protocol&lt;/a&gt; client 
  for remote query.&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;QueryEngineRef&lt;/code&gt; is a direct, simple implementation of the SPARQL 
  algebra.&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;QueryEngineQuad&lt;/code&gt; is a modified version of the reference query 
  engine that compiles SPARQL to quads, not triples.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;QueryEngineRef&lt;/code&gt; compiles a SPARQL query into the SPARQL algebra 
and it also provides a very simple evaluation of that algebra expression. The 
evaluation can be slow because it calculates all sub-expressions, then 
joins/left-joins/unions them but it is written to be correct, rather than fast. 
It is much better to extend the reference engine and convert the algebra 
expression into something cleverer.&lt;/p&gt;
&lt;p&gt;If you use the &lt;code&gt;--engine=ref&lt;/code&gt; argument to
&lt;a href="http://jena.sourceforge.net/ARQ/cmds.html#arq.query"&gt;&lt;code&gt;arq.sparql&lt;/code&gt;&lt;/a&gt;, 
it will use the reference engine. To see the algebra expression, use
&lt;a href="http://jena.sourceforge.net/ARQ/cmds.html#arq.qparse"&gt;&lt;code&gt;arq.qparse&lt;/code&gt;&lt;/a&gt; 
with arguments &lt;code&gt;--engine=ref --print=op&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The original query engine, which is still the default one, uses a 
substitution algorithm to evaluate a query. This will exploit the indexes that 
Jena maintains for all graphs by replacing any variables already bound to some 
value with their current value and so giving more information to the basic graph 
pattern matcher. This is done in a streaming fashion which is why this query 
engine only takes a constant amount of memory regardless of the data size.&lt;/p&gt;
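
&lt;p&gt;A sketch of the substitution idea, using made-up predicate names (this shows the 
effect, not ARQ's internal form). Given the pattern:&lt;/p&gt;

&lt;pre class="box"&gt; { ?x :p ?v .
   ?v :q ?w }&lt;/pre&gt;

&lt;p&gt;once the first triple pattern has matched with &lt;code&gt;?v&lt;/code&gt; bound to, say, 
&lt;code&gt;:b&lt;/code&gt;, the second triple pattern is matched as if it had been written:&lt;/p&gt;

&lt;pre class="box"&gt; { :b :q ?w }&lt;/pre&gt;

&lt;p&gt;which gives the graph's indexes a fixed subject and predicate to look up, and each 
result can be passed on as soon as it is found.&lt;/p&gt;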

&lt;p&gt;The next version of ARQ will combine the two approaches in a new query engine 
- where possible (and, for real-world queries, that means nearly always), it 
will use the streaming, indexed approach to execution and only resort to a 
potentially more memory-intensive approach for parts of the query that really 
need it. The good news is that while it might sound like writing a whole new 
subsystem from scratch, it isn't. The SPARQL algebra compiler is already part of 
the reference engine and the new query engine extends that; in addition, much of the 
code of the streaming engine is unchanged and it's just the conversion from the 
algebra to the iterator structure that needs to be written. &lt;/p&gt;

&lt;p&gt;It's also a chance to go back and clean up some code that has been around for 
a long time and, with hindsight, can be structured better. My programming style 
has changed over the time ARQ has been in development as I try different 
patterns and designs then learn what works and what doesn't (makes me get 
frustrated at some of the design of Java as well). Some old-style code does the 
right thing; it just does it in a way I would not choose to do it now. Don't get 
that chance very often.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-5633815229444545604?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/5633815229444545604/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=5633815229444545604' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/5633815229444545604'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/5633815229444545604'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2007/02/arq-what-next.html' title='ARQ : what next?'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-116492527651752430</id><published>2006-11-30T22:20:00.000Z</published><updated>2006-11-30T22:21:43.340Z</updated><title type='text'>SPARQL Basic Graph Pattern Matching</title><content type='html'>&lt;p&gt;
I &lt;a href="http://seaborne.blogspot.com/2006/11/algebra-for-sparql.html"
&gt;wrote about&lt;/a&gt; the &lt;a href="http://www.w3.org/2001/sw/DataAccess/rq23/rq24-algebra.html"&gt;SPARQL algebra&lt;/a&gt; last time, not with the idea of a regular update about DAWG work but noting a significant issue being considered by the working group.  
I hope that the wider community will comment and contribute; much of the ground work for the algebra is drawn from outside the working group anyway.&lt;/p&gt;

&lt;p&gt;
This week DAWG made a significant decision in a different, but nearby area, that of basic graph pattern matching 
(BGP).&lt;/p&gt;
&lt;h4&gt;
Basic Graph Patterns&lt;/h4&gt;
&lt;p&gt;Basic graph patterns (BGPs) are the building block in SPARQL. They are a 
sequence of adjacent triple patterns in the query string. A BGP is matched 
against whatever is being queried and the results of matching feed into the 
algebra.&lt;/p&gt;

&lt;p&gt;
The decision is no change, or rather that the &amp;quot;&lt;a
href="http://www.w3.org/TR/rdf-sparql-query/#BGPsparql"
&gt;implementation hint&lt;/a&gt;" for BGP matching, made more formal, becomes the basis 
of SPARQL. No queries will be changed by the decision.
&lt;/p&gt;

&lt;h4&gt;The Issue&lt;/h4&gt;

&lt;p&gt;
Having revealed the punch line, what's the joke?
The issue is whether blank nodes in BGP matching behave like hidden, named variables or as a different kind of variable altogether.  This matters for counting - it does not matter for the logical meaning of a solution.
&lt;/p&gt;

&lt;p&gt;Example data:&lt;/p&gt;

&lt;pre class="box"&gt;  :a :p 1 .
  :a :p 2 .&lt;/pre&gt;

&lt;p&gt;Example query pattern:&lt;/p&gt;

&lt;pre class="box"&gt; { ?x :p [] }&lt;/pre&gt;

&lt;p&gt;How many answers?  Blank nodes are existential variables in RDF; named variables (the regular &lt;code&gt;?x&lt;/code&gt; ones) are universal variables.  Queries don't return the binding of an existential; queries can return the binding of a named variable, and the bound value of a named variable can be passed to other parts of the query (FILTERs, OPTIONALs 
etc) via the algebra.&lt;/p&gt;

&lt;p&gt;In the absence of a DISTINCT in the query, are the solutions to this pattern:&lt;/p&gt;

&lt;ul&gt;
 &lt;li&gt;1 : ?x = :a&lt;/li&gt;
 &lt;li&gt;2 : ?x = :a , ?x = :a&lt;/li&gt;
 &lt;li&gt;Either 1 or 2&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In OWL-DL, existential variables give more expressivity: if there is a disjunction in the ontology, the reasoner can know that something exists, but not need to find an actual value - and it may not be able to 
find the value anyway - but the reasoner does know there is "something there&amp;quot;.&lt;/p&gt;


&lt;h4&gt;Users and Implementers&lt;/h4&gt;

&lt;p&gt;For application writers, the main use of blank nodes in queries that I've seen is 
the convenience of using &lt;code&gt;[]&lt;/code&gt;&lt;/p&gt;

&lt;pre class="box"&gt; {
   ?x :something [ :p ?v ; :q ?w ]
 } &lt;/pre&gt;

&lt;p&gt;&lt;code&gt;[]&lt;/code&gt; saves having to introduce a named variable and split the 
expression. It would be one more thing to learn if that gave different answers 
(duplicates) from the pattern using query variables:&lt;/p&gt;

&lt;pre class="box"&gt; {
   ?x :something ?z .
   ?z :p ?v ; 
      :q ?w .
 } &lt;/pre&gt;

&lt;p&gt;especially if the first pattern had been written: &lt;/p&gt;

&lt;pre class="box"&gt; {
   ?x  :something _:bnode .
   _:bnode :p ?v ; 
           :q ?w . 
 } &lt;/pre&gt;

&lt;p&gt;which is the &lt;code&gt;[]&lt;/code&gt; example written out with a labelled blank node. 
This is also why it's BGPs that are the building block, not triple patterns - 
blank nodes are scoped to BGPs.&lt;/p&gt;
&lt;p&gt;For implementers, having two kinds of variable that behave differently in matching can make life harder, especially if they are using some existing technology, like an SQL database 
which has one preferred type (universal-like).  Work on SPARQL semantics suggests that there can be one (complicated) SQL statement for one SPARQL query.  Nice if it's true because all the work that has gone into SQL database optimizers is 
reusable. Having two types of variables would need extra SQL to be generated, like 
nested SELECTs, or some post-processing done. &lt;/p&gt;

&lt;p&gt;More work for implementers is not good; many open source RDF toolkits are built by people in their own time or by people with other things to do at work.  So, it 
might be a 
noticeable dampener on toolkit development and hence deployment.  Not a good idea to do this without a reasonable need.&lt;/p&gt;


&lt;h4&gt;DAWG Resolution&lt;/h4&gt;
&lt;p&gt;Logically, duplicates don't make any difference. DAWG decided there are two 
solutions for the simple entailment matching example above. It's what 
implementations currently do.&lt;/p&gt;
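
&lt;p&gt;Spelled out for the example data above (a sketch; the query forms are written for 
illustration and are not taken from the test suite):&lt;/p&gt;

&lt;pre class="box"&gt; SELECT ?x { ?x :p [] }            # two solutions: ?x = :a , ?x = :a
 SELECT DISTINCT ?x { ?x :p [] }   # one solution:  ?x = :a&lt;/pre&gt;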

&lt;p&gt;
The
DAWG process is often test case driven.  We use test cases as both a concrete way to define what results we expect and also to make concrete decisions.  So once consensus was emerging to have duplicates, the 
working group decided to approve a test case that captured that.  In this case, the test is 
&lt;a href="http://www.w3.org/2001/sw/DataAccess/tests/#rdfsemantics-bnode-type-var"
&gt;rdfsemantics-bnode-type-var&lt;/a&gt; (the example above is the same idea, just 
shorter).
&lt;/p&gt;

&lt;h4&gt;Defining BGP Matching for SPARQL&lt;/h4&gt;

&lt;p&gt;
In &lt;a href="http://www.w3.org/TR/2006/WD-rdf-sparql-query-20060220/"&gt;Last Call 2&lt;/a&gt;, 
the &lt;a href="http://www.w3.org/TR/2006/CR-rdf-sparql-query-20060406/"&gt;first CR 
publication&lt;/a&gt; and the 
&lt;a href="http://www.w3.org/TR/2006/WD-rdf-sparql-query-20061004/"&gt;intermediate publication&lt;/a&gt; 
versions of SPARQL, BGP matching is a 
&lt;a href="http://www.w3.org/TR/2006/CR-rdf-sparql-query-20060406/#BasicGraphPatternMatching"&gt;complicated definition&lt;/a&gt; that allows for different entailment regimes for extensibility 
within a single entailment framework.  SPARQL is defined for simple entailment, and 
so for anything where there is a virtual graph, but it allows for extension to more complicated entailment regimes, like OWL-DL.&lt;/p&gt;

&lt;p&gt;The trouble with the matching definition is that it has a bug in it; it allows too many solutions when the graph is redundant.  The bug has to be fixed. &lt;/p&gt;
&lt;p&gt;SPARQL is defined for &lt;a href="http://www.w3.org/TR/rdf-mt/#entail"&gt;simple 
entailment&lt;/a&gt;. With this decision on how blank nodes behave, the working group can now rework 
the BGP matching.&amp;nbsp; The options are outlined in the message
&lt;a href="http://lists.w3.org/Archives/Public/public-rdf-dawg/2006OctDec/0140.html"&gt;2006OctDec/0140&lt;/a&gt;


and either of the variable mapping approaches can be made to work. It's more 
akin to a formalised version of the
&lt;a href="http://www.w3.org/TR/2005/WD-rdf-sparql-query-20050721/"&gt;last call 1&lt;/a&gt; 
style and the &lt;a
href="http://www.w3.org/TR/rdf-sparql-query/#BGPsparql"
&gt;implementation hint&lt;/a&gt; of the complex version.&lt;/p&gt;
&lt;h4&gt;... and Extensibility&lt;/h4&gt;
&lt;p&gt;Extensibility comes by replacing the whole of BGP matching, with just a few 
natural conditions such as all named variables must be bound in a BGP match and there are no accidental 
clashes of blank nodes. &lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;

&lt;p&gt;&lt;i&gt;(Vested interest declaration)&lt;/i&gt;&lt;/p&gt;

&lt;p&gt;OK - &lt;a href="http://jena.sourceforge.net/ARQ"&gt;ARQ&lt;/a&gt; isn't built on SQL; it's a self contained query engine with various extension points - by default, it queries any 
&lt;a href="http://jena.sourceforge.net/"&gt;Jena&lt;/a&gt; graph and exploits Jena's direct connection to the storage layer but only for basic graph patterns, not more complex patterns. 
It could sort out the two types of variables quite easily.  &lt;/p&gt;

&lt;p&gt;But we also have a Jena SPARQL database engine (called SDB) that is built on SQL and is specifically designed for SPARQL, for large datasets 
and for named graphs.&lt;/p&gt;

&lt;p&gt;SDB fits in as an ARQ query engine and the more it can turn into a single SQL 
statement the better. All of the query being the ideal case (and hopefully the 
normal one).&amp;nbsp; So making the translation to a relational quad store simpler 
is helpful for SDB.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-116492527651752430?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/116492527651752430/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=116492527651752430' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/116492527651752430'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/116492527651752430'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2006/11/sparql-basic-graph-pattern-matching.html' title='SPARQL Basic Graph Pattern Matching'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-116429409503210902</id><published>2006-11-23T15:00:00.000Z</published><updated>2006-11-24T15:58:56.253Z</updated><title type='text'>An Algebra for SPARQL</title><content type='html'>&lt;p&gt;SPARQL, &lt;a href="http://la.wikipedia.org/wiki/Gallia"&gt;like Gaul&lt;/a&gt;,
is divided into three parts:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Basic graph pattern matching&lt;/li&gt;
  &lt;li&gt;Graph expressions&lt;/li&gt;
  &lt;li&gt;Solution modifiers &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Basic graph patterns (BGPs) are a block of adjacent triple patterns
that match, or don't match, all together.  In BGP matching every named
variable must have some value for the BGP to have been matched.  Filters
add restrictions on the values variables can take.&lt;/p&gt;

&lt;p&gt;They are also an extension point in SPARQL because you can plug in
a different BGP matcher (e.g. a DL reasoner or a wrapper for legacy
SQL data), then reuse the other two layers to form complex queries out
of the exact matches from BGP matching.&lt;/p&gt;

&lt;p&gt;Graph expressions (&lt;code&gt;OPTIONAL&lt;/code&gt;, &lt;code&gt;UNION&lt;/code&gt;, 
&lt;code&gt;GRAPH&lt;/code&gt;, Groups (things between {})) combine BGPs in various
 ways to give more complicated patterns. Graph expressions are recursive
 so you can have patterns within patterns.  A SPARQL query has one
 graph expression in the WHERE clause.&lt;/p&gt;

&lt;p&gt;Solution modifiers (Project, &lt;code&gt;DISTINCT&lt;/code&gt;, 
&lt;code&gt;ORDER BY&lt;/code&gt;, &lt;code&gt;LIMIT/OFFSET&lt;/code&gt;) take the output of
matching the query graph pattern and process it in various ways to
yield the result set.&lt;/p&gt;
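
&lt;p&gt;A sketch of one query showing all three parts (the predicate names are made up and 
the &lt;code&gt;:&lt;/code&gt; prefix is assumed to be declared):&lt;/p&gt;

&lt;pre class="box"&gt;SELECT ?x ?w                 # solution modifier: project
{ ?x :p ?v .                 # basic graph pattern
  FILTER(?v &amp;lt; 5)             # filter restricting ?v
  OPTIONAL { ?x :q ?w }      # graph expression
}
ORDER BY ?x                  # solution modifiers
LIMIT 10&lt;/pre&gt;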

&lt;p&gt;&lt;b&gt;Current Situation&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;The semantics for SPARQL have, so far, been declarative and top-down.
A solution to a query passes if it meets all the conditions following
from the definitions of each of the SPARQL graph pattern forms and filters.
That means, in theory, the whole of the solution is available.&lt;/p&gt;

&lt;p&gt;In fact, ARQ builds solutions up by adding the
minimum necessary at each subpattern to make the solution match.
This is what gives ARQ 
streaming evaluation which keeps the total memory use down.  But the
effect is the same, whole solutions are available everywhere in the
pattern.&lt;/p&gt;

&lt;p&gt;Another way is to have a bottom-up algebra.  The 
&lt;a href="http://en.wikipedia.org/wiki/Relational_algebra"&gt;relational
algebra&lt;/a&gt; works this way. Subexpressions are evaluated
to give tables (relations) and then these tables combined together
to form new tables.&lt;/p&gt;

&lt;p&gt;It turns out that these are nearly the same, as shown by
&lt;a href="http://dowhatimean.net/"&gt;Richard Cyganiak&lt;/a&gt; and
&lt;a href="http://ing.utalca.cl/~jperez/"&gt;Jorge Perez&lt;/a&gt; et al
(see references below).
The class of graph patterns that differs is a certain kind of
nested
&lt;code&gt;OPTIONAL&lt;/code&gt; where a variable is used on the outside,
fixed part, and in the inner &lt;code&gt;OPTIONAL&lt;/code&gt;, but not in
between.&lt;/p&gt;

&lt;pre class="box"&gt;SELECT *
{ :x1 :p &lt;b&gt;?v&lt;/b&gt; .
  OPTIONAL
  { :x3 :q ?w .
    OPTIONAL { :x2 :p &lt;b&gt;?v&lt;/b&gt; }
  }
}&lt;/pre&gt;

&lt;p&gt;Now that is a rather unusual query.
The tricky bit is the use of &lt;code&gt;?v&lt;/code&gt; at the outermost and
innermost levels but not in between. The query can be rewritten so
as not to nest (note the repeat of &lt;code&gt;:x3 :q ?w&lt;/code&gt;).&lt;/p&gt;

&lt;pre class="box"&gt;SELECT *
{ :x1 :p ?v .
  OPTIONAL
    { :x3 :q ?w }
  OPTIONAL 
    { :x3 :q ?w  . :x2 :p ?v }
}&lt;/pre&gt;

&lt;p&gt;A few references for work in this area:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://www.hpl.hp.com/techreports/2005/HPL-2005-170.html"&gt;A
relational algebra for SPARQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://arxiv.org/abs/cs.DB/0605124"&gt;Semantics and Complexity of SPARQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Follow up to the paper above:
&lt;a href="http://ing.utalca.cl/~jperez/papers/sparql_semantics.pdf"&gt;Semantics of SPARQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;The paper 
"&lt;a href="http://www.cs.wayne.edu/~artem/main/research/TR-DB-052006-CLJF.pdf"&gt;Semantics Preserving SPARQL-to-SQL Query Translation
for Optional Graph Patterns&lt;/a&gt;"
includes doing the difficult case of nested optionals.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;b&gt;A SPARQL Algebra&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;In &lt;a href="http://www.w3.org/2001/sw/DataAccess/"&gt;DAWG&lt;/a&gt;, we're considering
an algebra for SPARQL, based on the paper
&lt;a href="http://ing.utalca.cl/~jperez/papers/sparql_semantics.pdf"&gt;Semantics
of SPARQL&lt;/a&gt;. This will mean a change of results in the case
above. I've never seen such a query in the wild, only in artificial test cases.&lt;/p&gt;

&lt;p&gt;The change from declarative semantics to a constructional algebra is
motivated primarily by the implementers in the working group wanting to
apply all the good stuff on relational algebra optimization to SPARQL
engines.  The complexity of the exact mapping to SQL in the
&lt;a href="http://www.cs.wayne.edu/~artem/main/research/TR-DB-052006-CLJF.pdf"&gt;
semantics preserving SPARQL-to-SQL&lt;/a&gt; is daunting.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Not &lt;i&gt;that&lt;/i&gt; simple!&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;But other things differ as well if a purely relational style is 
applied (and that's not being proposed currently) because
"&lt;a href="http://ing.utalca.cl/~jperez/papers/sparql_semantics.pdf"&gt;Semantics of SPARQL&lt;/a&gt;" 
does not consider the difference in treatment of filters (its focus is on the 
combination of graph patterns).&lt;/p&gt;

&lt;p&gt;Two cases arise: one where filters are in &lt;code&gt;OPTIONAL&lt;/code&gt;s, and one 
where filters are nested inside groups.&lt;/p&gt;
&lt;pre class="box"&gt;SELECT * 
{ ?x :p &lt;b&gt;?v&lt;/b&gt; .
  OPTIONAL
  { 
    ?y :q ?w .
    FILTER(&lt;b&gt;?v&lt;/b&gt; = 2)
  }
}&lt;/pre&gt;

&lt;p&gt;The &lt;code&gt;FILTER&lt;/code&gt; uses a variable from the fixed part of
the &lt;code&gt;OPTIONAL&lt;/code&gt;. Under Jorge's semantics the variable
isn't in-scope, so the filter evaluation is an error (unbound variable),
otherwise known as false; the optional part never matches.
This form of query is realistic and does occur for real
so the current proposal for DAWG makes this work. Like SQL
and &lt;code&gt;LEFT OUTER JOIN&lt;/code&gt; with the &lt;code&gt;ON&lt;/code&gt; clause,
the LeftJoin operation in the SPARQL algebra can take a condition
which applies over the whole LeftJoin.&lt;/p&gt;
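
&lt;p&gt;To make the difference concrete, here is a minimal sketch in plain Java. It is not ARQ code: the class name, the method names and the string-array row layout are all made up for illustration. It compares a LeftJoin whose condition is tested on the combined row with a filter that only ever sees the optional part.&lt;/p&gt;

```java
class LeftJoinSketch {
    // Rows are plain arrays: a left row is {x, v}, a right row is {y, w}.
    // The two sides share no variables, so every pair of rows is compatible.

    // LeftJoin(left, right, ?v = 2): the condition is tested on the combined
    // row, so it can see ?v from the fixed part of the pattern.
    static String leftJoinWithCondition(String[][] left, String[][] right) {
        StringBuilder out = new StringBuilder();
        for (String[] l : left) {
            boolean extended = false;
            for (String[] r : right) {
                if ("2".equals(l[1])) {
                    out.append(String.join(" ", l[0], l[1], r[0], r[1])).append("\n");
                    extended = true;
                }
            }
            // No right row passed the condition: keep the left row on its own.
            if (!extended) out.append(String.join(" ", l[0], l[1])).append("\n");
        }
        return out.toString();
    }

    // Filter evaluated inside the optional part alone: ?v is unbound there,
    // the test is an error (so false), and the optional part is always empty.
    // The right rows are therefore never used.
    static String filterInsideOptional(String[][] left, String[][] right) {
        StringBuilder out = new StringBuilder();
        for (String[] l : left) out.append(String.join(" ", l[0], l[1])).append("\n");
        return out.toString();
    }

    public static void main(String[] args) {
        String[][] left = { { ":a", "2" }, { ":b", "3" } };   // rows for ?x :p ?v
        String[][] right = { { ":c", ":d" } };                // rows for ?y :q ?w
        System.out.print(leftJoinWithCondition(left, right));
        System.out.print(filterInsideOptional(left, right));
    }
}
```

&lt;p&gt;With the condition on the LeftJoin, the row for &lt;code&gt;:a&lt;/code&gt; is extended and the row for &lt;code&gt;:b&lt;/code&gt; is kept unextended. With the filter confined to the optional part, no row is ever extended.&lt;/p&gt;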

&lt;p&gt;The fixed
pattern could be repeated inside the optional part but such repetition
for something applications might well use is bad, very bad.&lt;/p&gt;

&lt;p&gt;Application writers will have to take care about scope and
groups: putting &lt;code&gt;FILTER&lt;/code&gt;s inside &lt;code&gt;{}&lt;/code&gt; changes
their scope.&lt;/p&gt;

&lt;p&gt;Currently,&lt;/p&gt;

&lt;pre class="box"&gt;{ :x :p ?v . FILTER(?v &amp;lt; 5) }&lt;/pre&gt;

&lt;p&gt;and&lt;/p&gt;

&lt;pre class="box"&gt;{ :x :p ?v . { FILTER(?v &amp;lt; 5) } }&lt;/pre&gt;

&lt;p&gt;have the same effect. But they do not in the algebraic form
because &lt;code&gt;{ FILTER(?v &amp;lt; 5) }&lt;/code&gt; evaluates on the empty
table that does not have a &lt;code&gt;?v&lt;/code&gt; in it; evaluation is an
error and hence false.&lt;/p&gt;
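
&lt;p&gt;A rough way to see this, as a plain Java sketch rather than anything from ARQ (the names are hypothetical, and a &lt;code&gt;null&lt;/code&gt; stands for an unbound &lt;code&gt;?v&lt;/code&gt;): the nested group is evaluated on its own, starting from one row with no bindings, and only afterwards joined with the triple pattern.&lt;/p&gt;

```java
class FilterScopeSketch {
    // The filter on ?v for one row; null means ?v is unbound in that row.
    // An unbound variable is an evaluation error, which a FILTER treats as false.
    static boolean vLessThan5(Integer v) {
        if (v == null) return false;
        return 5 > v;
    }

    // Filter in the same group as the triple pattern: it sees the pattern's rows.
    // Returns how many rows survive.
    static int sameGroup(Integer[] rows) {
        int kept = 0;
        for (Integer v : rows) {
            if (vLessThan5(v)) kept++;
        }
        return kept;
    }

    // Filter in a nested group: the inner group starts from a single row with
    // no bindings, so the filter rejects it and the inner table is empty.
    // The two sides share no variables, so the join is a cross product.
    static int nestedGroup(Integer[] rows) {
        Integer[] innerInput = { null };           // one row, ?v unbound
        int innerRows = sameGroup(innerInput);     // 0 rows survive
        return rows.length * innerRows;            // joining with an empty table
    }

    public static void main(String[] args) {
        Integer[] matches = { 3, 7 };              // values of ?v from :x :p ?v
        System.out.println(sameGroup(matches));    // prints 1
        System.out.println(nestedGroup(matches));  // prints 0
    }
}
```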

&lt;p&gt;&lt;b&gt;ARQ&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;There is an implementation of the algebra in
&lt;a href="http://jena.sourceforge.net/ARQ/download.html"&gt;ARQ in CVS&lt;/a&gt;
(not in the current release ARQ 1.4). It's enabled in the command
line tool
&lt;a href="http://jena.sourceforge.net/ARQ/cmds.html"&gt;arq.sparql&lt;/a&gt;
with &lt;code&gt;--engine=ref&lt;/code&gt; from the command line and&lt;/p&gt;
&lt;pre class="box"&gt;QueryEngine2.register() ;&lt;/pre&gt;
&lt;p&gt;in code.&lt;/p&gt;

&lt;p&gt;The work-in-progress
&lt;a href="http://www.w3.org/2001/sw/DataAccess/rq23/rq24-algebra.html"&gt;unfinished
editor's working text&lt;/a&gt; is available.  Check it is actually up to
date because that's just work space and isn't a DAWG document. It does not
reflect working group decisions - it is more a proposal for consideration.&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;i&gt;Disclaimer&lt;/i&gt;: Any views expressed here are mine and not
those of the working group (which at
the time of writing is thinking about it but hasn't decided
anything yet).&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-116429409503210902?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://www.w3.org/2001/sw/DataAccess/rq23/rq24-algebra.html' title='An Algebra for SPARQL'/><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/116429409503210902/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=116429409503210902' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/116429409503210902'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/116429409503210902'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2006/11/algebra-for-sparql.html' title='An Algebra for SPARQL'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-116336048318288909</id><published>2006-11-12T19:40:00.000Z</published><updated>2006-11-12T20:02:54.250Z</updated><title type='text'>LARQ = Lucene + ARQ</title><content type='html'>&lt;p&gt;
&lt;a href="http://www.w3.org/TR/rdf-sparql-query/"&gt;SPARQL&lt;/a&gt; is normally
thought of as only querying fixed RDF data. At the core of SPARQL are
the building blocks of basic graph patterns, and on top of these there is
an algebra to create more complex patterns (OPTIONAL, UNION, FILTER, GRAPH).
&lt;/p&gt;

&lt;p&gt;The key question a basic graph pattern asks is "does this pattern match
the graph". The named variables record how the pattern matches.
&lt;/p&gt;

&lt;p&gt;
Not all information needs to be in the raw data.  ARQ 
&lt;a href="http://jena.sourceforge.net/ARQ/extension.html#propertyFunctions"
&gt;property functions&lt;/a&gt; are a way to let the application add some
relationships to be computed at query execution time.
&lt;/p&gt;

&lt;p&gt;&lt;a href="http://jena.sf.net/ARQ/lucene-arq.html"&gt;LARQ&lt;/a&gt;
adds free text search. The real work is done by 
&lt;a href="http://lucene.apache.org"&gt;Lucene&lt;/a&gt;.
LARQ adds ways to create a Lucene index from RDF data
and a property function to perform free text matching
in a SPARQL query.&lt;/p&gt;

&lt;p&gt;Example: find all the string literals that match '+keyword'&lt;/p&gt;
&lt;pre class="box"&gt;
PREFIX pf: &amp;lt;&lt;b&gt;java:com.hp.hpl.jena.query.pfunction.library.&lt;/b&gt;&amp;gt;

SELECT *
  { ?lit &lt;b&gt;pf:textMatch&lt;/b&gt; '+keyword' }&lt;/pre&gt;
&lt;p&gt; Any simple or complex 
&lt;a href="http://lucene.apache.org/java/docs/queryparsersyntax.html"
&gt;Lucene query string&lt;/a&gt; can be used.&lt;/p&gt;

&lt;p&gt;LARQ provides utilities to index string literals.  As the literal can
be stored as well, a query can find the subjects with some property value
matching the free text search.&lt;/p&gt;

&lt;p&gt;So to find all the documents that have titles matching some free
text search:&lt;/p&gt;
&lt;pre class="box"&gt;
PREFIX pf: &amp;lt;&lt;b&gt;java:com.hp.hpl.jena.query.pfunction.library.&lt;/b&gt;&amp;gt;
PREFIX dc: &amp;lt;http://purl.org/dc/elements/1.1/&amp;gt;
  
SELECT ?doc {
    ?lit &lt;b&gt;pf:textMatch&lt;/b&gt; '+text' .
    ?doc ?p ?lit
  }&lt;/pre&gt;

&lt;p&gt;
More details in the 
&lt;a href="http://jena.sf.net/ARQ/lucene-arq.html"
&gt;ARQ documentation for LARQ&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;This will be in ARQ 1.5 but is available from 
&lt;a href="http://jena.sourceforge.net/ARQ/download.html"
&gt;ARQ CVS&lt;/a&gt; now.  Hopefully, this will be useful to users and 
application writers.
Comments and feedback on the design are welcome, especially 
before the next ARQ release.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-116336048318288909?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://jena.sf.net/ARQ/lucene-arq.html' title='LARQ = Lucene + ARQ'/><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/116336048318288909/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=116336048318288909' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/116336048318288909'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/116336048318288909'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2006/11/larq-lucene-arq.html' title='LARQ = Lucene + ARQ'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-116074507369596325</id><published>2006-10-13T14:08:00.000+01:00</published><updated>2006-10-13T14:14:15.206+01:00</updated><title type='text'>Assignment Property Function</title><content type='html'>&lt;p&gt;
&lt;a href="http://www.w3.org/TR/rdf-sparql-query/"&gt;SPARQL&lt;/a&gt;
is a graph pattern matching language.
The matching can be semantic
(basic graph patterns are entailed by the data in order to match)
or through the
algebra that works over the top of basic graph pattern matching.&lt;/p&gt;
&lt;p&gt;
Complex queries make use of &lt;code&gt;UNION&lt;/code&gt; and &lt;code&gt;OPTIONAL&lt;/code&gt;,
including the idiom with
&lt;code&gt;OPTIONAL&lt;/code&gt; and &lt;code&gt;!BOUND&lt;/code&gt;, in all sorts of creative ways.
And sometimes they want the answers to indicate which way the query matched
to get a particular solution.
&lt;/p&gt;

&lt;pre class="box"&gt;
PREFIX skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&gt;

SELECT ?label
{
   { ?concept skos:prefLabel ?label }
 UNION
   { ?concept skos:altLabel ?label }
}
&lt;/pre&gt;

&lt;p&gt;
Which matched, the &lt;code&gt;prefLabel&lt;/code&gt; or the &lt;code&gt;altLabel&lt;/code&gt;?
The application can assign the different branches to different
variables although that might be inconvenient in a larger query
if &lt;code&gt;?label&lt;/code&gt; is used elsewhere in the query.&lt;/p&gt;

&lt;pre class="box"&gt;
PREFIX skos: &amp;lt;http://www.w3.org/2004/02/skos/core#&gt;

SELECT ?label_pref ?label_alt
{
  { ?concept skos:prefLabel ?label_pref }
UNION
  { ?concept skos:altLabel  ?label_alt }
}
&lt;/pre&gt;

&lt;p&gt;ARQ's &lt;a href="http://jena.sourceforge.net/ARQ/extension.html#propertyFunctions"&gt;property functions&lt;/a&gt; provide a way of adding different ways to match a
triple pattern.  Uses include calling out to custom code
to perform some calculation, or calling out to use an external
index, such as &lt;a href="http://lucene.apache.org/"&gt;Lucene&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;Property functions can assign to variables in the solution - unlike
&lt;code&gt;FILTER&lt;/code&gt; functions which must not have side effects. Property
functions can't change values but they can bind an unbound variable or check
an existing binding is compatible.&lt;/p&gt;

&lt;pre class="box"&gt;
PREFIX apf:   &amp;lt;java:com.hp.hpl.jena.query.pfunction.library.&gt;
SELECT ?x
{
 ?x apf:assign "Hello World"
}
&lt;/pre&gt;

&lt;pre class="box"&gt;
-----------------
| x             |
=================
| "Hello World" |
-----------------
&lt;/pre&gt;

&lt;p&gt;
It works both ways round: &lt;code&gt;{ "Hello World" apf:assign ?x } &lt;/code&gt;
gives the same results.
&lt;/p&gt;

&lt;p&gt;
If subject and object are constants or already bound,
&lt;code&gt;apf:assign&lt;/code&gt; checks to see if they are the same.  If they are,
the triple pattern
matches with no change to the solution, otherwise it does not match and that solution
is rejected.  The "sameness" is &lt;a href="http://jena.sourceforge.net/"&gt;Jena's&lt;/a&gt;
&lt;a href="http://jena.sourceforge.net/javadoc/com/hp/hpl/jena/graph/Node.html#sameValueAs%28java.lang.Object%29"&gt;
&lt;code&gt;sameValueAs&lt;/code&gt;&lt;/a&gt;,
just like the rest of graph matching.
If both variables are unbound, &lt;code&gt;apf:assign&lt;/code&gt; complains
and the query is broken.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-116074507369596325?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/116074507369596325/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=116074507369596325' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/116074507369596325'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/116074507369596325'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2006/10/assignment-property-function.html' title='Assignment Property Function'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-114701866289792772</id><published>2006-05-07T17:17:00.000+01:00</published><updated>2006-05-07T17:20:06.750+01:00</updated><title type='text'>Parameterized Queries</title><content type='html'>&lt;p&gt;Sometimes, an application will be making a SPARQL query,
using the results from a previous query or using some RDF
term found through the other Jena APIs.&lt;/p&gt;

&lt;p&gt;SQL has prepared statements - they allow an SQL statement to
take a number of parameters. The application fills in the
parameters and executes the statement.&lt;/p&gt;

&lt;p&gt;One way is to resort to doing this in SPARQL by building
a complete, new query string, parsing it and executing it.
But it takes a little care to handle all cases like
quoting special characters; you can at least use some of the
many utilities in ARQ for producing strings such as
&lt;code&gt;FmtUtils.stringForResource&lt;/code&gt; (it's 
not in the application API but in the &lt;code&gt;util&lt;/code&gt;
package currently).&lt;/p&gt;

&lt;p&gt;Queries in ARQ can be
&lt;a href="http://jena.sourceforge.net/ARQ/programmatic.html"&gt;built 
programmatically&lt;/a&gt; but it is tedious, especially when the
documentation hasn't been written yet.&lt;/p&gt;

&lt;p&gt;Another way is to use query variables and bind them to
initial values that apply to all query solutions. Consider
the query:&lt;/p&gt;

&lt;pre class="box"&gt;PREFIX dc: &amp;lt;http://purl.org/dc/elements/1.1/&amp;gt;
SELECT ?doc { ?doc dc:title ?title }&lt;/pre&gt;

&lt;p&gt;It gets documents and their titles.&lt;/p&gt;

&lt;p&gt;Executing a query in a program
&lt;a href="http://jena.sourceforge.net/ARQ/app_api.html"&gt;might
look like&lt;/a&gt;:&lt;/p&gt;

&lt;pre class="box"&gt;import com.hp.hpl.jena.query.* ;
import com.hp.hpl.jena.rdf.model.* ;

Model model = ... ;&lt;/pre&gt;
&lt;pre class="box"&gt;String queryString = StringUtils.join(&amp;quot;\n&amp;quot;,
         new String[]{
     &amp;quot;PREFIX dc: &amp;lt;http://purl.org/dc/elements/1.1/&amp;gt;&amp;quot;,
     &amp;quot;SELECT ?doc { ?doc dc:title ?title }&amp;quot;
         }) ;
Query query = QueryFactory.create(queryString) ;
QueryExecution qexec =
    QueryExecutionFactory.create(query, model) ;
try {
    ResultSet results = qexec.execSelect() ;
    for ( ; results.hasNext() ; )
    {
       QuerySolution soln = results.nextSolution() ;
       Resource doc = soln.getResource(&amp;quot;doc&amp;quot;) ;
    }
} finally { qexec.close() ; }&lt;/pre&gt;

&lt;p&gt;Suppose the application knows the title it's interested
in - can it use this to get the document?&lt;/p&gt;

&lt;p&gt;The value of &lt;code&gt;?title&lt;/code&gt; is made a parameter to the query
and fixed by an initial binding.  All query solutions will
be restricted to pattern matches where &lt;code&gt;?title&lt;/code&gt; 
is that RDF term.&lt;/p&gt;

&lt;pre class="box"&gt;QuerySolutionMap initialSettings = new QuerySolutionMap() ;
initialSettings.add(&amp;quot;title&amp;quot;, node) ;&lt;/pre&gt;

&lt;p&gt;and this is passed to the factory that creates &lt;code&gt;QueryExecution&lt;/code&gt; objects:&lt;/p&gt;
&lt;pre class="box"&gt;QueryExecution qexec = 
    QueryExecutionFactory.create(query,
                                 model,
                                 &lt;b&gt;initialSettings&lt;/b&gt;) ;&lt;/pre&gt;
&lt;p&gt;It doesn't matter if the node is a literal, a resource with
URI or a blank node. It becomes a fixed value in the query, even
a blank node, because it's not part of the SPARQL syntax, it's a
fixed part of every solution.&lt;/p&gt;

&lt;p&gt;This gives named parameters to queries enabling something
like SQL prepared statements except with named parameters not
positional ones.&lt;/p&gt;

&lt;p&gt;This can make a complex application easier to structure and
clearer to read. It's better than bashing strings together,
which is error prone, inflexible, and does not 
lead to clear code.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-114701866289792772?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/114701866289792772/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=114701866289792772' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/114701866289792772'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/114701866289792772'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2006/05/parameterized-queries_07.html' title='Parameterized Queries'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-114491949132660796</id><published>2006-04-13T10:07:00.000+01:00</published><updated>2006-04-13T10:40:20.840+01:00</updated><title type='text'>From RDQL to SPARQL</title><content type='html'>&lt;p&gt;
SPARQL is now (April 2006) a W3C Candidate Recommendation
which means it is stable enough for widespread
implementation.&amp;nbsp; Actually, there are quite a few implementations already
(&lt;a href="http://esw.w3.org/topic/SparqlImplementations"&gt;SPARQL
implementations page on ESW wiki&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;SPARQL is defined by three documents:&lt;/p&gt;

&lt;ul&gt;
 &lt;li&gt;&lt;a href="http://www.w3.org/TR/rdf-sparql-query/"&gt;SPARQL Query language&lt;/a&gt;&lt;/li&gt;
 &lt;li&gt;&lt;a href="http://www.w3.org/TR/rdf-sparql-protocol/"&gt;SPARQL Protocol&lt;/a&gt;&lt;/li&gt;
 &lt;li&gt;&lt;a href="http://www.w3.org/TR/rdf-sparql-XMLres/"&gt;SPARQL Query XML Results Format&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;and there are tutorials like
&lt;a href="http://jena.sourceforge.net/ARQ/Tutorial/index.html"
&gt;this one&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;
&lt;a href="http://www.w3.org/Submission/RDQL/"&gt;RDQL&lt;/a&gt; predates
SPARQL - in fact, RDQL design predates the
current RDF specifications and some of the design decisions
in RDQL are a reflection of that.  The biggest of these is
that RDF didn't have any datatyping so RDQL handles tests
on, say, integers without checking the datatype (if it looks like
an integer, it can be tested as integer).
&lt;/p&gt;

&lt;p&gt;SPARQL has all the features of RDQL and more:&lt;/p&gt;
&lt;ul&gt;
 &lt;li&gt;ability to add optional information to query results&lt;/li&gt;
    &lt;li&gt;disjunction of graph patterns&lt;/li&gt;
    &lt;li&gt;more expression testing (date-time support, for example)&lt;/li&gt;
    &lt;li&gt;named graphs&lt;/li&gt;
    &lt;li&gt;sorting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;but, above all, it is more tightly specified so queries in one implementation 
should behave the same in all other implementations.&lt;/p&gt;

&lt;h2&gt;ARQ - A Query Engine for Jena&lt;/h2&gt;

&lt;p&gt;
In parallel with the development of the SPARQL specification,
I have been developing a new query subsystem for
&lt;a href="http://jena.sourceforge.net/"&gt;Jena&lt;/a&gt; called
&lt;a href="http://jena.sourceforge.net/ARQ/"&gt;ARQ&lt;/a&gt;.
ARQ is now part of the Jena download.&lt;/p&gt;

&lt;p&gt;
ARQ builds on top of the existing Jena query support
for matching of basic graph patterns
(BGPs are the building block in SPARQL).
ARQ can execute SPARQL and RDQL as well as an extended form of
SPARQL.  It has several extension points, such as
&lt;a href="/2006/02/property-functions-in-arq.html"
&gt;Property functions&lt;/a&gt;.  The ARQ query engine works with
any Jena Graph or Model.
&lt;/p&gt;


&lt;h2&gt;Converting RDQL code to SPARQL code&lt;/h2&gt;

&lt;p&gt;
The functionality of RDQL is a subset of SPARQL so it's
not hard to convert RDQL queries to SPARQL.  What needs
to be done is convert the triple syntax and convert
any constraints. 
&lt;/p&gt;

&lt;h3&gt;Syntax&lt;/h3&gt;

&lt;p&gt;
SPARQL syntax uses a &lt;a href="http://www.dajobe.org/2004/01/turtle/"
&gt;Turtle&lt;/a&gt;-like syntax which is familiar to anyone knowing N3.
Namespaces go at the start of the query, not after like
&lt;code&gt;USING&lt;/code&gt;.  There are no &lt;code&gt;()&lt;/code&gt; around triple
patterns; instead there is a "&lt;code&gt;.&lt;/code&gt;" (a single dot)
between triple patterns.  An RDQL query only ever has one graph pattern;
in SPARQL, blocks of triple patterns are 
delimited by &lt;code&gt;{}&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You can even use the command line
&lt;a href="http://jena.sourceforge.net/ARQ/cmds.html#arq.qparse"
&gt;&lt;code&gt;arq.qparse&lt;/code&gt;&lt;/a&gt; to read in an
RDQL query and write out the SPARQL query, but the result is a rough approximation
that you'll need to check; it may not be completely legal SPARQL.&lt;/p&gt;

&lt;h3&gt;Constraints&lt;/h3&gt;

&lt;p&gt;
The constraints need the most care because SPARQL uses RDF datatyping 
and RDQL doesn't. Some common areas are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;regular expressions&lt;/li&gt;
&lt;li&gt;string equality and numeric equality&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;Regular expressions&lt;/h4&gt;

&lt;p&gt;A SPARQL regular expression looks like:&lt;/p&gt;
&lt;pre class="box"&gt;
regex(&lt;em&gt;expr&lt;/em&gt;, "&lt;em&gt;pattern&lt;/em&gt;")
regex(&lt;em&gt;expr&lt;/em&gt;, "&lt;em&gt;pattern&lt;/em&gt;", "i")
&lt;/pre&gt;

&lt;p&gt;
The catch is that the &lt;code&gt;&lt;em&gt;expr&lt;/em&gt;&lt;/code&gt; must be a literal; it can't be a URI.
(Well - it can, but it will never match!).  If you want to perform a regular
expression match on a URI, use the
&lt;a href="http://www.w3.org/2001/sw/DataAccess/rq23/#func-str"
&gt;&lt;code&gt;str()&lt;/code&gt;&lt;/a&gt; built-in to get the string form of the URI.
&lt;/p&gt;

&lt;pre class="box"&gt;
regex(str(?uri), "^http://example/ns#")
&lt;/pre&gt;
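
&lt;p&gt;The same idea in plain Java, as a sketch only: this uses &lt;code&gt;java.util.regex&lt;/code&gt; rather than ARQ, the class and method names are invented, and it assumes that a SPARQL regex matches anywhere in the string unless the pattern is anchored.&lt;/p&gt;

```java
import java.net.URI;
import java.util.regex.Pattern;

class StrRegexSketch {
    // Mimics regex(str(?uri), pattern): take the string form of the URI first,
    // then search for the pattern anywhere in that string.
    static boolean regexOnUri(URI uri, String pattern) {
        String lexical = uri.toString();                        // the str() step
        return Pattern.compile(pattern).matcher(lexical).find();
    }

    public static void main(String[] args) {
        URI inNamespace = URI.create("http://example/ns#thing");
        URI elsewhere = URI.create("http://other.example/ns#thing");
        System.out.println(regexOnUri(inNamespace, "^http://example/ns#"));  // prints true
        System.out.println(regexOnUri(elsewhere, "^http://example/ns#"));    // prints false
    }
}
```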

&lt;h4&gt;Equality&lt;/h4&gt;

&lt;p&gt;RDQL has &lt;code&gt;=&lt;/code&gt; for numeric equality and &lt;code&gt;eq&lt;/code&gt; for
string equality.  A number in RDQL was anything that can be parsed as a number, whether
it had a datatype or not (or even the wrong datatype).  Likewise, anything could be
treated as a string (like URIs in regular expressions).
&lt;/p&gt;

&lt;p&gt;SPARQL has &lt;code&gt;=&lt;/code&gt; which is taken from
&lt;a href="http://www.w3.org/TR/xpath-functions/"
&gt;XQuery/XPath Functions and Operators&lt;/a&gt;.  It decides whether that is numeric equals,
string equals or URI-equals based on the kind of arguments it is given.&lt;/p&gt;
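
&lt;p&gt;A small Java sketch of the contrast follows. It is illustrative only: the names are hypothetical, it is not how ARQ implements either language, and it simplifies SPARQL by treating arguments of different kinds as simply not equal.&lt;/p&gt;

```java
import java.math.BigDecimal;

class EqualitySketch {
    // RDQL-style "=": anything that parses as a number is compared as a number,
    // whatever datatype it carried (or did not carry).
    static boolean rdqlNumericEquals(String a, String b) {
        return new BigDecimal(a).compareTo(new BigDecimal(b)) == 0;
    }

    // RDQL-style "eq": compare the lexical forms as strings.
    static boolean rdqlStringEquals(String a, String b) {
        return a.equals(b);
    }

    // SPARQL-style "=": a single operator; the kind of the arguments decides
    // which comparison is used. Numbers compare by value, everything else
    // (strings with strings, URIs with URIs) by term equality.
    static boolean sparqlEquals(Object a, Object b) {
        boolean bothNumbers = a instanceof BigDecimal ? b instanceof BigDecimal : false;
        if (bothNumbers) {
            return ((BigDecimal) a).compareTo((BigDecimal) b) == 0;
        }
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(rdqlNumericEquals("1.0", "1"));                            // prints true
        System.out.println(rdqlStringEquals("1.0", "1"));                             // prints false
        System.out.println(sparqlEquals(new BigDecimal("1.0"), new BigDecimal("1"))); // prints true
        System.out.println(sparqlEquals("1.0", "1"));                                 // prints false
    }
}
```

&lt;p&gt;When converting a query, the thing to check is which of the two RDQL operators was in use and whether the data really carries the datatype that the SPARQL comparison will look at.&lt;/p&gt;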

&lt;h2&gt;API Changes&lt;/h2&gt;

&lt;p&gt;The ARQ API is in the package &lt;code&gt;com.hp.hpl.jena.query&lt;/code&gt;.  The RDQL API
is deprecated, starting with Jena 2.4.  The &lt;a href="http://jena.sourceforge.net/ARQ/app_api.html"&gt;new API&lt;/a&gt;
is similar in style to the old one for &lt;code&gt;SELECT&lt;/code&gt;, with iteration over 
the rows of the results (&lt;a
href="http://jena.sourceforge.net/ARQ/javadoc/index.html"&gt;javadoc&lt;/a&gt;).
Differences include the widespread use of factories, naming consistent with the 
SPARQL specifications, and different &lt;code&gt;exec&lt;/code&gt; operations for the 
different kinds of SPARQL query. &lt;code&gt;QueryExecution&lt;/code&gt; objects should be 
properly closed.&lt;/p&gt;

&lt;p&gt;One change is that to get the triples that matched a query, instead of asking the binding for the triples
that were used in the matching, the application should now make a
&lt;code&gt;CONSTRUCT&lt;/code&gt; query.&lt;/p&gt;

&lt;h2&gt;Experimenting with SPARQL&lt;/h2&gt;


&lt;p&gt;
There is a set of &lt;a href="http://jena.sourceforge.net/ARQ/cmds.html"&gt;command line utilities&lt;/a&gt;
to try out SPARQL queries from the command line.
&lt;/p&gt;

&lt;p&gt;A nice graphical interface is &lt;a href="http://www.ldodds.com/projects/twinkle/"&gt;twinkle&lt;/a&gt;
by Leigh Dodds.&lt;/p&gt;

&lt;p&gt;
There is also an implementation of the SPARQL protocol using ARQ,
project &lt;a href="http://www.joseki.org/"&gt;Joseki&lt;/a&gt;, and a demo site at &lt;a
href="http://www.sparql.org"&gt;http://www.sparql.org&lt;/a&gt;
where you can validate SPARQL queries and try them out.&lt;/p&gt;
&lt;h2&gt;
Questions?&lt;/h2&gt;
&lt;p&gt;
Send questions and comments about ARQ to &lt;a href="mailto:jena-dev@groups.yahoo.com"&gt;jena-dev&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;
Send general questions and comments about SPARQL to the W3C list
&lt;a href="mailto:public-sparql-dev@w3.org"&gt;sparql-dev&lt;/a&gt;
(&lt;a href="http://lists.w3.org/Archives/Public/public-sparql-dev/"&gt;archive&lt;/a&gt;).
&lt;/p&gt;

&lt;p&gt;
If you have experiences converting from RDQL to SPARQL, then let me know
and I'll compile a list of common issues.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-114491949132660796?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/114491949132660796/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=114491949132660796' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/114491949132660796'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/114491949132660796'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2006/04/from-rdql-to-sparql.html' title='From RDQL to SPARQL'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-114011505742599747</id><published>2006-02-16T18:35:00.000Z</published><updated>2006-04-10T17:48:17.870+01:00</updated><title type='text'>Property Functions in ARQ</title><content type='html'>&lt;p&gt;These are properties that are calculated by some custom code, and not
done by the usual matching. There are two provided now; applications are free to provide application-specific ones.&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;list:member - access the members of an RDF list.
&lt;/li&gt;&lt;li&gt;rdfs:member - access the members of rdf:Bag, rdf:Seq and rdf:Alt structures&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Full Extension &lt;a href="http://jena.sourceforge.net/ARQ/extensions.html"&gt;documentation&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Normally, the unit of matching in ARQ is the basic graph pattern (a sequence of triple patterns).  These sets of triple patterns are dispatched to Jena for matching by Jena's graph-level query handler.
Each kind of storage provides the appropriate query handler. For example, the database &lt;span style="font-style: italic;"&gt;fastpath&lt;/span&gt;
is a translation of a set of triple patterns into a single SQL query involving joins.&lt;/p&gt;

&lt;p&gt;There is also a default implementation that works by using plain graph &lt;i&gt;find&lt;/i&gt; (a triple with possible wildcards) so a new storage system does not need to provide its own query handler until it wants to exploit some feature of the storage.&lt;/p&gt;

&lt;p&gt;If a property function is encountered, it is treated internally as a call to an extension.  There is a registry that maps
property functions to their implementing code.&lt;/p&gt;

&lt;pre&gt;  # Find all the members of a list (RDF collection)
  PREFIX  list:   &amp;lt;http://www.jena.hpl.hp.com/ARQ/list#&amp;gt;
  SELECT ?member
  { ?x :p ?list .
    ?list list:member ?member .
  }
&lt;/pre&gt;

&lt;p&gt;The functionality of &lt;code&gt;list:member&lt;/code&gt; is handled by a class
in the extension library so this query is treated much like the ARQ
extension:&lt;/p&gt;

&lt;pre&gt;  # Find all the members of a list (RDF collection)
  PREFIX  ext: &amp;lt;java:com.hp.hpl.jena.query.extension.library.&amp;gt;
  SELECT ?member
  { ?x :p ?list .
    EXT ext:list(?list, ?member)
  }
&lt;/pre&gt;

&lt;p&gt;where &lt;code&gt;ext:list&lt;/code&gt; is a function that binds its arguments
(unlike a &lt;code&gt;FILTER&lt;/code&gt; function). The property function form is legal SPARQL.&lt;/p&gt;

&lt;p&gt;So, this mechanism shows that collection access can be done in SPARQL
without resorting to handling told blank nodes.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.w3.org/2000/10/swap/doc/cwm.html"&gt;cwm&lt;/a&gt; (which is a forward chaining rules engine) and &lt;a href="http://www.agfa.com/w3c/euler/"&gt;Euler&lt;/a&gt;
(which is a backward-chaining rules engine) already provide this style of access.  Their property is &lt;a href="http://www.w3.org/2000/10/swap/list#in"&gt;&lt;code&gt;&amp;lt;http://www.w3.org/2000/10/swap/list#in&amp;gt;&lt;/code&gt;&lt;/a&gt; - the subject and object meanings are the other way round.&lt;/p&gt;

&lt;p&gt;ARQ provides &lt;code&gt;list:member&lt;/code&gt; to be like &lt;code&gt;rdfs:member&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;— Andy&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-114011505742599747?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/114011505742599747/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=114011505742599747' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/114011505742599747'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/114011505742599747'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2006/02/property-functions-in-arq.html' title='Property Functions in ARQ'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-18002060.post-113968248786457683</id><published>2006-02-11T18:26:00.000Z</published><updated>2006-04-10T17:47:22.713+01:00</updated><title type='text'>Progress with Jena.Net</title><content type='html'>&lt;p&gt;&lt;a href="http://B4mad.Net/datenbrei/"&gt;[GNU]&lt;/a&gt; wrote
&lt;a href="http://b4mad.net/datenbrei/archives/2006/01/19/hacking-jena-and-monoc/"&gt;about&lt;/a&gt;
building &lt;a href="http://jena.sourceforge.net"&gt;Jena&lt;/a&gt; for &lt;a
href="http://www.mono-project.com/"&gt;Mono&lt;/a&gt;, using
&lt;a href="http://www.ikvm.net/"&gt;IKVM &lt;/a&gt;to compile the jar files into .Net IL.
&lt;/p&gt;
&lt;p&gt;
This approach means that the same source code is used for both the Java world
and the .Net world, making future improvements visible to both from a single source tree.
&lt;/p&gt;
&lt;p&gt;
I tried doing it for .Net on Windows with
&lt;a href="http://msdn.microsoft.com/vstudio/express/visualcsharp/"&gt;C# Express&lt;/a&gt;
and IKVM-0.24.0.1.&lt;/p&gt;

&lt;p&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;Summary&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;SPARQL queries work.&lt;/p&gt;

&lt;p&gt;
Using Jena from C# works for small-scale cases. There is a lot of checking still to do, but
it should be a matter of verifying that everything from the dependent libraries works properly.
&lt;/p&gt;

&lt;p&gt;
Some things aren't working, but there are a few hotspots of trouble that, when
fixed, should mean that the majority (maybe all) of the Jena test suite will run. As
it is at the moment, quite a lot can be done, including using the ARQ command line programs.
&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;The Conversion&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;The IKVM bytecode conversion route is my preferred choice because it means one source
codebase, not two.  When I tried this before, I got an early version of ARQ up and running.
But it wasn't complete; the first big block was the lack of &lt;code&gt;java.nio.charset&lt;/code&gt;
support in GNU Classpath.  Jena and ARQ have lots of tests of internationalization and
charsets.  That alone was enough to make it not worthwhile exploring further at the time.
&lt;/p&gt;
&lt;p&gt;
Now (Feb 2006) &lt;a href="http://www.gnu.org/software/classpath/"&gt;GNU Classpath&lt;/a&gt;
coverage is much better.  See the &lt;a
href="http://www.kaffe.org/%7Estuart/japi/htmlout/h-jdk14-classpath.html"&gt;
coverage of GNU Classpath compared to Java 1.4&lt;/a&gt;.
&lt;/p&gt;
&lt;p&gt;
The process is simple: run ikvmc on all the jars to get a library.  Ignore all
the warnings about missing stuff.  It's surprising what various libraries actually
reference - Log4j has references to a lot of log record transports. At the simplest:
&lt;/p&gt;
&lt;blockquote&gt;
&lt;pre&gt;ikvmc *.jar -out:XXX.dll -target:library&lt;/pre&gt;
&lt;/blockquote&gt;
&lt;p&gt;I've now split this into two DLLs: jena-libs.dll (all the jars except the Jena ones)
and jena.dll (jena.jar, jenatest.jar, arq.jar, iri.jar),
but that is just because I keep rebuilding the DLLs while testing.
&lt;/p&gt;
&lt;p&gt;
It takes a minute or so (less time than building jena.jar itself).
The result is two DLLs totaling about 16M - the whole assembly is about
23M including the three IKVM DLLs.  Not small - but it works and it is simple to do.
&lt;/p&gt;
&lt;p&gt;
What's been tried: in-memory graphs, reading and writing &lt;a
href="http://www.dajobe.org/2004/01/turtle/"&gt;Turtle&lt;/a&gt; files (but XML typed literals are broken)
and SPARQL queries.
&lt;/p&gt;
&lt;p&gt;
Jena bugs (relative to CVS, and so after Jena 2.3):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;file:///c:/absolute was incorrectly turned into a Windows filename.  Worked OK with
Sun's Java but not IKVM. Fixed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;
GNU Classpath bugs:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;InputStreamReader(InputStream, Charset) is broken although the other two
constructors that allow the charset conversion to be explicitly
controlled do seem to work.  This can be worked around in Jena.
&lt;a href="http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26220"&gt;Bugzilla Entry&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;Zero-width lookbehind regexes aren't implemented.  They are used by JJC's new IRI code.
&lt;a href="http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26221"&gt;Bugzilla Entry&lt;/a&gt;.
&lt;/li&gt;&lt;/ul&gt;
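
&lt;p&gt;
As a minimal sketch of the InputStreamReader workaround (not the actual change made in Jena),
the code below builds the reader through the other two explicit-charset constructors: one
takes a charset name, the other a &lt;code&gt;CharsetDecoder&lt;/code&gt;.  The class and method
names are made up for illustration, and whether this avoids the bug on any particular
GNU Classpath or IKVM version has not been checked here.
&lt;/p&gt;
&lt;blockquote&gt;
&lt;pre&gt;
```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;

// Hypothetical helper class, for illustration only; not part of Jena.
class ReaderWorkaround {

    // Uses the InputStreamReader(InputStream, String) constructor:
    // the charset is passed by name instead of as a Charset object.
    public static Reader byName(InputStream in, String charsetName) throws IOException {
        return new InputStreamReader(in, charsetName);
    }

    // Uses the InputStreamReader(InputStream, CharsetDecoder) constructor:
    // the decoder is created explicitly from the named charset.
    public static Reader byDecoder(InputStream in, String charsetName) {
        CharsetDecoder decoder = Charset.forName(charsetName).newDecoder();
        return new InputStreamReader(in, decoder);
    }

    // Reads the whole reader into a string, one character at a time.
    public static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        int ch;
        while ((ch = reader.read()) != -1) {
            sb.append((char) ch);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "caf\u00e9";              // contains one non-ASCII character
        byte[] utf8 = text.getBytes("UTF-8");

        String viaName = readAll(byName(new ByteArrayInputStream(utf8), "UTF-8"));
        String viaDecoder = readAll(byDecoder(new ByteArrayInputStream(utf8), "UTF-8"));

        // Both routes should decode the bytes back to the original text.
        System.out.println(text.equals(viaName));
        System.out.println(text.equals(viaDecoder));
    }
}
```
&lt;/pre&gt;
&lt;/blockquote&gt;
&lt;p&gt;
Passing the charset by name is the smaller change at a call site; the decoder form is
useful when the malformed-input behaviour also needs to be set explicitly.
&lt;/p&gt;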

&lt;p&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;ARQ Test Suite&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;As a rough comparison, I ran the ARQ test suite:&lt;/p&gt;

&lt;p&gt;Before any fixes, with Java 5 JVM:&lt;br/&gt;
Tests run: 1119,  Failures: 0,  Errors: 0
&lt;/p&gt;
&lt;p&gt;
Using ikvm as the JVM:&lt;br/&gt;
Tests run: 1119,  Failures: 32,  Errors: 17
&lt;/p&gt;
&lt;p&gt;
Converting to .Net:&lt;br/&gt;
Tests run: 1119,  Failures: 32,  Errors: 59
&lt;/p&gt;
&lt;p&gt;[20 Feb: JJC recoded around the lack of lookbehind and
now it's down to 4 failures, of which 3 are
because GNU Classpath is just different to Sun's runtime]
&lt;/p&gt;

&lt;p&gt;
&lt;span style="font-weight: bold;font-size:130%;" &gt;Next&lt;/span&gt;
&lt;/p&gt;
&lt;p&gt;
The next step is to work through the broken tests in the ARQ test suite, as time permits, to determine the causes.
&lt;/p&gt;
&lt;p&gt;
&lt;a 
href="http://www.gotdotnet.com/workspaces/workspace.aspx?id=ad7acff7-ab1e-4bcb-99c0-57ac5a3a9742"&gt;IronPython&lt;/a&gt; to Jena?
&lt;/p&gt;

&lt;p&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;Updates&lt;/span&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Calling Jena from VB.Net works&lt;/li&gt;
&lt;li&gt;The GNU Classpath/InputStreamReader bug has been fixed&lt;/li&gt;
&lt;li&gt;The GNU Classpath/lookbehind bug had already been fixed, but
    so recently that IKVM hasn't picked it up yet.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now 4 failures, 3 of which are corner case differences of URI resolution
in unusual cases.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/18002060-113968248786457683?l=seaborne.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://seaborne.blogspot.com/feeds/113968248786457683/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=18002060&amp;postID=113968248786457683' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/113968248786457683'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/18002060/posts/default/113968248786457683'/><link rel='alternate' type='text/html' href='http://seaborne.blogspot.com/2006/02/progress-with-jenanet.html' title='Progress with Jena.Net'/><author><name>AndyS</name><uri>http://www.blogger.com/profile/18033124086179105115</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry></feed>
