Each Freebase graph describing a resource contains further Freebase URLs to explore. What we want is the ability to load data into our local store while a query is running, so that the dataset grows as the query decides how to proceed.
This is similar to cwm's log:semantics (see http://www.w3.org/2000/10/swap/doc/Reach).
In SPARQL, the dataset is fixed before the query starts. That is no good if you want to write a graph-walking process without writing glue code in your favourite programming language. In one way, this is scripting for the web, but of a special kind: it's not a sequence of queries and updates; it's changing the collection of graphs, expanding the RDF dataset known to the application.
Query 1 : See what's in the graph
Let's first look at what's available at the example URL. That does not require anything special: it's just a FROM clause (which in ARQ will content-negotiate for RDF; if you use a web browser you will see an HTML page):
PREFIX fb: <http://rdf.freebase.com/ns/>
SELECT *
FROM fb:en.blade_runner
{ ?s ?p ?o }
Hmm - 294 triples.
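To show what "content-negotiate for RDF" means, here is a minimal Python sketch that uses only the standard library. It expands the fb: prefix and builds a GET request whose Accept header asks for RDF rather than HTML. The Accept value is an assumption for illustration and not necessarily what ARQ sends. The request is built here but never sent.

```python
import urllib.request

# The fb: prefix used in the queries above.
PREFIXES = {"fb": "http://rdf.freebase.com/ns/"}

# Assumed Accept value for illustration: prefer RDF/XML, then Turtle, then
# anything. This is not necessarily the header ARQ sends.
RDF_ACCEPT = "application/rdf+xml, text/turtle;q=0.9, */*;q=0.1"


def expand(curie: str) -> str:
    """Expand a prefixed name such as fb:en.blade_runner into a full URL."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def rdf_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a GET that asks the server for RDF, not HTML."""
    return urllib.request.Request(url, headers={"Accept": RDF_ACCEPT})


req = rdf_request(expand("fb:en.blade_runner"))
print(req.get_method(), req.full_url)  # GET http://rdf.freebase.com/ns/en.blade_runner
print(req.get_header("Accept"))
```

Passing req to urllib.request.urlopen would actually send it. A web browser typically sends an Accept header that prefers text/html, which is why it gets the HTML page instead.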
Query 2 : Look for interesting properties
PREFIX fb: <http://rdf.freebase.com/ns/>
SELECT DISTINCT ?p
FROM fb:en.blade_runner
{ ?s ?p ?o }
62 distinct properties used. fb:film.film.starring looks interesting.
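Queries 1 and 2 need nothing beyond standard SPARQL. To make the difference between them concrete, here is a minimal Python sketch over a handful of made-up triples. The ex: terms are hypothetical placeholders, not Freebase data. SELECT * yields one row per triple, while SELECT DISTINCT ?p keeps each predicate once.

```python
# Made-up triples standing in for the fetched fb:en.blade_runner graph.
# The ex: terms are hypothetical placeholders, not real Freebase data.
TRIPLES = [
    ("fb:en.blade_runner", "fb:type.object.name", '"Blade Runner"'),
    ("fb:en.blade_runner", "fb:film.film.starring", "ex:performance1"),
    ("fb:en.blade_runner", "fb:film.film.starring", "ex:performance2"),
    ("fb:en.blade_runner", "ex:someOtherProperty", "ex:something"),
]


def select_all(triples):
    """SELECT * { ?s ?p ?o }: one solution per triple."""
    return [{"s": s, "p": p, "o": o} for s, p, o in triples]


def select_distinct_predicates(triples):
    """SELECT DISTINCT ?p { ?s ?p ?o }: each predicate once, duplicates dropped.
    SPARQL does not define an order here; this sketch keeps first-seen order."""
    seen = []
    for _, p, _ in triples:
        if p not in seen:
            seen.append(p)
    return seen


print(len(select_all(TRIPLES)))  # 4
print(select_distinct_predicates(TRIPLES))
```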
Query 3 : Follow the links
As an experimental feature, consider a new SPARQL keyword, FETCH, which takes a URL, or a variable bound to a URL by the time that part of the query is reached, and fetches the graph at that location.
Now we fetch the documents at each of the URLs that are objects of the fb:en.blade_runner fb:film.film.starring triples.
FETCH loads the graph and places it in the dataset as a named graph, the name being the URL it was fetched from. We use GRAPH to access the loaded graph. Done this way, triples from different sources are kept separate, which might be important in deciding which sources to believe.
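The FETCH behaviour described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not ARQ's implementation. The WEB dictionary is a hypothetical stand-in for dereferencing a URL with content negotiation, and the ex: URLs are made up. The sketch also assumes that an unreachable URL produces an empty graph and that a URL already loaded is not fetched again.

```python
from dataclasses import dataclass, field

Triple = tuple[str, str, str]

# Hypothetical stand-in for the web: URL -> triples that dereferencing the URL
# (with content negotiation for RDF) would return. The ex: URLs are made up.
WEB: dict[str, list[Triple]] = {
    "ex:perf1": [("ex:perf1", "fb:film.performance.actor", "ex:actorA")],
    "ex:perf2": [("ex:perf2", "fb:film.performance.actor", "ex:actorB")],
}


@dataclass
class Dataset:
    """An RDF dataset: a default graph (what FROM loads) plus named graphs."""
    default: list[Triple] = field(default_factory=list)
    named: dict[str, list[Triple]] = field(default_factory=dict)

    def fetch(self, url: str) -> None:
        """FETCH <url>: add the graph at url as a named graph called url.
        A URL already in the dataset is not fetched again; an unknown URL
        gives an empty graph (both are assumptions of this sketch)."""
        if url not in self.named:
            self.named[url] = list(WEB.get(url, []))

    def graph(self, name: str) -> list[Triple]:
        """GRAPH <name> { ... } sees only the triples from that one source."""
        return self.named.get(name, [])


def fetch_term(ds: Dataset, term: str, binding: dict[str, str]) -> str:
    """FETCH takes a URL, or a variable already bound to a URL."""
    url = binding[term] if term.startswith("?") else term
    ds.fetch(url)
    return url


ds = Dataset()
url = fetch_term(ds, "?personUUID", {"?personUUID": "ex:perf1"})  # FETCH ?personUUID
print(url, ds.graph(url))
```

Keying the named graphs by source URL is what keeps each source's triples separate. A GRAPH pattern over one name cannot accidentally match triples that came from somewhere else.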
This also shows a critical limitation. Placing the data in a named graph is a basic requirement for deciding what to believe, but there really ought to be a lot more metadata about the graph. That includes when it was read, possibly why it was read (how the query got there), and so on. But we are not an agent system, so we will note this and move on.
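We don't pursue it here, but one hypothetical shape for that missing metadata is sketched below in Python. Each fetched graph gets a record holding its URL (the graph name), when it was read, and the chain of patterns that led the query to it. The GraphRecord, record_fetch and is_stale names are made up for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class GraphRecord:
    """Hypothetical metadata kept for one fetched named graph."""
    url: str                 # the graph name: the URL it was fetched from
    fetched_at: datetime     # when it was read
    reason: tuple[str, ...]  # how the query got here, as patterns followed


def record_fetch(log, url, reason, now=None):
    """Store (or overwrite) the record for url in log and return it."""
    rec = GraphRecord(url, now or datetime.now(timezone.utc), tuple(reason))
    log[url] = rec
    return rec


def is_stale(rec, max_age, now=None):
    """True if the copy is older than max_age and might be worth re-reading."""
    return (now or datetime.now(timezone.utc)) - rec.fetched_at > max_age


log = {}
t0 = datetime(2000, 1, 1, tzinfo=timezone.utc)  # arbitrary example time
rec = record_fetch(log, "ex:perf1",
                   ["fb:en.blade_runner fb:film.film.starring ?personUUID"], now=t0)
print(is_stale(rec, timedelta(days=1), now=t0 + timedelta(days=2)))  # True
```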
By poking around with GRAPH ?personUUID { ?s ?p ?o } (60 triples), the property film.performance.actor looks hopeful.
PREFIX fb: <http://rdf.freebase.com/ns/>
SELECT ?actor
FROM fb:en.blade_runner
{
  fb:en.blade_runner fb:film.film.starring ?personUUID
  FETCH ?personUUID
  GRAPH ?personUUID { ?personUUID fb:film.performance.actor ?actor }
}
12 results.
--------------------------------------------
| actor                                    |
============================================
| fb:en.james_hong                         |
| fb:en.brion_james                        |
| fb:en.edward_james_olmos                 |
| fb:en.joanna_cassidy                     |
| fb:en.william_sanderson                  |
| fb:en.rutger_hauer                       |
| fb:authority.netflix.role.20000077       |
| fb:guid.9202a8c04000641f80000000054cbccc |
| fb:en.sean_young                         |
| fb:en.joe_turkel                         |
| fb:en.harrison_ford                      |
| fb:en.daryl_hannah                       |
--------------------------------------------
and more URLs to follow.
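Here's roughly what an engine has to do with that query, as a minimal Python sketch. It matches the starring triples in the FROM graph, FETCHes each ?personUUID once, and then matches the GRAPH pattern against only that fetched graph. The film graph and the WEB dictionary are made-up stand-ins, and the ex: URLs are hypothetical. The left-to-right, nested-loop evaluation is an assumption about strategy, not a description of ARQ's internals.

```python
Triple = tuple[str, str, str]

# Made-up data: the graph loaded by FROM, and what FETCH would retrieve for
# each performance URL. The ex: URLs are hypothetical.
FILM_GRAPH: list[Triple] = [
    ("fb:en.blade_runner", "fb:film.film.starring", "ex:perf1"),
    ("fb:en.blade_runner", "fb:film.film.starring", "ex:perf2"),
]
WEB: dict[str, list[Triple]] = {
    "ex:perf1": [("ex:perf1", "fb:film.performance.actor", "ex:actorA")],
    "ex:perf2": [("ex:perf2", "fb:film.performance.actor", "ex:actorB")],
}


def match(graph, pattern, binding):
    """Yield each extension of binding that makes the triple pattern match a
    triple in graph. Terms starting with '?' are variables."""
    for triple in graph:
        new = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if new.setdefault(term, value) != value:
                    break  # variable already bound to something else
            elif term != value:
                break  # constant term does not match
        else:
            yield new


def query3():
    named: dict[str, list[Triple]] = {}  # named graphs added by FETCH
    actors = []
    start = ("fb:en.blade_runner", "fb:film.film.starring", "?personUUID")
    for b in match(FILM_GRAPH, start, {}):
        url = b["?personUUID"]
        if url not in named:  # FETCH ?personUUID
            named[url] = WEB.get(url, [])
        inner = ("?personUUID", "fb:film.performance.actor", "?actor")
        for b2 in match(named[url], inner, b):  # GRAPH ?personUUID { ... }
            actors.append(b2["?actor"])
    return actors, named


actors, named = query3()
print(actors)
```

This is the glue code that FETCH is meant to make unnecessary. The loop that grows the dataset lives inside the query instead of in the host language.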
Looking in the next graph, there is fb:type.object.name, so let's guess and use that. But each time we chose a property, we didn't have to guess: we could have followed the property URL itself:
PREFIX fb: <http://rdf.freebase.com/ns/>
SELECT *
FROM fb:type.object.name
{ ?s ?p ?o }
but it's easier to read the description in HTML (and Freebase follows links internally to build that page).
Query 4 : The names of actors in Blade Runner
So a query to find the names of actors in "Blade Runner" is:
PREFIX fb: <http://rdf.freebase.com/ns/>
SELECT ?actor ?name
FROM fb:en.blade_runner
{
  fb:en.blade_runner fb:film.film.starring ?personUUID
  FETCH ?personUUID
  GRAPH ?personUUID { ?personUUID fb:film.performance.actor ?actor }
  FETCH ?actor
  GRAPH ?actor { ?actor fb:type.object.name ?name }
}
ORDER BY ?actor
which gives:
-------------------------------------------------------------------
| actor                                    | name                 |
===================================================================
| fb:authority.netflix.role.20000077       | "M. Emmet Walsh"     |
| fb:authority.netflix.role.20000077       | "M・エメット・ウォルシュ"       |
| fb:en.brion_james                        | "Brion James"        |
| fb:en.daryl_hannah                       | "Daryl Hannah"       |
| fb:en.daryl_hannah                       | "Ханна, Дэрил"       |
| fb:en.daryl_hannah                       | "דריל האנה"          |
| fb:en.daryl_hannah                       | "ダリル・ハンナ"            |
| fb:en.edward_james_olmos                 | "Edward James Olmos" |
| fb:en.harrison_ford                      | "Harrison Ford"      |
| fb:en.harrison_ford                      | "Форд Гаррісон"      |
| fb:en.harrison_ford                      | "Форд, Харрисон"     |
| fb:en.harrison_ford                      | "Харисон Форд"       |
| fb:en.harrison_ford                      | "Харисън Форд"       |
| fb:en.harrison_ford                      | "האריסון פורד"       |
| fb:en.harrison_ford                      | "ハリソン・フォード"          |
| fb:en.harrison_ford                      | "哈里森·福特"             |
| fb:en.harrison_ford                      | "해리슨 포드"             |
| fb:en.james_hong                         | "James Hong"         |
| fb:en.joanna_cassidy                     | "Joanna Cassidy"     |
| fb:en.joe_turkel                         | "Joe Turkel"         |
| fb:en.rutger_hauer                       | "Rutger Hauer"       |
| fb:en.rutger_hauer                       | "Хауэр, Рутгер"      |
| fb:en.rutger_hauer                       | "ルトガー・ハウアー"          |
| fb:en.rutger_hauer                       | "魯格·豪爾"              |
| fb:en.sean_young                         | "Sean Young"         |
| fb:en.sean_young                         | "Шон Йънг"           |
| fb:en.sean_young                         | "Янг, Шон"           |
| fb:en.sean_young                         | "ショーン・ヤング"           |
| fb:en.william_sanderson                  | "William Sanderson"  |
| fb:guid.9202a8c04000641f80000000054cbccc | "Morgan Paull"       |
-------------------------------------------------------------------
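The query is a chain of hops, and each hop FETCHes the URL it has just reached before reading from it. Here's a minimal Python sketch of that pattern as a generic path walk. The ex: URLs and the names are made-up data, and a dictionary stands in for the web. Unlike the real query, the starting film graph is fetched here rather than loaded by FROM. Sorting on (actor, name) gives ORDER BY ?actor plus a deterministic tie-break, which the query itself does not ask for.

```python
Triple = tuple[str, str, str]

# Made-up stand-in for the web: URL -> triples returned when it is fetched.
WEB: dict[str, list[Triple]] = {
    "ex:film": [("ex:film", "fb:film.film.starring", "ex:perf2"),
                ("ex:film", "fb:film.film.starring", "ex:perf1")],
    "ex:perf1": [("ex:perf1", "fb:film.performance.actor", "ex:actorA")],
    "ex:perf2": [("ex:perf2", "fb:film.performance.actor", "ex:actorB")],
    "ex:actorA": [("ex:actorA", "fb:type.object.name", "Actor A")],
    "ex:actorB": [("ex:actorB", "fb:type.object.name", "Actor B"),
                  ("ex:actorB", "fb:type.object.name", "Akteur B")],
}


def follow(start: str, path: list[str], named: dict[str, list[Triple]]):
    """Walk a chain of properties. Before each hop, FETCH the current node into
    `named` (once per URL), then match `node prop ?next` inside GRAPH node.
    Returns one tuple of nodes per complete path, like SPARQL solutions."""
    rows = [(start,)]
    for prop in path:
        next_rows = []
        for row in rows:
            node = row[-1]
            if node not in named:  # FETCH node
                named[node] = WEB.get(node, [])
            for s, p, o in named[node]:  # GRAPH node { node prop ?next }
                if s == node and p == prop:
                    next_rows.append(row + (o,))
        rows = next_rows
    return rows


named: dict[str, list[Triple]] = {}
rows = follow("ex:film",
              ["fb:film.film.starring", "fb:film.performance.actor",
               "fb:type.object.name"],
              named)
# SELECT ?actor ?name ... ORDER BY ?actor (name used as a tie-break)
for actor, name in sorted((r[2], r[3]) for r in rows):
    print(actor, name)
```

Because ex:actorB has two names, it appears in two rows. Several actors in the table above appear in the same way, once per name.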
We are left with a question: why use (extended) SPARQL? If you're doing it once, then a web browser is easier. After all, I used one to choose the properties to follow.
But a query can be sent to someone else so they can reuse your knowledge. You can rerun it to look for changes. You can also generalise it and let the computer do a brute-force search to find things that would take you, the human, a long time.