09 December 2010

Performance benchmarks for the TDB loader (version 2)

CAVEAT

There are "Lies, damned lies, and statistics" but worse are probably performance measurements done by someone else. The real test is what does it mean for any given application and is performance "fit for purpose". Database-related performance measurements are particular murky. The shape of the data matters, the usage made of the data matters, all in ways that can wildly affect whether a system is for for purpose.

Treat these figures with care - they are given to compare the current TDB bulk loader (up to version 0.8.7) and the new one (version 0.8.8 and later). Even then, the new bulk loader is still new, so it is subject to tweaking and tuning - hopefully only to improve performance, not to worsen it.

See also

http://esw.w3.org/RdfStoreBenchmarking.

Summary

The new bulk loader is faster by x2 or more depending on the characteristics of the data. As loads can take hours, this saving is very useful. It produces smaller databases and the databases are as good as or better in terms of performance than the ones produced by the current bulk loader.

Setup

The tests were run on a small local server, not tuned or provisioned for database work, just a machine that happened to be easily accessible.

  • 8GB RAM
  • 4 core Intel i5 760 @ 2.8GHz
  • Ubuntu 10.10 - ext4 filing system
  • Disk: WD 2 TB - SATA-300, 7200 rpm, 64MB buffer
  • Java version Sun/Oracle JDK 1.6.0_22 64-Bit Server VM

BSBM

The BSBM published results are from Nov 2009.

The figures here are produced using a modified version of the BSBM tool set used for version 2 of BSBM. The modifications make the tests run against a local database, rather than over HTTP. The code is available from github. See also this article.

BSBM - Loader performance

BSBM dataset   Triples        Loader 1   Rate         Loader 2   Rate
50k            50,057              3s    18,011 TPS        7s     7,151 TPS
250k           250,030             8s    31,702 TPS       11s    22,730 TPS
1m             1,000,313          26s    38,956 TPS       27s    37,049 TPS
5m             5,000,339         121s    41,298 TPS      112s    44,646 TPS
25m            25,000,250        666s    37,561 TPS      586s    42,663 TPS
100m           100,000,112     8,584s    11,650 TPS    3,141s    31,837 TPS
200m           200,031,413    30,348s     6,591 TPS    8,309s    24,074 TPS
350m           350,550,000    83,232s     4,212 TPS   21,146s    16,578 TPS

BSBM - Database sizes

Dataset   Size/loader1   Size/loader2
50k       10MB           7.2MB
250k      49MB           35MB
1m        198MB          137MB
5m        996MB          680MB
25m       4.9GB          3.3GB
100m      20GB           13GB
200m      39GB           26GB
350m      67GB           45GB

BSBM - Query Performance

Numbers are "query mix per hour"; larger numbers are better. The BSBM performance engine was run with 100 warmups and 100 timing runs over local databases.

Loader used   50k        250k      1m        5m        25m      100m    200m    350m
Loader 1      102389.1   87527.4   58441.6   5854.7    1798.4   673.0   410.7   250.0
Loader 2      106920.1   86726.1   62240.7   11384.5   3477.9   797.1   425.8   259.2

What this does show is that for a narrow range of database sizes, around 5m to 25m, the databases produced by loader2 are faster. This happens because the working set of the databases produced by loader1 did not fit mostly in memory, while the working set of those produced by loader2 does.

COINS

COINS is the Combined Online Information System from the UK Treasury. It's a real-world database that has been converted to RDF by my colleague, Ian - see Description of the conversion to RDF done by Ian for data.gov.uk.

General information about COINS.

COINS is all named graphs.

COINS - Loader Performance

COINS dataset   Quads         Loader 1   Rate         Loader 2   Rate
COINS           417,792,897   26,425s    15,811 TPS   17,057s    24,494 TPS

COINS - Database sizes

Size/loader1 Size/loader2
152GB 77GB

LUBM

LUBM information

LUBM isn't a very representative benchmark for RDF and linked data applications - it is designed more for testing inference. But there are some published details of various systems using this benchmark. To check the new loader on this data, I ran loads for a couple of the larger generated datasets: the 1000 and 5000 datasets, with inference applied during data creation. The 5000 dataset, just under a billion triples, was only run through the new loader.

LUBM - Loader Performance

LUBM dataset   Triples       Loader 1   Rate         Loader 2   Rate
1000-inf       190,792,744   7,106s     26,849 TPS   3,965s     48,119 TPS
5000-inf       953,287,749   N/A        N/A          86,644s    11,002 TPS

LUBM - Database sizes

Dataset    Size/loader1   Size/loader2
1000-inf   25GB           16GB
5000-inf   N/A            80GB

TDB bulk loader - version 2

This article could be subtitled "Good I/O and Bad I/O". By arranging to use good I/O, the new TDB loader achieves faster loading rates despite writing more data to disk. "Good I/O" means file operations that occur in a buffered, streaming fashion. "Bad I/O" means file operations that cause the disk heads to jump about randomly or that work in small units of disk transfer.

The new TDB loader, "loader2", is a standalone program that bulk loads data into a new TDB database. It does not support incremental loading, and it may destroy existing data. It has only been tested on Linux; it should run on Windows with Cygwin, but what the performance would be there is hard to say.

Figures demonstrating the loader in action on various large datasets are in a separate blog entry. It is faster than the current loader for datasets over about 1 million triples and comes into its own above 100 million triples.

Like the current bulk loader ("loader1"), loader2 can load triple and quad RDF formats, including from gzipped files. It runs fastest from N-Triples or N-Quads because the parsers for those formats are the fastest and have the lowest overhead.

The loader is a shell script that coordinates the various phases. It's available in the TDB development code repository in bin/tdbloader2 and the current 0.8.8 snapshot build.

Loader2 is based on the observation that the speed of loader1 can drop sharply as the memory-mapped files fill up RAM (the "can" is because it does not always happen; slightly weird). This fall-off is more than one would expect simply from having to use some disk, and sometimes the rate of loader1 becomes erratic. This could be due to the OS and its management of memory-mapped files, but the effect is that secondary index creation can become rather slow. Loader1 tends to do "bad I/O" - as the caches fill up, blocks are written back in what, to the disk, looks like a random order, causing the disk heads to jump around.

Copying from the primary index to a secondary index involves a sort, because TDB uses B+Trees for its triple and quad indexing. A B+Tree keeps its records in sorted order, and each index uses a different order.

Loader1's one-index-at-a-time approach is much faster than simply loading all the indexes at once because, loading them all at once, the RAM is spread across caching parts of every index. It is better to do one index at a time, using the RAM to cache a single index.

Loader2 similarly has a data loading phase and an index creation phase.

The first phase is to build the node table and write out the data for index building. Loader2 takes the stream of triples and quads from the parser and writes the RDF terms (IRI, literal, blank node) into the internal node table. It also writes out text files of tuples of NodeIds (the internal 64 bit numbers used to identify each RDF term). This is "good I/O" - the writes of the tuple files are buffered and the files are written append-only. This phase is a Java program, which exits after the node table and working files have been written.
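
A minimal, self-contained sketch of the idea - not the actual loader code. The tab-separated input, the file name spo.dat and the in-memory map standing in for TDB's persistent node table are all assumptions for illustration:

import java.io.*;
import java.util.*;

// Sketch of the first phase: assign a NodeId to each RDF term and append the
// triples, as fixed-width hex NodeIds, to a working text file.  A HashMap
// stands in for the persistent node table the real loader builds.
public class Phase1Sketch {
    static final Map<String, Long> nodeTable = new HashMap<String, Long>();
    static long nextId = 0;

    static long nodeId(String term) {              // look up or allocate a NodeId
        Long id = nodeTable.get(term);
        if (id == null) { id = nextId++; nodeTable.put(term, id); }
        return id;
    }

    public static void main(String[] args) throws IOException {
        // Input: one triple per line, terms separated by tabs (a stand-in for the parser stream).
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        BufferedWriter out = new BufferedWriter(new FileWriter("spo.dat"));
        String line;
        while ((line = in.readLine()) != null) {
            String[] t = line.split("\t");
            // Buffered, append-only writes: the "good I/O" of the first phase.
            out.write(String.format("%016X %016X %016X%n",
                                    nodeId(t[0]), nodeId(t[1]), nodeId(t[2])));
        }
        out.close();
        in.close();
    }
}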

The next phase is to produce the indexes, including the primary index. Unlike loader1, loader2 does not write the primary index during node loading. Experimentation showed it was quicker to do it separately despite needing more I/O. This is slightly strange.

To build the indexes, loader2 uses the B+Tree rebuilder, which requires the data in index-sorted order. Index rebuilding is a sort followed by B+Tree building. The sort is done by Unix sort, which is very easy to use and scales smoothly from a few lines to gigabytes of data. Having written the tuple data out as text files in the first phase (and as fixed-width hex numbers at that - quite wasteful), Unix sort can do a text sort on the files. Despite that meaning lots of I/O, it is good I/O, and the sort program really knows how best to manage its temporary files.

For each index, a Unix sort is done to get a temporary file of tuple data in the right sort order. The B+Tree rebuilder is called with this file as the stream of sorted data it needs to create an index.
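
A rough sketch of that step, assuming a simple one-index-per-call interface; the sort key options, file names and the IndexBuilder interface are illustrative rather than the actual tdbloader2 internals:

import java.io.*;
import java.util.Arrays;
import java.util.List;

// For one index ordering, run Unix sort over the tuple text file and stream
// the sorted lines into a builder (standing in for the B+Tree rebuilder).
public class Phase2Sketch {
    interface IndexBuilder { void add(String sortedTupleLine); void finish(); }

    // order gives the sort key fields, e.g. {1,2,3} for SPO, {2,3,1} for POS.
    static void buildIndex(File tuples, int[] order, IndexBuilder builder)
            throws IOException, InterruptedException {
        List<String> cmd = Arrays.asList("sort",
                "-k" + order[0] + "," + order[0],
                "-k" + order[1] + "," + order[1],
                "-k" + order[2] + "," + order[2],
                tuples.getPath());
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.environment().put("LC_ALL", "C");       // plain byte-order text sort
        Process p = pb.start();
        BufferedReader sorted =
            new BufferedReader(new InputStreamReader(p.getInputStream()));
        String line;
        while ((line = sorted.readLine()) != null)
            builder.add(line);                     // sorted stream -> index creation
        builder.finish();
        p.waitFor();
    }
}

(Streaming the sort output straight into the builder, as here, is the piped variation mentioned below; the current loader goes via a temporary file.)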

There are still opportunities to tune the new loader - for example, to see whether piping the output of the sorts directly into the rebuilder is better or worse than the current two-step approach using a temporary file. Using different disks for the different temporary files should also help.

The index building phase is parallelisable. Because I/O and memory usage are the bottlenecks, not CPU cycles, the crossover point for this to become effective might be quite high.

To find out whether loader2 is better than loader1, I've run a number of tests. Load and query tests with the Berlin SPARQL Benchmark (2009 version), a load test on the RDF version of COINS (UK Treasury Combined Online Information System - about 420 million quads and it's real data) and a load test using the Lehigh University Benchmark with some inferencing. Details, figures and tables in the next article.

03 December 2010

Repacking B+Trees

TDB uses B+Trees for its triple and quad indexing.

The indexes hold 3 or 4 NodeIds, where a NodeId is a fixed-length, 64 bit, unique number for each RDF term in the database. Numbers, dates and times are encoded directly into the 64 bits where possible; otherwise the NodeId refers to the location of the term in a separate NodeId-to-RDF-term table, as for all other types, including IRIs.
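
Purely as an illustration of the inline-versus-reference idea - the tag bits and layout here are made up and are not TDB's actual encoding:

// A toy 64-bit NodeId: the top two bits say whether the rest is an inline
// value or an offset into the node table.  Not TDB's real bit layout.
final class ToyNodeId {
    static final long PTR  = 0L << 62;       // remaining bits: offset into the node table
    static final long INT  = 1L << 62;       // remaining bits: a small integer, stored inline
    static final long MASK = 3L << 62;

    static long inlineInteger(long v)    { return INT | (v & ~MASK); }
    static long tablePointer(long off)   { return PTR | (off & ~MASK); }

    static boolean isInline(long id)     { return (id & MASK) == INT; }
    static long    inlineValue(long id)  { return id & ~MASK; }   // no sign handling here
}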

The B+Trees have a number of leaf blocks, each of which holds only records (key-value pairs, except that there is no "value" part in a triple index - just the key of S, P and O in various orders). TDB threads these blocks together so that a scan does not need to touch the rest of the tree - scans happen when you look up, say, S?? for a known subject and unknown property and object. The scan returns all the triples with a particular S. Counting all the triples only touches the leaves of the B+Tree, not the branches.

B+Trees provide performant indexing over a wide range of memory situations, ranging from very little caching of disk structures in memory, through to being able to cache substantial portions of the tree.

The TDB B+Trees have a number of block storage layers: an in-JVM block cache for use on 32 bit JVMs, memory-mapped files for 64 bit JVMs, and an in-memory RAM disk for testing. The in-memory RAM disk is not efficient, but it is a very good simulation of a disk - it really does copy the blocks a client writes to another area, so there is no possibility of the client updating a block through references it still holds after the block has been written to "disk".

However, one disadvantage can be that the index isn't very well packed. The B+Trees guarantee that each block is at least 50% full. In practice, the blocks are 60-70% full for the POS and OSP indexes. But a worse case can arise when inserting into the SPO index, because data typically arrives with all the triples for one subject, then all the triples for the next subject, meaning the data is nearly sorted. While this makes the processing faster, it leaves the resulting B+Tree only about 50-60% packed.

Packing density matters because it influences how much of the tree is cached in a fixed amount of computer memory. If it's 50% packed, then it's only 50% efficient in the cache.

There are various ways to improve on this (compress blocks, B#Trees, and many more besides - B-tree variations are very extensively studied data-structures).

I have been working on a B+Tree repacking programme that takes an existing B+Tree and produces a maximally packed B+Tree. The database is then smaller on disk and the in-memory caches are used more efficiently. The trees produced are legal B+Trees, with a packing density close to 100%. Rebuilding indexes is fast and scales linearly.

The Algorithm

Normally, B+Trees grow at the root. A B+Tree is the same depth everywhere in the tree, and the tree only gets deeper when the root node is split and a new root is created pointing down to the two blocks formed by splitting the old root. This algorithm, building a tree from a stream of records, instead grows the tree from the leaves towards the root. While the algorithm is running there isn't a legal tree - only when the algorithm finishes does a legal B+Tree emerge.

All the data of a B+Tree resides in the leaves - the branches above tell you which leaf block to look in (this is the key difference between B-Trees and B+Trees). The first stage of repacking takes a stream of records (key and value) from the initial tree. This stream is in sorted order because it is being read out of a B+Tree, and for a TDB B+Tree it is a scan tracing the threading of the leaf blocks. In other words, it is not memory intensive.

In the first stage, new leaf blocks are produced one at a time. A block is filled completely, a new block is allocated, the threading pointers are completed and the full block is written out. In addition, the block number and the highest key in the block are emitted. The leaf block is not touched again.

The exception is the last two blocks of the leaf layer. A B+Tree must have blocks at least 50% full to be a legal tree. Although the TDB B+Tree code can cope with blocks that fall below the B+Tree guarantee, it is neater to rebalance the last two blocks when the last block is below the minimum size. Because the second-to-last block is completely full, it is always possible to rebalance using just these two blocks.

Phase two takes as input the stream of block numbers and highest keys from the level below and builds branch nodes for the B+Tree pointing, by block number, to the blocks produced in the phase before. When a block is finished, it can be written out and its block number and split key emitted. This split key isn't the highest key in the block - it is the highest key of the entire sub-tree rooted at that point, and it is the key passed up. A B+Tree branch node has N block pointers and N-1 keys; the split key is the last key left over from filling the block, the Nth key from below.

Once again, the last two blocks are rebalanced to maintain the B+Tree invariant of all blocks being at least half full. For large trees there are quite a few blocks, so rebalancing just two of them is insignificant. For small trees it is not really worth repacking the tree at all - block caching at runtime hides any advantage there might be.

The second phase is repeatedly applied to the block number and split key stream from the layer below until a layer of the tree is only one block (it can't be zero blocks). This single block is the new root block. The third phase is to write out the B+Tree details to disk and put the root block somewhere it can be found when the B+Tree is reopened.
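
A compact, self-contained sketch of the two phases, assuming toy blocks that are simply printed rather than written to disk; the block capacity is made up, the toy branch blocks hold keys only (no child pointers), and the rebalancing of the final two blocks is omitted:

import java.util.*;

// Bottom-up B+Tree packing from a sorted stream of keys: pack the leaf layer,
// then repeatedly pack a branch layer from the (block, highest key) pairs of
// the layer below until only one block - the root - remains.
public class RepackSketch {
    static final int CAPACITY = 4;        // records per block, illustrative only
    static int nextBlock = 0;

    // One (blockNumber, highestKey) pair is emitted per completed block.
    record Emitted(int block, long highKey) {}

    static List<Emitted> buildLayer(List<Long> keys, String kind) {
        List<Emitted> emitted = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long k : keys) {
            current.add(k);
            if (current.size() == CAPACITY) {          // block full: "write" it out
                int b = nextBlock++;
                System.out.println(kind + " block " + b + " " + current);
                emitted.add(new Emitted(b, current.get(current.size() - 1)));
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {                      // trailing, possibly under-full block
            int b = nextBlock++;
            System.out.println(kind + " block " + b + " " + current);
            emitted.add(new Emitted(b, current.get(current.size() - 1)));
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Long> sorted = new ArrayList<>();
        for (long i = 1; i <= 23; i++) sorted.add(i);  // stand-in for the sorted record stream

        List<Emitted> level = buildLayer(sorted, "leaf");
        while (level.size() > 1) {                     // repeat phase two until one block remains
            List<Long> splitKeys = new ArrayList<>();
            for (Emitted e : level) splitKeys.add(e.highKey());
            level = buildLayer(splitKeys, "branch");
        }
        System.out.println("root block = " + level.get(0).block());
    }
}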

Consequences

The repacking algorithm produces B+Trees that are approaching half the size of the original trees. For a large dataset, that's several gigabytes.

The repacked trees perform much the same as trees formed by normal use, except in one case where they are faster. If the tree is small and the majority of it fits in the RAM caches, then repacking means less RAM is used but the speed is much the same (in fact a few percent slower - hard to measure, but less than 5% - presumably because there is a different ratio of tree descent to in-block binary search being done by the CPU; this may be no more than a RAM cache hierarchy effect).

However, if the tree was large and the repacked version now fits mostly in memory, the repacked trees are faster. As the indexes for an RDF dataset grow much larger than the cacheable space, this effect slowly declines. Some figures to show this are in preparation.

The biggest benefit, however, is not directly the speed of access or the reduced disk space. It is that there is now a fast, linearly scaling way to build a B+Tree from a stream of sorted records. It is much faster than simply using regular insertion into the B+Tree.

This is part of the new bulk loader for TDB. It uses external sorting to produce the input to index creation using this B+Tree repacking algorithm. The new bulk loader can save hours on large data loads.

28 August 2010

Migrating from the SPARQL Update submission language to the emerging SPARQL 1.1 Update standard

SPARQL 1.1 Update is work-in-progress by the SPARQL Working Group, but the general design and language are reasonably stable. There is also the W3C submission SPARQL Update from July 2008. The languages are similar in style but the details of the grammars differ. So how to migrate from the syntax used in the submission to the upcoming SPARQL recommendation for an update language?

One way is to provide both languages behind a common API, with the application indicating which language to use. This maximises compatibility because, if the submission is the chosen language, the parser for the submission language will be used. But the application has to be changed to move between the languages, and conversion of update scripts has to be done for each script, so it is probably a "big bang" changeover. The two languages are very close - is it possible to have a single language that covers both? Then the application can mix usages, and when an update request is printed it can be printed in the soon-to-be standard language, helping people see how the language has changed.

It turns out that most, but not all, of the submission language can be incorporated into the grammar for the emerging standard. The cases not covered don't seem to be ones likely to be widely used, although it would be good to know if they are.

  • CREATE, CLEAR, LOAD and DROP are covered.
  • INSERT DATA and DELETE DATA are covered when working on the default graph or on one named graph, but not on more than one graph at once.
  • An extra grammar rule for MODIFY is supported, again working on the default graph or one named graph, but with only a single, optional GRAPH <uri>.
  • The old style INSERT { :s :p :o } and DELETE { :s :p :o } - that is, inserting or deleting data using just the INSERT or DELETE keyword, without DATA - lead to ambiguity in the combined grammar. These forms are not supported in the combined language. In fact, these forms pre-date the DATA forms in the submission language.

The ability to work on only one named graph needs a little explanation. In the combined grammar, the INTO or FROM is used to set the WITH part of an update. There can be at most one WITH. In the submission,

INSERT INTO <g1> <g2> <g3> { ... } WHERE { ... }

is legal. In terms of language, this could be incorporated into the extended language, but it introduces a capability not present in the upcoming working group language, and it can't be written out again without repeating the operation once for each named graph. Operating on a single named graph, or the default graph, is covered by the standard.
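
For illustration (graph name and patterns are placeholders, in the style of the example above), the single-named-graph form and one way it reads in the standard language:

# Submission syntax, one named graph:
INSERT INTO <g1> { ... } WHERE { ... }

# SPARQL 1.1 Update:
WITH <g1>
INSERT { ... } WHERE { ... }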

For old-style INSERT or DELETE of data, conversion can be done by adding the word DATA to the operation or by adding WHERE {} to the update operation. Both these conversions yield something that is legal, and means the same, under the submission language, so the conversion can be done while retaining the use of old software.
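
For example, using the shorthand of the list above, either rewrite of the old-style form is accepted by both languages:

INSERT { :s :p :o }                  # old style, submission only

INSERT DATA { :s :p :o }             # conversion 1: add DATA
INSERT { :s :p :o } WHERE {}         # conversion 2: add an empty WHERE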

In summary: The accepted forms of the submission language are:

  INSERT [INTO <uri>] {...} WHERE {...}
  DELETE [FROM <uri>] {...} WHERE {...}
  INSERT DATA [INTO <uri>] {...}
  DELETE DATA [FROM <uri>] {...}

By using an extended grammar, the application can even mix the syntax of the SPARQL Update submission and SPARQL 1.1 Update in a single request or, indeed, a single operation. When printed, the output can be in the equivalent SPARQL 1.1 syntax.

ARQ (currently, the development snapshot) includes a command line parser for this extended SPARQL 1.1 Update syntax, "arq.uparse". arq.uparse reads the extended syntax and prints the equivalent strict SPARQL 1.1 Update form, so it can be used to translate from the submission language to the W3C standards-track language. More on the practical details: jena-dev/message/45040.

Key points from the extended grammar (the working group is not planning to include this extension in the published SPARQL 1.1 Update grammar):

UpdateUnit  :=  Prologue Update <EOF>

Update  :=  ( Update1 )+

# As for SPARQL 1.1 Update with addition of "ModifyOld"
Update1 :=  ( Load | Clear | Drop | Create |
              InsertData | DeleteData | DeleteWhere |
              Modify | ModifyOld )
            ( <SEMICOLON> )?

Load    :=  <LOAD> IRIref ( <INTO> ( <GRAPH> )? IRIref )?

Clear   :=  <CLEAR> ( <SILENT> )? GraphRefAll

Drop    :=  <DROP> ( <SILENT> )? GraphRefAll

Create  :=  <CREATE> ( <SILENT> )? GraphRef

InsertData  :=  <INSERT_DATA> OptionalIntoTarget QuadPattern

DeleteData  :=  <DELETE_DATA> OptionalFromTarget QuadData

DeleteWhere :=  <DELETE_WHERE> QuadPattern

Modify  :=  ( <WITH> IRIref )?
            ( DeleteClause ( InsertClause )? | InsertClause )
            ( UsingClause )*
            <WHERE> GroupGraphPattern

# The MODIFY form from the submission
ModifyOld   :=  <MODIFY> ( IRIref )?
                ( DeleteClause )?
                ( InsertClause )?
                <WHERE> GroupGraphPattern

DeleteClause    :=  <DELETE> OptionalFromTarget QuadPattern

InsertClause    :=  <INSERT> OptionalIntoTarget QuadPattern

# Optional INTO: wraps the QuadPattern with a GRAPH
OptionalIntoTarget  :=  ( ( <INTO> )? IRIref )?

# Optional FROM; wraps the QuadPattern with a GRAPH
OptionalFromTarget  :=  ( ( <FROM> )? IRIref )?

UsingClause :=  <USING> ( IRIref | <NAMED> IRIref )

30 July 2010

Moving to Epimorphics

I'm moving to Epimorphics, starting there early next month. Epimorphics is now located in Portishead (as of last Tuesday).

As before, I will still be able to work on Jena, ARQ and TDB, and I also get to continue participating in the W3C SPARQL working group, now as an Invited Expert. The working group is making good progress on its chosen list of features, and now it's just a "small" matter of doing the core work and getting the Last Call documents out to the community.

More exciting times.

17 July 2010

Ubuntu on a Samsung N210

I have Ubuntu 10.04 working on a Samsung N210, running Thunderbird and Firefox as well as all my Java development systems. It may not be a fast machine but it's very convenient. The process is now easy - easier than some older material on the web (for 9.10 and very early 10.04) might suggest.

When first turned on, the machine installed Windows 7 Starter. I let this finish, even though I didn't want it, so that I could install Ubuntu 10.04 alongside Windows in case it didn't work. Once I was happy it would work, I repartitioned the disk (with gparted) to create a single partition, deleting Windows and the restore partition, then reinstalled.

First, build a USB drive with the installer on it. To get the machine to boot from this I had to:

  • As the machine boots, keep F2 pressed to go into the BIOS.
  • Make sure the machine will boot from a USB pendrive.
  • Reboot with USB and install Ubuntu Netbook Remix

You have to press F2 very early to get into the BIOS configuration screens. The boot through the BIOS is very fast, so don't wait for the machine to put the Samsung splash screen up.

You can reset the BIOS to not boot from USB if you want to at this stage, or later.

At this point the wireless does not work. Don't panic; plug in an Ethernet cable and update the system.

sudo apt-get update
sudo apt-get upgrade
sudo reboot

and now the wireless works. There's quite a lot of advice on the web about this, but it now seems there is no need for any custom software - it looks like the main Ubuntu repositories have a working version of what's needed.

To get the function keys working I followed the advice in https://bugs.launchpad.net/ubuntu/+bug/574250.

The missing function keys are due to the fact that Samsung N150/N210/N220 are missing from the udev rules:

/lib/udev/rules.d/95-keymap.rules
/lib/udev/rules.d/95-keyboard-force-release.rules

adding "|*N150/N210/N220*" to the product part of the rules for Samsung in BOTH files, will enable the Fn-up and Fn-down keys. The new product section will look like:

ENV{DMI_VENDOR}=="[sS][aA][mM][sS][uU][nN][gG]*", ATTR{[dmi/id]product_name}=="*NC10*|*NC20*|*N130*|*SP55S*|*SQ45S70S*|*SX60P*|*SX22S*|*SX30S*|*R59P/R60P/R61P*|*SR70S/SR71S*|*Q210*|*Q310*|*X05*|*P560*|*R560*|*N150/N210/N220*"

Now you can map these keys to any program that sets the backlight.

Then install some Samsung tools - you need to add the repository to the package manager, which you can do graphically or as:

sudo add-apt-repository ppa:voria/ppa
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install samsung-tools samsung-backlight
sudo reboot

at which point the N210 works a treat.

Now - remove all the Windows stickers on the machine, front and back.

If you are looking for software to try, this blog is a good place to start.

02 June 2010

Standardising RDF Syntaxes

One area of interest at the RDF Next Steps Workshop is other RDF-related syntaxes, ones that are not RDF/XML. RDF/XML is the standard syntax; N-Triples is defined as part of the RDF test suite but not formally as a syntax on the same level as RDF/XML; there is RDFa for embedding in XHTML.

RDF/XML is not easy to read as RDF. Turtle appeals because it more clearly shows the triple structure of the data. N-Quads is a proposal to extend the RDF file format to named graphs, and TriG is a Turtle-inspired named graph syntax. There is also TriX, but I've never come across that in the wild.

Using XML had several advantages, such as comprehensive character set support, neutrality of format and reuse of parsers. However, it's complicated in its entirety, even after using an XML parser, and it is quite expensive to parse, making parsing large (and not so large) files a significant cost. Because it can't, practically, be processed by XSLT, there are nowadays few advantages.

All the non-XML formats, which are much easier to read and process, would be good to standardise, but they are not without the need for sorting out some details. Details matter when you're dealing with anything over a trivial amount of data, and when it's millions of triples, it is just a friction point to get the data cleaned up if there is disagreement between information publisher and information consumer.

Turtle

Turtle takes the approach of using UTF-8 as the character encoding, rather than relying on character set control like XML. Given that UTF-8 support is nowadays well understood and widely available, the internationalization issues of different scripts are best dealt with that way. Parsers are both simple to write and fast. (The tricks needed to get Java to parse fast would be a subject for a separate discussion.)

As Turtle is the most mature of the possible syntaxes, it is also the best worked out. One issue I see is the migration from a one-standard-syntax world to a two-standard-syntax world, which is not without its practical problems. What if system A speaks only RDF/XML, and system B speaks only Turtle? How long will it take for uses of content negotiation to catch up? Going from V-nothing to V1 of a system (which is where we are now) is usually quicker than going from V1 to V2, as the need to upgrade is much less. If it ain't broke, why change?

Turtle can write graphs that RDF/XML can't encode. If a property IRI can't be split into a namespace and a local name, then RDF/XML can't represent it - an XML qname must have a local part that begins with a letter or underscore. This isn't common, but these details arise and cause problems (that is, costs) when exchanging data at scale.
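
For illustration (the IRIs are made up), a triple that Turtle and N-Triples can write but RDF/XML cannot, because the only possible local part of the predicate starts with a digit:

<http://example/s> <http://example/vocab#1234> "value" .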

What would be useful would be a set of language tokens with which to build all sorts of languages, like rule languages, but at the moment there are some unnecessary restrictions in Turtle on prefixed names (Turtle calls them qnames but they are not exactly XML qnames).

Turtle disallows:

employee:1234

because the local part starts with a digit. In data converted from existing (non-RDF) data this is a nuisance, and one that caused SPARQL to allow it, based on community feedback.

But there are other forms that can be useful that are not allowed (and aren't in SPARQL):

ex:xyz#abc
ex:xyz/abc
ex:xyz?parm=value

The last one might be a bit extreme, but the first two are just using the prefix to tidy up long IRIs. Partial alignment with XML qnames makes no sense in Turtle. Extending the range of characters to include /, # and maybe a few others makes prefixed names more useful. Issues just like this led to the CURIE syntax.

While these IRIs can be written in Turtle, it needs the long form, with <...>, and the only way to abbreviate is via the base IRI - but you can only have one base IRI. It's a workaround really, and it gets ugly when the advantage of Turtle is that it is readable. Extending the range of characters in the local part does not invalidate old data; it does create friction in interoperability, so we have one last chance to sort this out if Turtle is to be standardised.
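
For example (made-up namespace), what has to be written today for the first case above:

@prefix ex: <http://example/> .

# Desired, but not legal Turtle today:
#   ex:xyz#abc  ex:p  "v" .

# What has to be written instead - the full form, or abbreviation via the one base IRI:
<http://example/xyz#abc>  ex:p  "v" .

@base <http://example/> .
<xyz#abc>  ex:p  "v" .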

N-Quads

<s> <p> <o> .
<s> <p> <o> <g> .

What could be simpler? N-Quads is N-Triples with an optional 4th field giving the graph name (or context - it wasn't designed specifically for named graphs, but let's consider only IRIs in the 4th field, not the blank nodes or literals which the syntax also allows).

But TriG puts the graph name before the triples, while N-Quads puts it after. Maybe N-Quads should be like TriG, so that TriG can make N-Quads a subset. Parsing this modified N-Quads only takes buffering the tokens on the line and counting to 3 or 4 to determine whether it's a triple or a quad. Making TriG more flexible, at the cost of the slightly less intuitive graph-name-first order in what is basically a dump format, seems to me a good trade-off.
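
That is, on one reading of the suggestion:

# N-Quads as currently defined - graph name in the 4th position
<s> <p> <o> <g> .

# The modified form - graph name first, matching TriG's ordering
<g> <s> <p> <o> .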

Blank node labels need to be clarified - is the scope the graph or the document? Both are workable. I'd choose scope-to-the-document, if only to avoid the confusion of two identical labels referring to two different bnodes, and because it's occasionally useful to say that a bnode in one graph really is the same as one in another when using it as a transfer syntax (for example, when one graph is a subgraph of another). TriG has the same issue, but its use of nested forms for graphs makes graph scoping more reasonable (except that graphs can be split over different {} blocks). Doing the same in N-Quads and TriG is important, and my preference is document-scoped labels.
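
For illustration, the two readings of the same pair of quads:

_:x <p> <o1> <g1> .
_:x <p> <o2> <g2> .

With document-scoped labels the two quads share one blank node; with graph-scoped labels, _:x in <g1> and _:x in <g2> are two different blank nodes.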

TriG

TriG is a Turtle-like syntax for named graphs. It is useful for writing down RDF datasets.

It has some quirks, though. Turtle is not a subset of TriG because the default graph needs to be wrapped in {} while the prefixes need to be outside the {}. The default graph has to be given in a single block, but named graphs can be fragmented (that was just an oversight in the spec). It would be helpful to allow the unnamed graph to be specified as plain Turtle, and similarly for an N-Quads file to be legal TriG.
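
A small example of the current shape (IRIs made up) - a Turtle file is not valid TriG as-is because the default-graph triples must gain the surrounding { }:

@prefix ex: <http://example/> .

# The default graph, which must be wrapped in { }:
{ ex:s ex:p ex:o . }

# A named graph:
<http://example/g> { ex:s ex:p ex:o2 . }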

TriG allows the N3-ish form:

<g> = { ... } .

I've seen some confusion about this form in the data.gov.uk data. The additional "=" and ".", which are optional, cause confusion, and at least one parser does not accept them because they were not expected.

In N3, = is a synonym for owl:sameAs, but the relationship here isn't likely to be owl:sameAs; read as N3, it's more likely to be log:semantics. Now, I like the uniformity of the N3 data model, with graph literals (formulae), because of the simplicity and completeness it brings, but it's not RDF - it's an extension, and it breaks all RDF-only systems.

If <g> is the IRI of a graph document, it would be more like the N3:

 <g> log:semantics { ... } .

or

<g> log:semantics ?v .
?v owl:sameAs { ... } .

Avoiding the variability of syntax, which brings no benefit, is better. Drop the optional adornment.

Summary

None of these issues are roadblocks; they are just details that need sorting out to move from the current de facto formats to specifications. When exchanging data between systems that are not built together, details matter.