[Mulgara-dev] Update Patterns
Andrae Muys
andrae at netymon.com
Wed Dec 13 17:14:25 CST 2006
On 14/12/2006, at 1:19 AM, Chris Wilper wrote:
> Hi Andrae,
>
> > What sort of insert/deletes are people doing?
> > Are deletes/inserts normally paired in a single transaction?
> > How many statements are you insert/deleting in a
> > single update? Can they be categorised?
> > If so what is the frequency of the different categories?
> Initially, adding many batches of around 50 to 100 triples at a
> time. Most (say 75%) of the triples represent literal properties.
> Of those, probably a third are datatyped. The majority
> (say 75%) of our datatyped literals are xsd:dateTimes.
> As for URIs used in triples, an off the cuff guess is that 75%
> of them are distinct.
Roughly how many is 'many': 1,000, 10,000, 1,000,000?
And over what sort of timeframe: seconds, minutes, hours?
> Update operations are smaller: we usually need to update only
> 5-20 triples at a time, and accomplish that via a series of deletes
> and adds as a single transaction.
That is the size I expected updates to be. Again, what sort of
volumes are you seeing, and how many transactions per second (TPS)?
> > What is the 'shape' of the data you insert?
> > (ie. many mostly independent sub-graphs describing different
> > instances; or fewer instances with lots of interconnections and
> > object-reference properties?)
>
> Mostly independent sub-graphs, with diameter 2, consisting
> of a total of 50-100 triples each. Note that there are definitely
> connections between the sub-graphs, they are just relatively
> few.
> > Is any significant % of your literals replicated?
> > What % of the data are Blank-nodes?
>
> 0% are BNodes, thankfully. We don't do triplestore-to-triplestore
> replication right now...but BNodes would appear to complicate the
> problem.
It does depend. The planned replication for Mulgara is done as a
dual-space transfer, so bnode-to-bnode replication is easy for us.
OTOH, I don't think it's possible without the special access we have
to the low-level bnode representations, and the ability to control
them to permit transfer between graphs. So I agree, user-level
replication of bnodes is going to be nasty.
> > What is the average length of a URI?
>
> Average? Probably 60-70 characters.
I'm working on developing a persistent external-memory trie
data structure for this sort of data, so I'm very interested in
knowing how much of that length is shared prefix, as tries are
really good at storing shared prefixes :)
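
As a rough illustration of the shared-prefix question, here is a
minimal Python sketch that estimates how much of a set of URIs a trie
would avoid storing. It assumes a plain in-memory character trie built
from nested dicts, not the persistent external-memory structure
mentioned above. The function names and the sample URIs are
hypothetical.

```python
def count_trie_nodes(strings):
    """Build a character trie from nested dicts; return its node count (root excluded)."""
    root = {}
    nodes = 0
    for s in strings:
        node = root
        for ch in s:
            if ch not in node:
                # A new node is only needed where this string diverges
                # from every prefix already stored.
                node[ch] = {}
                nodes += 1
            node = node[ch]
    return nodes


def shared_prefix_ratio(strings):
    """Fraction of all characters that the trie does not need to store again."""
    total = sum(len(s) for s in strings)
    if total == 0:
        return 0.0
    return 1.0 - count_trie_nodes(strings) / total


# Hypothetical sample URIs, for illustration only.
uris = [
    "http://example.org/objects/demo:1",
    "http://example.org/objects/demo:2",
    "http://example.org/objects/demo:3",
]
print(f"shared-prefix ratio: {shared_prefix_ratio(uris):.2f}")
```

Running this over a sample of real URIs would give a first answer to
how much of the 60-70 characters is shared prefix.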
> > What is the average length of a Literal?
>
> About 50 characters, I would guess.
Hmm. I am wondering whether it might be worth compressing the
individual literals in the store layer. But ultimately the right
answer there depends on the access patterns for the data: if we
don't have sufficient locality given the access patterns, then we
will tend to pay one I/O per lookup anyway. If we do get clustering
in the string pool, then compression may improve cache utilisation.
I suppose I'll just have to measure it ;)
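
One way such a measurement could start is sketched below in Python,
using the standard zlib module. It compares compressing each literal
on its own against compressing a block of literals together, which
stands in for clustered entries in a string pool. The sample literals
are made up and far more repetitive than real data is likely to be, so
the numbers only show the method, not what Mulgara would see.

```python
import zlib


def per_literal_size(literals):
    """Total bytes when every literal is compressed individually."""
    return sum(len(zlib.compress(s.encode("utf-8"))) for s in literals)


def block_size(literals):
    """Bytes when the literals are joined and compressed as one block."""
    joined = "\n".join(literals).encode("utf-8")
    return len(zlib.compress(joined))


# Hypothetical literals of roughly 50 characters each.
literals = [
    f"Annual report of the example society, volume {i:04d}"
    for i in range(200)
]

raw = sum(len(s.encode("utf-8")) for s in literals)
print("raw bytes:         ", raw)
print("per-literal bytes: ", per_literal_size(literals))
print("single-block bytes:", block_size(literals))
```

Short strings tend to gain little when compressed one at a time,
because each compressed stream carries its own fixed overhead, so the
per-literal and per-block figures can differ widely.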
Andrae
--
Andrae Muys
andrae at netymon.com
Principal Mulgara Consultant
Netymon Pty Ltd