[Mulgara-general] Mulgara size limits

Paul Gearon gearon at ieee.org
Tue Jul 1 10:18:34 PDT 2008


On Tue, Jul 1, 2008 at 9:30 AM, Seaborne, Andy <andy.seaborne at hp.com> wrote:
>
>> [snip]
>>
>> The out-of-memory exception is being thrown in the Jena ARP parser,
>> which we use for parsing RDF/XML.  Looking quickly at the Jena source, I
>> see that ARP keeps a bunch of stuff in memory, namely blank node
>> mappings, so it doesn't surprise me that it runs out of memory for large
>> files.
>
> If it's the check for illegal reuse of bNode ids, then maybe it's an old version of ARP? Nowadays it issues a warning about being unable to track illegal reuse of bNode ids across the whole file and stops that checking while continuing parsing.  If you're using an old version of ARP, and it in turn is using an old version of Xerces, that might also be a factor.

I'm pretty sure that no one has updated this in a long time, so it's
pretty much guaranteed to be an old version of ARP. Xerces got updated
about 2 years ago, but I can't recall if it's happened again since
then.

I didn't realize that the ARP code was keeping blank node mappings in
memory. Whoever implemented the content handler with ARP must not have
known this, as the content handler maintains its own mappings. These
mappings are file-based, so they shouldn't have memory restrictions,
though I can't comment on their speed, as I don't know how well they
cache in memory.
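
To make the caching question concrete, here is a minimal sketch of the
general idea: a bounded in-memory cache sitting in front of a slower
mapping from blank node labels to node ids. This is not the Mulgara
content handler or anything from ARP. The class and method names are
hypothetical, and a plain HashMap stands in for the file-based store
purely so the example runs on its own.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from Mulgara or Jena.
class BlankNodeMapSketch {
    // Stand-in for the file-based mapping (assumption: a real one would hit disk).
    private final Map<String, Long> backingStore = new HashMap<>();

    // Bounded cache kept in access order, so the least recently used label
    // is the one evicted when the cache grows past its limit.
    private final LinkedHashMap<String, Long> cache;

    private long nextId = 1;

    // Counts lookups that missed the cache and had to go to the backing store.
    long storeReads = 0;

    BlankNodeMapSketch(final int cacheSize) {
        this.cache = new LinkedHashMap<String, Long>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
                return size() > cacheSize;
            }
        };
    }

    // Returns the id for a blank node label, allocating a new id on first sight.
    long idFor(String label) {
        Long id = cache.get(label);
        if (id != null) {
            return id;               // served from memory
        }
        storeReads++;                // cache miss: consult the slow store
        id = backingStore.get(label);
        if (id == null) {
            id = nextId++;           // label not seen before
            backingStore.put(label, id);
        }
        cache.put(label, id);        // may evict the least recently used entry
        return id;
    }

    // Number of entries currently held in memory; never exceeds cacheSize.
    int cachedEntries() {
        return cache.size();
    }

    public static void main(String[] args) {
        BlankNodeMapSketch map = new BlankNodeMapSketch(2);
        for (String label : new String[] {"_:a", "_:b", "_:a", "_:c", "_:b"}) {
            System.out.println(label + " -> " + map.idFor(label));
        }
        System.out.println("store reads: " + map.storeReads);
    }
}
```

Memory use here is bounded by the cache size rather than by the number
of blank nodes in the file, at the cost of a store read whenever a label
has fallen out of the cache. How well that works in practice depends on
how often labels are reused far apart in the document.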

Alex's comment about RIO reminds me... wasn't this parser supposed to
replace the ARP-based one?

Paul

