[Mulgara-general] Mulgara size limits
Chuck Borromeo
cborromeo3 at yahoo.com
Tue Jul 1 07:42:48 PDT 2008
Thanks, everyone, for the input. I'll try to answer the follow-up responses:
1. (Ronald's question) The file that is causing the problem is around 98 MB. I did not count the number of triples in it (I can, if you think that is important). The directory contains RDF/XML files ranging from 10 MB to 110 MB, and the parser was able to load the files smaller than 97 MB.
2. (David's suggestion) I tried the brute-force approach (it was my first instinct) and ran it with -Xmx4096m. That didn't help.
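A heap increase only helps if the flag reaches the JVM that actually runs the parser, so it may be worth having that JVM report its own limit. The class below is a minimal sketch; `HeapCheck` is a hypothetical name, not part of Mulgara or Jena. It uses the standard `Runtime.maxMemory()` call and the HotSpot-specific `sun.arch.data.model` property, which may be missing on other JVMs. A 32-bit JVM cannot use a 4 GB heap, so the data model is worth checking too.

```java
// Hypothetical helper: reports the heap limit of the JVM it runs in.
// Run with, e.g.: java -Xmx4096m HeapCheck   (no space between -Xmx and the size)
public class HeapCheck {

    /** Maximum heap the JVM will try to use, in MiB (Long.MAX_VALUE / 2^20 if unlimited). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
        // HotSpot-specific property; falls back to "unknown" where it is not defined.
        System.out.println("JVM data model: "
                + System.getProperty("sun.arch.data.model", "unknown") + "-bit");
    }
}
```

The reported figure can be somewhat below the -Xmx value even when the flag is applied; a figure far below 4096 MiB suggests the setting never reached this JVM.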
I'll try Alex's suggestion of switching parsers. The parser is most likely the cause, since it is the component throwing the OutOfMemoryError. Thanks for the great instructions, Alex.
I'll let you know what happens.
Chuck
--- On Tue, 7/1/08, Seaborne, Andy <andy.seaborne at hp.com> wrote:
> From: Seaborne, Andy <andy.seaborne at hp.com>
> Subject: RE: [Mulgara-general] Mulgara size limits
> To: "Mulgara General" <mulgara-general at mulgara.org>, "cborromeo3 at yahoo.com" <cborromeo3 at yahoo.com>
> Date: Tuesday, July 1, 2008, 10:30 AM
> > -----Original Message-----
> > From: mulgara-general-bounces at mulgara.org [mailto:mulgara-general-bounces at mulgara.org] On Behalf Of Alex Hall
> > Sent: 1 July 2008 15:06
> > To: cborromeo3 at yahoo.com; Mulgara General
> > Subject: Re: [Mulgara-general] Mulgara size limits
> >
>
>
> > [snip]
> >
> > The out-of-memory exception is being thrown in the Jena ARP parser,
> > which we use for parsing RDF/XML. Looking quickly at the Jena source, I
> > see that ARP keeps a bunch of stuff in memory, namely blank node
> > mappings, so it doesn't surprise me that it runs out of memory for large
> > files.
>
> If it's the check for illegal reuse of bNode ids, then maybe it's an old
> version of ARP? Nowadays it issues a warning about being unable to track
> illegal reuse of bNode ids across the whole file, and stops that checking
> while continuing to parse. If you're using an old version of ARP, it may
> also be using an old version of Xerces, which might also be a factor.
>
> Andy
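Andy's description of newer ARP versions suggests a general pattern: keep per-ID state for a whole-file check only up to a limit, then warn once, discard that state, and keep parsing. The Java class below is a hypothetical sketch of that pattern, not ARP's actual code; `BoundedBNodeTracker`, its cap, and its methods are illustrative names. It only records whether an ID has been seen before; what ARP treats as illegal reuse is not modelled.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not ARP's code): remembers blank-node IDs for a
 * whole-file check, but gives up tracking once a fixed cap is reached
 * instead of letting the table grow without bound.
 */
public class BoundedBNodeTracker {
    private final int maxTracked;                        // illustrative cap on remembered IDs
    private final Map<String, Integer> firstSeenLine = new HashMap<>();
    private boolean tracking = true;

    public BoundedBNodeTracker(int maxTracked) {
        this.maxTracked = maxTracked;
    }

    /**
     * Records an ID seen at the given line. Returns true only if the ID was
     * already recorded and tracking is still active; false otherwise.
     */
    public boolean record(String bnodeId, int line) {
        if (!tracking) {
            return false;                                // checking disabled; parsing would go on
        }
        if (firstSeenLine.containsKey(bnodeId)) {
            return true;                                 // seen before: caller decides if it is illegal
        }
        if (firstSeenLine.size() >= maxTracked) {
            tracking = false;
            firstSeenLine.clear();                       // drop the table so the GC can reclaim it
            System.err.println("warning: too many blank node ids; no longer checking reuse");
            return false;
        }
        firstSeenLine.put(bnodeId, line);
        return false;
    }

    public boolean isTracking() {
        return tracking;
    }

    public static void main(String[] args) {
        BoundedBNodeTracker tracker = new BoundedBNodeTracker(2);
        System.out.println(tracker.record("n1", 10));    // false: first sighting
        System.out.println(tracker.record("n1", 42));    // true: seen before
        System.out.println(tracker.record("n2", 11));    // false: second distinct ID, cap now reached
        System.out.println(tracker.record("n3", 12));    // false: cap hit, warning printed, tracking stops
        System.out.println(tracker.isTracking());        // false
    }
}
```

The trade-off is the one Andy describes: memory stays bounded on large files, at the cost of no longer detecting reuse once the cap is hit.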