[Mulgara-general] firefox bug and server config

Paul Gearon gearon at ieee.org
Wed Sep 9 01:59:12 UTC 2009


On Sun, Sep 6, 2009 at 11:18 AM, Gregg Reynolds <dev at mobileink.com> wrote:
>
> On Sun, Sep 6, 2009 at 7:34 AM, Paul Gearon <gearon at ieee.org> wrote:
>>
>> There were several headers that I needed to set or change, and couldn't
>> find documentation on. In the end I ran the system in a debugger and
>
> Let me know if I can lighten the load by researching stuff.

With this issue, for sure. In general, I just needed to get a basic
framework up to run web apps. Theoretically, the same app should look
the same for any deployment, and we're trying to take advantage of
that for both Jetty and Tomcat.

>> >> That said, it can be hard to find things in documentation at the best
>> >> of times. I'd love to set it up to make it easy for people to find
>> >> what they need, but I really don't know how. It's not feasible to just
>>
>> I have a feeling that a lot of info should be duplicated all over the
>> place, just so people can find it when it's relevant to them.
>> Unfortunately, doing "redundant" work in something that is already not
>> getting enough time and effort is probably doomed to failure.  :-)
>
> DITA is intended to attack that problem.  The idea is you write up topics
> (which are intended to be self-contained) and then design various "topic
> maps" that organize them in various ways.  So you can have multiple views
> while minimizing duplication of effort.  No promises, but I'm trying to find
> time to do a minimal DITA version of some Mulgara documentation, just to see
> if it's useful or even feasible.

I'm willing to work with it if it looks like it will help us. Let me know.

>> If you need changes to the Jetty config, then just let me know and I'm
>> happy to make sure they make it into the standard distribution. (The
>> general rule is that once you've submitted 2 non-trivial successful
>> patches, then you can have SVN access).
>
> I'll do my best, but I'm afraid I'm having a bit of trouble grokking Mulgara
> configuration.  If you have a minute to point me in the right direction I'd
> appreciate it.  Specifically:
>   It looks like the Jetty configuration is embedded in the source code
> of src/jar/server/java/org/mulgara/server/HttpServices.java; is that
> correct?

Yes, I tried to isolate it into there for several reasons. Until
recently, it was being set up in EmbeddedMulgaraServer, and therefore
it was being mixed in with everything else. The idea of the
HttpServices class is to isolate the references to the HTTP server
(which is Jetty for now). This means that if you don't have an HTTP
server configured, then it won't be needed. In other words, by
disabling HTTP in the configuration file, you can run Mulgara without
any Jetty JARs at all.

Other than simple enabling/disabling of the HTTP server, a number of
the Jetty configuration options appear in the MulgaraConfig object.
This object is created using Castor from an XSD description. The
source code of this object tends to get dropped during the build, but
you'll find the description that it's built from in
conf/mulgara-embedded.xsd. The default configuration file that gets
loaded is conf/mulgara-config.xml, so if you look in here you should
see everything that gets configured.


> At least that's about the only place I see Jetty mentioned, and I
> can't find anything that looks like a jetty.xml config file.

Nope, it's all in the main configuration file.

Incidentally, if you need to override anything in the default config
file, you can just use that portion of the XML that you need. Until
recently, you needed an entire configuration file of your own (so
you'd normally just copy the main file and edit what you wanted
changed), but now you can just load a configuration snippet (though I
think you still need the entire Jetty section).

> So if I wanted
> to check the Origin header of an incoming request and set the
> Access-Control-Allow-Origin header of the response, that's where I'd do it?

No. HttpServices sets up the servers (one for read/write access on
port 8080, and a second one for read-only access on port 8081), and
loads webapps into them. Unless you're talking about some global
server setting, you want to look for headers and write responses in
the individual web applications.

If you have a look at the method HttpServices.getContextStarters()
you'll see the services that get loaded (I should be loading these
from a configuration file, but I did it in a method for expediency at
the time). You'll see that the SPARQL endpoint is loaded from
org.mulgara.protocol.http.SparqlServlet. Similarly, the TQL service is
loaded from org.mulgara.protocol.http.TqlServlet. Both of these
servlets are mostly implemented in an abstract class called
ProtocolServlet, so you'll most likely need to make any changes there.
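
As a rough illustration of the kind of check that would live in a
servlet (a sketch only: the class CorsSketch and its whitelist are
hypothetical and are not Mulgara code), the decision can be kept apart
from the servlet API. In a real servlet the origin would come from
HttpServletRequest.getHeader("Origin"), and a non-null result would be
passed to HttpServletResponse.setHeader("Access-Control-Allow-Origin", ...).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper, not part of Mulgara: picks the origin to echo back, if any. */
final class CorsSketch {
  private final Set<String> allowed;

  CorsSketch(String... allowedOrigins) {
    this.allowed = new HashSet<String>(Arrays.asList(allowedOrigins));
  }

  /**
   * Returns the value to send in Access-Control-Allow-Origin, or null
   * when the header should be left off the response.
   */
  String allowOriginFor(String originHeader) {
    if (originHeader == null) {
      return null;  // no Origin header, so not a cross-origin request
    }
    return allowed.contains(originHeader) ? originHeader : null;
  }
}
```

Echoing back only whitelisted origins, rather than always sending "*",
is an assumption about what a deployment would want.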

>  To tell you the truth it doesn't look like it, but I'm not clear on the
> relation of the Mulgara to Servlet to Webserver.  Plus, in case it's not
> obvious, it's been a looong time since I was knee-deep in Java.

When you start a default distribution of Mulgara, it will do a few things:

1. Create a Mulgara Database object. This is the object that you
connect to for read or write operations, etc. It creates and modifies
all the files on disk.
2. Start an RMI server, and use it to provide access to the Database object.
3. Create an HTTP server (Jetty).
  3a. Create a set of web applications in the HTTP server. The
Database is provided as a parameter to each of these applications.
4. Sit there until a STOP notification is received.

That's pretty much it.  :-)

So the servlets simply act as clients to the database, and hence are
pretty much independent. The main difference is that they will usually
be in the same JVM as the database (though this may not happen in a
Tomcat deployment - and they are supposed to manage that kind of
deployment).

> Second question:  the character encoding issue is a showstopper for me.
>  (See tickets 197-202; 197 can be closed since I broke it in two.)  If you
> can give me any pointers to the code involved I'll at least try to figure
> out what needs to be done.

org.mulgara.protocol.StreamedSparqlJSONAnswer.jsonEscape(String in)

It's pretty simple code. I just follow the simple rules given at
http://www.json.org/. However, I think that the problem may be that
the data isn't a unicode string.

First of all, I think that we may be getting back characters the same
way the Jena parser gives them to us, which is to say, a sequence of
char that actually holds a sequence of Unicode bytes. (While
apparently perverse, it means you can move the data around in strings,
and the strings are printable if they're in ASCII). I don't know this
for sure, but I suspect so, as I've run into this before.

For other output formats we use the NTriples encoding recommendation,
and the code for that is at
org.jrdf.util.EscapeUtil.escapeUTF8(String). This code checks the bit
patterns of each "character" (really, a byte stored in a character),
and works out the Unicode character that it needs. If it's too large
for a 16-bit char, then it gets the "codepoint" and writes it as
\Uxxxxxxxx. Otherwise, if the character was in 2 or 3 bytes, it gets
the remainder.
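
A minimal sketch of that idea, and not the JRDF code: instead of
checking bit patterns by hand, it leans on the JDK's charset support to
turn a string whose chars each hold one byte back into real text, then
writes anything outside printable ASCII in the 4-digit or 8-digit
escape form. The class name ByteCharsSketch and both method names are
made up for illustration, and it assumes the bytes are UTF-8.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: bytes-stored-in-chars to escaped ASCII. */
final class ByteCharsSketch {

  /** Reinterprets a string whose chars each hold one UTF-8 byte (0-255) as text. */
  static String decode(String bytesInChars) {
    // ISO-8859-1 maps chars 0-255 straight to the same byte values;
    // any char above 255 would be replaced, so the input must be byte-sized.
    byte[] raw = bytesInChars.getBytes(StandardCharsets.ISO_8859_1);
    return new String(raw, StandardCharsets.UTF_8);
  }

  /** Writes non-ASCII code points as 4-digit or 8-digit hex escapes. */
  static String escape(String text) {
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < text.length(); ) {
      int cp = text.codePointAt(i);
      i += Character.charCount(cp);  // step over both halves of a surrogate pair
      if (cp == '\\' || cp == '"') {
        out.append('\\').append((char) cp);
      } else if (cp >= 0x20 && cp <= 0x7E) {
        out.append((char) cp);                      // printable ASCII as-is
      } else if (cp <= 0xFFFF) {
        out.append(String.format("\\u%04X", cp));   // fits in 16 bits
      } else {
        out.append(String.format("\\U%08X", cp));   // needs the long form
      }
    }
    return out.toString();
  }
}
```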

This should mostly work for JSON, with a couple of exceptions...
1. It doesn't escape "/", form-feed, backspace, or control characters.
2. The rules at json.org do not describe any way to handle Unicode
characters that do not fit into 16 bits.

The first part is easy, but the second part is hairy. You *could* just
ignore it, as the odds of these characters actually appearing in real
life are really low, and most clients probably wouldn't know how to
handle them. But that could lead to security issues if someone wanted to
corrupt a system by using these characters. I would probably extend
the spec to use the \Uxxxxxxxx format for these oversized characters
instead, much as NTriples is doing.
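
For what it's worth, RFC 4627 (section 2.5) does give a representation
that stays within the four-digit escape: a character outside the Basic
Multilingual Plane is written as two consecutive escapes holding its
UTF-16 surrogate pair. Java strings already store such characters as
surrogate pairs, so escaping char by char produces that form without
extra work. The following is a sketch under that assumption. The class
JsonEscapeSketch is hypothetical, it is not the existing jsonEscape
method, and it assumes the input is a real Unicode string rather than
bytes stored in chars.

```java
/** Hypothetical sketch of ASCII-only JSON string escaping. */
final class JsonEscapeSketch {

  static String escape(String in) {
    StringBuilder out = new StringBuilder(in.length() + 16);
    for (int i = 0; i < in.length(); i++) {
      char c = in.charAt(i);
      switch (c) {
        case '"':  out.append("\\\""); break;
        case '\\': out.append("\\\\"); break;
        case '/':  out.append("\\/");  break;
        case '\b': out.append("\\b");  break;
        case '\f': out.append("\\f");  break;
        case '\n': out.append("\\n");  break;
        case '\r': out.append("\\r");  break;
        case '\t': out.append("\\t");  break;
        default:
          if (c < 0x20 || c > 0x7E) {
            // Control characters and everything non-ASCII, including each
            // half of a surrogate pair, become a four-digit hex escape.
            out.append(String.format("\\u%04x", (int) c));
          } else {
            out.append(c);
          }
      }
    }
    return out.toString();
  }
}
```

With this, the G clef character U+1D11E comes out as \ud834\udd1e, and
the output is plain ASCII regardless of the response encoding.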

> BTW, do you receive notifications when new tickets are created?

No.

After you asked this, I created an RSS feed, but then I just had a
list of tickets in my RSS reader. Email is more likely to get my
attention, but I don't know how to set the server up for this
(fortunately, you've been emailing me, so I've been getting the
message).  :-)

Regards,
Paul Gearon


