[Expat-discuss] DOM parser, expat vs. libxml?

Discussion:

AFish at GoldenGate.com ()

2003-03-05 17:44:06 UTC

We need a DOM parser in C that will compile on any platform. So far, the
only C xml parsers I have seen are expat and libxml. The only DOM parser
built on top of expat I have seen is 'SCEW' the simple C expat wrapper
(http://www.nongnu.org/scew/).

Questions:
1. Are there other expat wrappers or examples which provide DOM-like xml
tree traversal?
2. Has anyone done a side-by-side libxml vs. expat comparison? Is there any
reason we should roll our own DOM parser on top of expat instead of using
libxml?

Karl Waclawek

2003-03-05 17:54:13 UTC

Post by AFish at GoldenGate.com ()
We need a DOM parser in C that will compile on any platform. So far, the
only C xml parsers I have seen are expat and libxml. The only DOM parser
built on top of expat I have seen is 'SCEW' the simple C expat wrapper
(http://www.nongnu.org/scew/).
1. Are there other expat wrappers or examples which provide DOM-like xml
tree traversal?

Not that I know of.

Post by AFish at GoldenGate.com ()
2. Has anyone done a side-by-side libxml vs. expat comparison? Is there any
reason we should roll our own DOM parser on top of expat instead of using
libxml?

Why would you not want to use SCEW instead of rolling your own?

About libxml vs. Expat: I have never compared them, but it seems
that Expat is pretty good in the areas of speed and memory use,
as well as being quite compliant. However, Expat does not validate.

Karl

AFish at GoldenGate.com ()

2003-03-05 19:41:00 UTC

Post by Karl Waclawek
Why would you not want to use SCEW instead of rolling your own?

This is a good point. I tried SCEW yesterday, and it was pretty good. I had
a little trouble compiling on Win32, but the issues were minimal.

rolf at pointsman.de ()

2003-03-05 20:29:31 UTC

Post by AFish at GoldenGate.com ()
We need a DOM parser in C that will compile on any platform. So far, the
only C xml parsers I have seen are expat and libxml. The only DOM parser

Don't forget rxp, a good, fast, compliant and optionally validating
parser (http://www.cogsci.ed.ac.uk/~richard/rxp.html). Don't be put off
by the artless home page; it's a good product. If C++ is also OK
for you, there's of course also xerces-c++ (http://xml.apache.org).

Well, and for completeness' sake, don't forget msxml (the XML parser
out of the evil empire). I'm not a fan of MS for various reasons, but
their XML parser (and their XSLT engine) isn't bad.

Post by AFish at GoldenGate.com ()
built on top of expat I have seen is 'SCEW' the simple C expat wrapper
(http://www.nongnu.org/scew/).
1. Are there other expat wrappers or examples which provide DOM-like xml
tree traversal?

Sablotron (http://www.gingerall.com/charlie/ga/xml/p_sab.xml). From
the home page:

" Sablotron is a fast, compact and portable XML toolkit implementing
XSLT 1.0, DOM Level2 and XPath 1.0. [...] Sablotron uses James Clark's
expat XML parser."

You shouldn't buy into their claim that they have a "fast" XSLT
processor (that claim is somewhat ridiculous). Note, though, that
Sablotron itself is written in C++.

There are certainly more DOM implementations based on expat around
than just one. For example, there's a Tcl extension (I'm one of the
maintainers) that implements DOM on top of expat, as well as XPath
and XSLT (http://www.tdom.org). The DOM building parts are completely
in C, so it may be worth a look.

Post by AFish at GoldenGate.com ()
2. Has anyone done a side-by-side libxml vs. expat comparison? Is there any
reason we should roll our own DOM parser on top of expat instead of using
libxml?

A lot could be said here; your question is a bit vague about your
needs.

Expat does not validate (although it does read external entities on
demand). If a well-formedness parser is OK for you, expat is
definitely somewhat faster; but since both parsers are really fast,
this may only be of interest if you aim for maximum speed.
Additionally, the time needed to build a DOM-like structure in memory
(which typically needs a lot of mallocs for the node structures)
isn't negligible, so the overall speed depends not only on the raw
parser speed, but also on the quality of the DOM building code.
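One common way to reduce the malloc cost mentioned above is to carve node structures out of large blocks (an arena, or pool) and release the whole tree at once. The following C11 sketch illustrates that general technique only. It is not taken from tDOM, libxml, or expat, and the names `arena`, `arena_alloc`, `arena_free_all` and the 64 KB block size are hypothetical choices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ARENA_BLOCK_SIZE 65536     /* hypothetical; tune to the workload */

/* One large block; small objects are handed out from it sequentially. */
struct arena_block {
    struct arena_block *next;
    size_t used;                   /* bytes already handed out */
    union {                        /* max_align_t member aligns the payload */
        max_align_t align;
        unsigned char bytes[ARENA_BLOCK_SIZE];
    } data;
};

struct arena {
    struct arena_block *head;      /* most recently allocated block */
    size_t blocks;                 /* number of malloc calls made so far */
};

/* Round n up to a multiple of the strictest fundamental alignment. */
static size_t align_up(size_t n)
{
    size_t a = _Alignof(max_align_t);
    return (n + a - 1) / a * a;
}

/* Return size bytes of uninitialized storage, or NULL if the request
 * is larger than a block or malloc fails. Only one malloc is made per
 * block, not per object. */
void *arena_alloc(struct arena *a, size_t size)
{
    struct arena_block *b;
    void *p;

    if (size > ARENA_BLOCK_SIZE)
        return NULL;
    size = align_up(size);
    b = a->head;
    if (b == NULL || ARENA_BLOCK_SIZE - b->used < size) {
        b = malloc(sizeof *b);
        if (b == NULL)
            return NULL;
        b->next = a->head;
        b->used = 0;
        a->head = b;
        a->blocks++;
    }
    p = b->data.bytes + b->used;
    b->used += size;
    assert(b->used <= ARENA_BLOCK_SIZE);
    return p;
}

/* Release every block, and with it every object allocated from them. */
void arena_free_all(struct arena *a)
{
    while (a->head != NULL) {
        struct arena_block *next = a->head->next;
        free(a->head);
        a->head = next;
    }
    a->blocks = 0;
}
```

A DOM builder would call `arena_alloc(&a, sizeof(struct some_node))` (a hypothetical node type) in place of `malloc` for each node. The trade-off is that single nodes cannot be freed individually. That tends to fit a tree that is built once from a document and discarded as a whole.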

Another factor, which may be important (depending on the size of your
XML data), is that DOM trees typically need _a lot_ of memory. This
depends, of course, on how much markup you have in your document (and
how much 'indentation' fluff), but it's normal to need 3 to 5 times
the file size in memory for the DOM tree. Although libxml DOM trees
need noticeably less memory than any Java DOM implementation I know,
libxml isn't the slimmest implementation available. For example, the
above-mentioned tDOM implementation has noticeably less overhead
(which is important for me, because I have to handle really large
product data lists in XML).
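A builder can shrink the tree by not storing the indentation between elements as text nodes at all. The C helper below is a hypothetical sketch that tests whether a chunk of character data consists only of XML whitespace: space, tab, carriage return, and line feed, which are the characters in the `S` production of the XML 1.0 spec. Whether dropping such text is acceptable depends on the application, since XML itself treats it as character data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if the len bytes at s contain only XML
 * whitespace (space, tab, CR, LF), else 0. The data need not be
 * NUL-terminated, which matches how Expat passes character data. */
int is_xml_whitespace(const char *s, int len)
{
    int i;
    assert(s != NULL || len == 0);
    for (i = 0; i < len; i++) {
        char c = s[i];
        if (c != ' ' && c != '\t' && c != '\r' && c != '\n')
            return 0;
    }
    return 1;
}
```

One caveat applies. Expat may split a single run of text across several character-data callbacks, so an all-whitespace chunk can belong to a run that also contains real content. A careful builder therefore accumulates the complete run (for example, until the next start or end tag) before deciding to discard it.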

DOM and DOM are not the same. Do you mean DOM Level 1, 2 or 3? What
about entities? Must you preserve parsed entities? DOM alone will
probably leave you somewhat unhappy before long. Navigation within the
tree can get tedious if you don't have support for at least XPath
(libxml provides this). But I'd better stop now.
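Without XPath, navigation usually means hand-written loops over child lists. A tiny helper that follows a slash-separated list of element names (for example `product/price`) removes some of that tedium. The sketch below is hypothetical. It supports none of XPath's axes, predicates, or wildcards, and it is no substitute for a real XPath engine such as the one libxml provides. The `struct elem` layout is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal element node (text content omitted). */
struct elem {
    const char *name;
    struct elem *first_child;
    struct elem *next_sibling;
};

/* Follow a path like "a/b/c" downward from start, taking at each step
 * the first child whose name matches exactly. Returns NULL as soon as
 * a step has no match; an empty path returns start itself. */
struct elem *find_path(struct elem *start, const char *path)
{
    struct elem *cur = start;
    assert(path != NULL);
    while (cur != NULL && *path != '\0') {
        const char *slash = strchr(path, '/');
        size_t len = slash ? (size_t)(slash - path) : strlen(path);
        struct elem *c;
        for (c = cur->first_child; c != NULL; c = c->next_sibling) {
            /* exact match: same prefix and name ends right there */
            if (strncmp(c->name, path, len) == 0 && c->name[len] == '\0')
                break;
        }
        cur = c;
        path += len;
        if (*path == '/')
            path++;
    }
    return cur;
}
```

Unlike XPath, this helper never backtracks. `find_path(root, "product/price")` looks only under the first `product` child and returns NULL if that one has no `price`, even when a later `product` does.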

rolf

Post by AFish at GoldenGate.com ()
_______________________________________________
Expat-discuss mailing list
http://mail.libexpat.org/mailman/listinfo/expat-discuss

Michael B. Allen

2003-03-05 20:56:18 UTC

On Tue, 4 Mar 2003 11:23:34 -0800

Post by AFish at GoldenGate.com ()
We need a DOM parser in C

DOM is not a parser. Expat is a parser. The DOM is a tree of nodes in
memory. DOM implementations frequently do come with some kind of
module to load from and store to XML, however, in which case that
module would *use* an XML parser.

Post by AFish at GoldenGate.com ()
that will compile on any platform. So far, the

"any platform"? You really should be a little more specific. You might
say POSIX or ANSI C platforms, but "any" just isn't possible.

Post by AFish at GoldenGate.com ()
only C xml parsers I have seen are expat and libxml. The only DOM parser

There are several XML parsers in C, and particularly in C++. Search
Google for 'xml parser c'. Xerces C++, IBM's xml4c, and Oracle XML for
C are three that I can think of.

Post by AFish at GoldenGate.com ()
built on top of expat I have seen is 'SCEW' the simple C expat wrapper
(http://www.nongnu.org/scew/).

This isn't a real DOM though.

Post by AFish at GoldenGate.com ()
1. Are there other expat wrappers or examples which provide DOM-like xml
tree traversal?

There are lots of these, I suspect. And you could write your own in 20
minutes. Here's one:

http://www.eskimo.com/~miallen/libmba/dl/docs/ref/domnode.html
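The sketch below illustrates how little such a wrapper needs. The three callbacks have the same shape as Expat's start-element, end-element, and character-data handlers, with `char` standing in for `XML_Char` as in Expat's default non-UTF-16 build. To keep the block compilable without Expat installed, it does not include `expat.h`. The `struct builder` and node layout are hypothetical. Expat passes attributes as a NULL-terminated array of alternating names and values, but this sketch ignores them, and its error handling is minimal.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node: an element has a name, a text node has text. */
struct node {
    char *name;                 /* element name; NULL for text nodes */
    char *text;                 /* NUL-terminated text; NULL for elements */
    size_t text_len;
    struct node *parent, *first_child, *last_child, *next_sibling;
};

/* Hypothetical builder state, handed to every callback as userData. */
struct builder {
    struct node *root;
    struct node *current;       /* element whose content is being read */
};

/* Create an empty node as the last child of the current element,
 * or as the root if there is no current element yet. */
static struct node *append_node(struct builder *b)
{
    struct node *n = calloc(1, sizeof *n);
    if (n == NULL)
        abort();                /* sketch only: real code should report errors */
    n->parent = b->current;
    if (b->current == NULL) {
        b->root = n;
        return n;
    }
    if (b->current->last_child == NULL)
        b->current->first_child = n;
    else
        b->current->last_child->next_sibling = n;
    b->current->last_child = n;
    return n;
}

/* Shaped like XML_StartElementHandler; attributes are ignored. */
void on_start(void *user_data, const char *name, const char **atts)
{
    struct builder *b = user_data;
    struct node *n = append_node(b);
    (void)atts;
    n->name = malloc(strlen(name) + 1);
    if (n->name == NULL)
        abort();
    strcpy(n->name, name);
    b->current = n;
}

/* Shaped like XML_EndElementHandler: step back up to the parent. */
void on_end(void *user_data, const char *name)
{
    struct builder *b = user_data;
    (void)name;
    assert(b->current != NULL);
    b->current = b->current->parent;
}

/* Shaped like XML_CharacterDataHandler. Expat may deliver one run of
 * text in several calls, and s is not NUL-terminated, so consecutive
 * chunks are appended to the same text node. */
void on_chars(void *user_data, const char *s, int len)
{
    struct builder *b = user_data;
    struct node *last;
    char *grown;
    if (b->current == NULL)
        return;                 /* ignore data outside the document element */
    last = b->current->last_child;
    if (last == NULL || last->text == NULL)
        last = append_node(b);  /* previous child is an element, or none */
    grown = realloc(last->text, last->text_len + (size_t)len + 1);
    if (grown == NULL)
        abort();
    memcpy(grown + last->text_len, s, (size_t)len);
    last->text_len += (size_t)len;
    grown[last->text_len] = '\0';
    last->text = grown;
}

/* Free n, its descendants, and its following siblings. */
void tree_free(struct node *n)
{
    while (n != NULL) {
        struct node *next = n->next_sibling;
        tree_free(n->first_child);
        free(n->name);
        free(n->text);
        free(n);
        n = next;
    }
}
```

With Expat linked in, the handlers would be registered roughly as `XML_SetUserData(parser, &b)`, `XML_SetElementHandler(parser, on_start, on_end)` and `XML_SetCharacterDataHandler(parser, on_chars)` before calling `XML_Parse`. This assumes `XML_Char` is `char`. A UTF-16 build of Expat would need different types.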

Post by AFish at GoldenGate.com ()
2. Has anyone done a side-by-side libxml vs. expat comparison? Is there any

I have never really used libxml. I believe it requires glib. I would
be willing to bet expat would be quite a bit faster and much, much
more efficient, though.

Post by AFish at GoldenGate.com ()
reason we should roll our own DOM parser on top of expat instead of using
libxml?

There are a couple of other DOM implementations. If you need a real
DOM rather than a simple DOM-like interface like those mentioned above,
there is DOMC (by yours truly):

http://www.eskimo.com/~miallen/domc/

Incidentally, I just finished testing 0.7, but I have to run through
the portability tests and create the various packages, so it will take
me another week or so. I have already compiled it on Windows NT and
have a working Win32 Makefile for MSVC (I'm not a Linux zealot!).

Mike

--
A program should be written to model the concepts of the task it
performs rather than the physical world or a process because this
maximizes the potential for it to be applied to tasks that are
conceptually similar and, more important, to tasks that have not
yet been conceived.