So I used xml.dom.minidom.parse on an 18 MB XML file. The damn thing ate 900 MB of RAM. I sort of expected this kind of disaster, but 50 fucking times more memory?
Now, I know jack shit about the in-memory representation this thing uses, but holy fucking shit, you could have an individual struct with 10 pointers for each character (40 bytes per character, so about 720 MB) and it'd still be smaller. This is on 32-bit, btw.
Try another parser from the hundreds floating around the internet? Make your own? Or just give up and let it use the RAM; it's not like you need it for anything, right?
>>2
The RAM usage is not a problem in this particular case, and I'm not asking for solutions. I just think it's disgusting.
Name: Anonymous 2009-11-04 14:54
Try a SAX parser? (Get your minds out of the gutter, /prog/; that's SAX, not SEXPR.) It'll require complicated mutable state, but I recently had to switch from the enterprise javax.xml.parsers.DocumentBuilderFactory to a SAX parser because there wasn't enough heap space to parse and load data from a 291 MB XML file.
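Since the OP is on Python, here's a rough sketch of what the SAX route could look like with the standard library's xml.sax. The `<item>` element name, the integer payloads and the `ItemCounter` / `count_items` names are all made up for illustration; the point is only that the handler keeps a few fields of mutable state instead of a whole tree.

```python
import io
import xml.sax


class ItemCounter(xml.sax.ContentHandler):
    """Counts hypothetical <item> elements and sums their integer text."""

    def __init__(self):
        super().__init__()
        self.count = 0
        self.total = 0
        self._in_item = False  # the "complicated mutable state"
        self._buf = []

    def startElement(self, name, attrs):
        if name == "item":
            self._in_item = True
            self._buf = []

    def characters(self, content):
        # Text can arrive in several chunks, so buffer it until the end tag.
        if self._in_item:
            self._buf.append(content)

    def endElement(self, name):
        if name == "item":
            self.count += 1
            self.total += int("".join(self._buf))
            self._in_item = False


def count_items(source):
    """Stream `source` (a filename or binary file object) through the handler."""
    handler = ItemCounter()
    xml.sax.parse(source, handler)
    return handler.count, handler.total


sample = io.BytesIO(b"<root><item>1</item><item>2</item><item>39</item></root>")
print(count_items(sample))
```

Memory use here depends on what the handler chooses to keep, not on the size of the file, which is the whole reason to put up with the callbacks.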
Name: Anonymous 2009-11-04 15:02
PARSE MY ANUS
Name: Anonymous 2009-11-04 15:09
Your problem is that you're trying to represent the whole XML file in memory at once. You should try lazily consuming it, so that only what you need is being parsed and represented at a time, and older stuff gets garbage collected. I'm not sure which parser would be best for this, since I don't have much experience with them.
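For what it's worth, one way to do this kind of lazy consumption in Python is xml.etree.ElementTree.iterparse. This is only a sketch: the `<record>` tag and the `iter_records` name are invented, and it assumes the records are direct children of the root element, so clearing the root after each one drops the older stuff for the garbage collector.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, tag="record"):
    """Yield (attributes, text) for each <tag> element, one at a time."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield dict(elem.attrib), elem.text
            # Assumption: records sit directly under the root, so clearing it
            # discards the elements that have already been handled.
            root.clear()


sample = io.BytesIO(
    b'<log><record id="1">a</record><record id="2">b</record></log>'
)
for attrs, text in iter_records(sample):
    print(attrs, text)
```

Unlike SAX you still get real element objects to poke at, just never more than a handful of them at once.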
Name: Anonymous 2009-11-04 15:12
The problem is XML is an enterprise drone concept that only enterprise drones would use. Therefore, the parsers are written by enterprise drones, who can't write efficient code. If a real programmer wrote an XML parser, it would be awesome, but unfortunately real programmers never use XML.
Name: Anonymous 2009-11-04 15:27
>>7
What about when a real programmer needs to scrape some web pages and uses an XML parser-based HTML parser like lxml?
>>1
That DOM parser was designed for smaller data, so it is less efficient but gives you more information (a full tree you can navigate). Look for a leaner parser if you need less than that, but you should probably be using a SAX parser for that data anyway.
>>7 If a real programmer wrote an XML parser, it would be awesome, but unfortunately real programmers never use XML.
It's been done, once. http://search.cpan.org/~mirod/XML-Twig-3.32/Twig.pm
XML would be totally tolerable if other languages caught on. But then, so would a lot of things.