Feb 16 2013

Recently a customer sent me an XML file that was 3.5 gigabytes in size. I had to parse this file, which brought some new challenges.

The first one was that I did not have any tool to display a file that large. All text editors balked at it, and the few viewers (for Windows) I found on the Internet that claimed they could handle large files were dog slow or very difficult to use (does nobody teach those kids user interface design nowadays?). I made do with the Linux less command for a while, but always having to put the file(s) on the server and ssh into it just to display them was an annoyance in itself.

So I ended up rolling my own. It’s of course written in Delphi and needed only a few lines of code. It relies heavily on my TdzVirtualStringGrid component to keep only the part of the file in memory that is currently being displayed. It immediately displays the first few lines of the file, and in the background it reads through the file to find CR/LF characters and builds a list of Int64 values holding the stream offset of every single line in the file. While indexing is in progress you can only scroll down as far as the point that has already been indexed, but that’s the only restriction. For the above-mentioned 3.5 gigabyte file, indexing takes around 30 seconds on my computer.
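
To illustrate the idea, here is a minimal sketch of the indexing approach in Python. It is not the actual Delphi code of the tool. The names build_line_index and read_lines are made up for this example, and it assumes lines end in LF (with an optional CR before it), so files using bare CR line ends would not be split correctly. The index is an array of 64-bit offsets, and only the requested lines are ever read into memory.

```python
import io
from array import array


def build_line_index(stream, chunk_size=1 << 20):
    """Scan a binary stream once and return the byte offset of every line start.

    Hypothetical helper for illustration; assumes LF-terminated lines.
    """
    offsets = array("q", [0])  # "q" = signed 64-bit, like Int64 in Delphi
    pos = 0  # absolute offset of the current chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        start = 0
        while True:
            i = chunk.find(b"\n", start)
            if i < 0:
                break
            offsets.append(pos + i + 1)  # next line starts after the LF
            start = i + 1
        pos += len(chunk)
    # No line starts at end of file (empty file or trailing line break).
    if offsets and offsets[-1] == pos:
        offsets.pop()
    return offsets


def read_lines(stream, offsets, first, count):
    """Read only lines first .. first+count-1, using the offset index."""
    last = min(first + count, len(offsets))
    lines = []
    for n in range(first, last):
        stream.seek(offsets[n])
        if n + 1 < len(offsets):
            raw = stream.read(offsets[n + 1] - offsets[n])
        else:
            raw = stream.read()  # last line: read to end of file
        lines.append(raw.rstrip(b"\r\n"))
    return lines


# Small in-memory stand-in for a huge file opened with open(path, "rb").
demo = io.BytesIO(b"alpha\r\nbeta\ngamma")
index = build_line_index(demo)
visible = read_lines(demo, index, 1, 2)  # the two lines "on screen"
```

In a real viewer the indexing would run in a background thread and append to the list while the grid asks only for the lines that are currently visible. That is also why scrolling is limited to the part that has already been indexed.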

Apart from displaying the file, it does nothing. There is no search function and it does not even display line numbers. But I thought it might be useful to somebody else, so I put it into my dzlib svn repository on SourceForge. You can find it in the subdirectory tools. After I received some patches for it from Daniela Osterhagen, I put up a SourceForge project for it. There is also a downloadable executable now.
