I thought many of you might enjoy a little programming challenge that I came across. I’ve long been a fan of reddit, and especially the programming and Delphi subreddits. reddit, who finally seem to be getting the support they need, posted an interesting job ad about 8 months ago which looked like a nice little project.

The challenge, to summarise, is to devise an application which can search very large log files for lines matching a given time (the name of the challenge will give away the purpose for those familiar with the grep tool). The challenge focuses specifically on HAProxy logs, and although they’re not hard to generate yourself, a sample Perl generator is provided. The core behaviour is that a user should be able to supply either a time or a time range, and optionally a log path (with a sensible default), and have the matching lines from the log file returned:

$ tgrep 8:42:04 
[log lines with that precise timestamp] 
$ tgrep 10:01 
[log lines with timestamps between 10:01:00 and 10:01:59] 
$ tgrep 23:59-0:03 
[log lines between 23:59:00 and 0:03:59]

For the purpose of this challenge, the log files can be assumed to have been rotated every 24 hours and you will therefore only need to worry about the time of a line and not a date changeover.
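To make the three argument forms above concrete, here is a rough sketch of turning the argument into an inclusive range of seconds since midnight. It's in Python for brevity (my actual code is Delphi), and the function names are my own invention:

```python
def parse_time(spec):
    """Parse H:MM[:SS] into (start_sec, end_sec); a missing seconds
    field widens the match to the whole minute."""
    parts = [int(p) for p in spec.split(":")]
    if len(parts) == 3:
        h, m, s = parts
        start = h * 3600 + m * 60 + s
        return start, start
    h, m = parts
    start = h * 3600 + m * 60
    return start, start + 59

def parse_range(arg):
    """Parse either a single time or 'start-end'. A range whose end is
    numerically before its start wraps past midnight (23:59-0:03)."""
    if "-" in arg:
        lo_spec, hi_spec = arg.split("-")
        start, _ = parse_time(lo_spec)
        _, end = parse_time(hi_spec)
    else:
        start, end = parse_time(arg)
    return start, end

print(parse_range("8:42:04"))    # (31324, 31324)
print(parse_range("10:01"))      # (36060, 36119)
print(parse_range("23:59-0:03")) # (86340, 239)
```

Note that the wrapped range comes back with end < start, which the search code has to treat as two sub-ranges because of the 24-hour rotation.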

A successful application, in the strict interpretation of this challenge, would also need to work on Linux and would therefore likely need to be compiled with Free Pascal.

What I like about this challenge is that there are many levels of solution you could dive into. Clearly, as the challenge states, you are unlikely to achieve any real speed when searching log files of many gigabytes if you perform a linear search. The size of the files involved will also cause problems for any solution which isn’t careful about memory consumption.
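To give a flavour of a sublinear approach, here is a Python sketch (again, not my Delphi code) that binary-searches the file by byte offset: seek to the middle, skip forward to the next line boundary, compare that line's timestamp against the target, and halve the range. It only ever reads one line per probe, so memory stays flat no matter how big the log is:

```python
import os, tempfile

def line_time(f, pos):
    """Seek to pos, align to the next line boundary, and return
    (seconds-since-midnight of that line, its byte offset)."""
    f.seek(pos)
    if pos > 0:
        f.readline()                       # discard the partial line
    start = f.tell()
    raw = f.readline()
    if not raw:
        return None, start                 # ran off the end of the file
    h, m, s = raw.decode().split(" ", 1)[0].split(":")
    return int(h) * 3600 + int(m) * 60 + int(s), start

def first_at_or_after(f, target, size):
    """Smallest byte offset of a line whose timestamp is >= target."""
    lo, hi = 0, size
    while lo < hi:
        mid = (lo + hi) // 2
        t, _ = line_time(f, mid)
        if t is None or t >= target:
            hi = mid
        else:
            lo = mid + 1
    return line_time(f, lo)[1]

# Tiny demonstration log (timestamps ascending, as in a rotated log)
entries = ["08:42:04 GET /a", "10:01:07 GET /b",
           "10:01:59 GET /c", "11:00:00 GET /d"]
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(("\n".join(entries) + "\n").encode())

matches = []
with open(tmp.name, "rb") as f:
    f.seek(first_at_or_after(f, 36060, os.path.getsize(tmp.name)))  # 10:01:00
    for raw in f:
        h, m, s = raw.decode().split(" ", 1)[0].split(":")
        if int(h) * 3600 + int(m) * 60 + int(s) > 36119:             # 10:01:59
            break
        matches.append(raw.decode().rstrip())
os.unlink(tmp.name)
print(matches)  # the two 10:01:xx lines
```

The log format here is a stand-in; a real entry would carry the full HAProxy line after the timestamp, but the search only ever parses the leading time field.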

For my first iteration I opted for a straightforward interpolative search, which turned out to be much buggier than I’d originally thought. Luckily, I had written my tests first, even though I had to resort to a form of integration testing instead of using mock objects (Nick Hodges would not be happy!). It’s a shame that none of the current mocking libraries appear to be compatible with Delphi 2009, so I thought I’d leave the mocks until a later iteration. My initial iteration works, and although I haven’t performance-tested it fully yet, I fully expect it to pale in comparison with some of the excellent tgrep entries available via GitHub. Of particular note is EvansBee’s C version, which was very quick in the few tests that I ran.
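The interpolation step itself is simple on paper: instead of probing the midpoint, estimate where the target time ought to sit, assuming timestamps are spread roughly evenly through the file. A Python sketch of just that probe (my own illustration, with a hypothetical function name); the clamping and the degenerate-span fallback are exactly the sort of details that are easy to get wrong on a first attempt:

```python
def interpolate_probe(lo, hi, t_lo, t_hi, target):
    """Estimate the byte offset of `target` between offsets lo..hi,
    whose boundary lines carry times t_lo and t_hi."""
    if t_hi <= t_lo:                      # degenerate span: fall back to midpoint
        return (lo + hi) // 2
    frac = (target - t_lo) / (t_hi - t_lo)
    frac = min(max(frac, 0.0), 1.0)       # clamp so the probe stays in range
    return lo + int(frac * (hi - lo))

# A log covering 00:00:00-23:59:59 in 1 GiB: probing for 06:00:00
# lands roughly a quarter of the way into the file.
print(interpolate_probe(0, 2**30, 0, 86399, 6 * 3600))
```

With uniform traffic this converges in far fewer probes than plain binary search; with very uneven traffic it can degrade, which is where the second-iteration idea below comes in.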

For my second iteration, I expect the search can be optimised for real-world conditions by taking account of the peaks and troughs in traffic that I imagine reddit sees at certain times of the day.

If anyone feels like giving the tgrep challenge a go with some absurdly clever way of tackling this problem in Delphi/FPC, feel free to comment or get in contact. It might be fun to put them all up on GitHub so that we can compare and contrast. Someone really clever might even be able to hook up a super-project which builds a bunch of git submodules and tests them all.




  1. Dorin Duminica on the 26th October 2011 remarked #

    Interpolation search is the way to go IF AND ONLY IF the data is in ascending or descending order.
    If the files are huge, then one could read in chunks of a few megabytes, say 20-25 MB (because it’s almost certain that it will load within 1 sec).

  2. Consuela Vejar on the 13th March 2012 remarked #

    Amazing Post! Just bookmarked your site for further visits
