Last Friday at Beer O’Clock, we saw some awesome data visualizations by Andrew Caudwell, a colleague of mine. I talked to him for a bit later in the evening and learned that he not only created Gource, which can visualize the activity in a git repository, for example, but also Logstalgia, which visualizes website access logs.
That got me excited, because for some of our busy servers it would be great to show the actual activity. As I don’t have access to any of our servers myself, I asked another colleague to give me all the access logs for one of our sites. I thought I could just plug the log files in and let Logstalgia do its magic. Afraid not.
The log file lines looked like the following, and apparently Logstalgia doesn’t like the “p: 4580 t: 0” part.
81.144.138.34 p: 4580 t: 0 - - [08/Jul/2012:06:25:14 +1200] "GET /group/view.php?id=5734 HTTP/1.1" 200 348 "-" "Wotbox/2.01 (+http://www.wotbox.com/bot/)"
The bad thing is that the values after p: and t: are different on every line. Now I could go through each line in the access log files by hand (some are text files over 200 MB), or do it a bit smarter. My text editor, Geany, can search with POSIX regular expressions, but regex is a book with 1,000 seals to me. Chris Cormack, another colleague whom I can ask anything and who always answers in a way I learn from, gave me the POSIX regex and explained its syntax, which I understood much better than any page on the Internet.
However, the programmer in him said: “I could write you a Perl one liner. 2 secs.” The result was:
cat logfile | perl -e 'while ($inp=<STDIN>){$inp=~ s/p: \d+ t: \d+//; print $inp;}' > newlogfile
The advantage of Chris’ script is that it takes only a second to find every instance of the pattern and remove it from the file. Opening the files in an editor and using Find & Replace would take minutes.
Update: The script can be even shorter. That’s what happens when three Perl developers are in an IRC channel. 🙂 Chris Hall, yet another colleague, shortened it to:
cat logfile | perl -pe 's/p: \d+ t: \d+//' > newlogfile
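To see what the substitution does to a single line, here is a minimal Perl sketch that applies it to the sample line from above. The sub name strip_p_t_fields is made up for illustration. Unlike the one-liners, its pattern starts with a space, so it also removes the space in front of p: (the one-liners leave two spaces between the IP address and the first -); whether Logstalgia actually minds that double space is an assumption I haven’t tested.

```perl
use strict;
use warnings;

# Hypothetical helper: remove the " p: <digits> t: <digits>" fields from one line.
# \d+ matches one or more digits, so any values after p: and t: are handled.
# The leading space in the pattern is an addition to the one-liners in the post,
# so no double space is left behind.
sub strip_p_t_fields {
    my ($line) = @_;
    $line =~ s/ p: \d+ t: \d+//;
    return $line;
}

# Sample line from the post, written with plain ASCII quotes and hyphens.
my $sample = '81.144.138.34 p: 4580 t: 0 - - [08/Jul/2012:06:25:14 +1200] '
           . '"GET /group/view.php?id=5734 HTTP/1.1" 200 348 "-" '
           . '"Wotbox/2.01 (+http://www.wotbox.com/bot/)"';

print strip_p_t_fields($sample), "\n";
```

Running it prints the line starting with 81.144.138.34 - - [08/Jul/2012:06:25:14 +1200], which is the usual combined log format again. Because the substitution has no /g flag, only the first match on each line is removed, which is all that’s needed here.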
Now all I need is an interesting stretch of the access log, such as a single day, and then I’m ready to create a visualization. Stay tuned for the result.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Even shorter!
perl -i.bak -pe 's/p: \d+ t: \d+//'
(-i.bak means “edit the file in place, but first keep a backup with a .bak extension”)
Sorry…Forgot the input file.
perl -i.bak -pe 's/p: \d+ t: \d+//' logfile
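If you’d rather keep this in a script than on the command line, Perl exposes the same in-place behaviour through the $^I variable, the programmatic counterpart of -i (setting it to '.bak' matches -i.bak). Here is a minimal sketch; the sub name strip_in_place is hypothetical, and the pattern is the same one as in the one-liner.

```perl
use strict;
use warnings;

# Hypothetical helper: strip the "p: <digits> t: <digits>" fields from each
# file in place, keeping the untouched original as <file>.bak
# (the same effect as perl -i.bak -pe 's/p: \d+ t: \d+//' file ...).
sub strip_in_place {
    my @files = @_;
    # $^I is the in-place edit extension; localising it and @ARGV
    # keeps the change contained to this sub.
    local ($^I, @ARGV) = ('.bak', @files);
    while (my $line = <>) {
        $line =~ s/p: \d+ t: \d+//;
        print $line;    # goes to the rewritten file, not the terminal
    }
}

strip_in_place(@ARGV) if @ARGV;
```

Saved under any name, for example strip.pl, it runs as perl strip.pl logfile. As with the one-liner, the original content ends up in logfile.bak and the cleaned version in logfile.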
Thank you, Michael.
Here’s another handy one-liner for you:
perl -e 'length q caller vec and print chr oct ord q qx eq and print chr ord q ref or and print chr ord q or no and print chr ord q else and print chr ord qq q q and print chr ord q tie gt and print chr ord qw q sin q and print chr ord q q eq and print chr ord qw q sin q and print chr ord q sin s and print chr ord q cmp lc and print chr ord q split s and print chr ord qw q lc q and print chr ord q ne sin and print chr hex length q q bless localtime ref q and print chr hex chr ord uc q each package'
I wouldn’t have expected anything less from you, Grant. Thanks for your Thomas Mann one-liner. 🙂