Tiplet: monitor realtime web site traffic on a Linux server with tmux and tail

I run a VPS that hosts numerous sites. Whilst trying to pin down the root cause of sporadic hard lockups and runaway memory usage, I settled on a somewhat inefficient (yet very handy) one-liner, which I run inside a tmux session over a PuTTY SSH connection. Two other panes run iftop (for realtime network traffic) and watch --interval=0.1 iostat -m (for realtime disk I/O).
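
The three-pane layout described above could be scripted roughly as follows. This is only a sketch: the function name start_monitor_session and the default session name webmon are made up for illustration, and it assumes tmux, iftop and iostat are installed (iftop usually needs root).

```shell
# Sketch: build a three-pane tmux session for server monitoring.
# start_monitor_session and "webmon" are hypothetical names, not from the article.
start_monitor_session() {
    session="${1:-webmon}"

    # Pane 1: network traffic
    tmux new-session -d -s "$session" 'iftop'
    # Pane 2: disk I/O, refreshed every 0.1 seconds
    tmux split-window -v -t "$session" 'watch --interval=0.1 iostat -m'
    # Pane 3: a plain shell, ready for the tail | grep one-liner
    tmux split-window -v -t "$session"

    # Stack the panes evenly from top to bottom, then attach
    tmux select-layout -t "$session" even-vertical
    tmux attach-session -t "$session"
}

# Usage:
#   start_monitor_session            # session called "webmon"
#   start_monitor_session traffic    # session called "traffic"
```

Wrapping it in a function means nothing happens until you call it, so the snippet is safe to paste into a shell profile.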

An aside: tmux is like screen on speed: way more extensible and SO MUCH EASIER TO USE. I highly recommend you give it a try if you're a command-line warrior. There are some highly useful tutorials to help you get up to speed; google "tmux tutorial" (Hawk Host's two-parter has some good stuff in it).

To accomplish this I'm taking advantage of the fact that DirectAdmin (which by default provides a base of Apache 2, MySQL and PHP 5) stores its httpd access logs in a common folder: /var/log/httpd/domains/<virtualhost>.log. I'm combining the tail command with grep (using -e to supply the pattern) and some pattern matching. It's not perfect: I occasionally have to Ctrl+C and restart the command as it stalls out, but it does everything I need.

Here's my command:

tail /var/log/httpd/domains/*.log -f -n 50 | grep -e "GET / HTTP/1\|GET /2011/\|GET /2010/\|GET /2009/\|GET /2012/\|.php HTTP/1\|.html HTTP/1\|.mp3 HTTP/1"

To break it down:

  1. tail = invoke tail
  2. /var/log/httpd/domains/*.log = read all files ending in .log from the path /var/log/httpd/domains/ (a relative path, or none at all, could be used if you invoke the command in a closer folder)
  3. -f = follow the files: keep running and print new lines live as they are appended
  4. -n 50 = read the last 50 lines from each file (I recommend you set a large scrollback buffer if you want to specify more!)
  5. | = pipe symbol, used to feed the output of one command into another - in this case, to perform further processing on tail's raw output
  6. grep = invoke grep
  7. -e = the argument that follows is the pattern to search for (this is not egrep mode; that would be a capital -E)
  8. the long pattern (actually several, separated with escaped pipe characters) inside inverted commas = only display lines containing any one of these matching strings
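
For comparison, here is a tidier variant of the same filter. It is a sketch that assumes GNU grep; the names pattern and filter_requests are made up for illustration. It uses -E (extended regular expressions) so the alternatives can be grouped, and it escapes the dots so that .php only matches a literal dot followed by php. Apart from the stricter dots it should match the same requests as the original command.

```shell
# Sketch: the same filter written as an extended regular expression.
# "pattern" and "filter_requests" are hypothetical names for this example.
pattern='GET / HTTP/1|GET /20(09|10|11|12)/|\.(php|html|mp3) HTTP/1'

filter_requests() {
    # -E = extended regex, where a bare | means "or" and ( ) groups alternatives
    grep -E "$pattern"
}

# Live usage, with the log path from the article:
#   tail -f -n 50 /var/log/httpd/domains/*.log | filter_requests
```

Keeping the pattern in a variable makes it easier to add another year or file extension without hunting through a long quoted string.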

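Notes: grep uses basic regular expressions by default, and in those a bare pipe symbol is just a literal | character. To match any one of several strings, you put a backslash directly before each pipe, \| -- GNU grep treats that as "or". If you leave the backslash off, grep looks for one long string containing literal pipe characters, so nothing useful will match (derp). With grep -E (the modern spelling of egrep) it's the other way round: a bare | means "or". Also bear in mind that an unescaped dot matches any character, so .php will also match, say, xphp; write \.php if you want to be strict.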
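
A quick way to see the pipe behaviour for yourself, assuming GNU grep (the sample request lines are made up for the demonstration):

```shell
# Made-up sample lines standing in for access log entries
sample='GET / HTTP/1.1
GET /a|b HTTP/1.1
GET /index.php HTTP/1.1
GET /logo.png HTTP/1.1'

# Basic regex (the default): \| means "or" in GNU grep
bre_matches=$(printf '%s\n' "$sample" | grep -c -e 'GET / HTTP/1\|\.php HTTP/1')

# Extended regex (-E): a bare | means "or"
ere_matches=$(printf '%s\n' "$sample" | grep -c -E 'GET / HTTP/1|\.php HTTP/1')

# Basic regex with a bare |: only the line containing a literal "a|b" matches
literal_matches=$(printf '%s\n' "$sample" | grep -c -e 'a|b')

echo "$bre_matches $ere_matches $literal_matches"   # prints: 2 2 1
```

The first two counts agree because both patterns express the same "or"; the third shows the unescaped pipe being treated as an ordinary character.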

This accomplishes exactly what I need on this box: I can see requests to site roots, requests to WordPress-based sites running with rewritten URLs (see the year-based URLs), plus any PHP, HTML or MP3 files. You can expand upon this to your heart's content, but the default should work quite nicely. You *will* have to do further work on the pattern if you want to match sites whose rewritten URLs give no indication that they point to a page of content, but that's beyond the scope of this wee article. Hope you find it useful!

---
Related reading:

Dayid's screen and tmux cheat sheet (Chris: VERY USEFUL!)
http://www.dayid.org/os/notes/tm.html

Hawk Host's tmux article: Part 1 | Part 2 (Chris: VERY USEFUL!)

Mutelight: Practical tmux
http://mutelight.org/articles/practical-tmux

Googly-oogly...
http://www.google.co.uk/search?q=connect+to+defunct+tmux+session

SU: Why do I have multiple tmux processes?
http://superuser.com/questions/259154/why-do-i-have-multiple-tmux-processes
