Server access analysis – A day of HTTP logs from the United States Environmental Protection Agency Server
Technologies: HTML, CSS, Bootstrap, JavaScript, Node.js, Chart.js, JSON

The assignment was to import the EPA access logs from 1995, restructure the data, and provide a graphical analysis of it. Because the description is currently not available at http://ita.ee.lbl.gov/html/contrib/EPA-HTTP.html, the raw file epa-http.txt is enclosed and downloadable here. The trace contains a day’s worth of all HTTP requests to the EPA web server located at Research Triangle Park, NC. I’d like to acknowledge Dr. Laura Bottomley (laurab@ee.duke.edu) of Duke University for freely distributing the logs, which were used and analyzed in this project.

The logs were collected from 23:53:25 EDT on Tuesday, August 29, 1995, through 23:53:07 on Wednesday, August 30, 1995, a total of 24 hours. There were 47,748 total requests, 46,014 GET requests, 1,622 POST requests, 107 HEAD requests, and 6 invalid requests. Timestamps have one-second precision.

In the first step, I wrote a script that imports the access log file, cleans the log data of uncommon characters, structures it as a JSON array, and writes the result to a new file. In the second step, I wrote the HTML and JavaScript files that read the JSON file and render the following analyses as charts:

  • Requests per minute over the entire time span
  • Distribution of HTTP methods (GET, POST, HEAD, ...)
  • Distribution of HTTP response status codes (200, 404, 302, ...)
  • Distribution of response sizes for all requests with status code 200 and size < 1000 B
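The import step could look roughly like the following sketch. It is not the project's actual script: it assumes each log line has the layout `host [DD:HH:MM:SS] "request" status bytes`, and the field names (`host`, `datetime`, `request`, `responseCode`, `documentSize`) as well as the cleaning rule (dropping non-printable characters) are illustrative choices.

```javascript
// Assumed line layout (not confirmed by the text above):
//   host [DD:HH:MM:SS] "METHOD /path PROTOCOL" status bytes
// The greedy (.*) tolerates stray quotes inside the request string.
const LINE_PATTERN =
  /^(\S+) \[(\d{2}):(\d{2}):(\d{2}):(\d{2})\] "(.*)" (\d{3}) (\d+|-)$/;

// Parse one log line into a plain object, or return null if it does not match.
function parseLine(line) {
  const match = LINE_PATTERN.exec(line.trim());
  if (!match) return null;
  const [, host, day, hour, minute, second, request, status, bytes] = match;

  // Hypothetical cleaning rule: keep printable ASCII only.
  const cleaned = request.replace(/[^\x20-\x7E]/g, "");

  // Split the request into method, URL and (optional) protocol.
  const tokens = cleaned.trim().split(/\s+/);
  const method = tokens.shift();
  const last = tokens[tokens.length - 1];
  const protocol = last && last.startsWith("HTTP/") ? tokens.pop() : null;
  const url = tokens.join(" ");

  return {
    host,
    datetime: { day, hour, minute, second },
    request: { method, url, protocol },
    responseCode: Number(status),
    // A "-" in the bytes column is treated as 0 bytes (an assumption).
    documentSize: bytes === "-" ? 0 : Number(bytes),
  };
}

// Parse a whole log file (as text) into an array ready for JSON.stringify.
function parseLog(text) {
  return text
    .split(/\r?\n/)
    .map(parseLine)
    .filter((entry) => entry !== null);
}
```

In Node.js, the result could then be written out with `fs.writeFileSync("epa-http.json", JSON.stringify(parseLog(text)))`, where the output file name is again only an example.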

Beyond that, it was essential to pick the right chart type for each analysis so that the data is easy to understand. To keep the focus on the charts, I implemented a decent, responsive user interface.
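Each of the four analyses comes down to counting entries by some key. The sketch below shows one way to do that and to shape the result into the `labels`/`datasets` structure that Chart.js takes as chart data. The entry fields (`datetime`, `request.method`, `responseCode`, `documentSize`), the function names, and the 100-byte bucket width are assumptions made for illustration, not the project's actual code.

```javascript
// Count entries by a key derived from each entry. A Map keeps insertion order,
// so time-based keys stay chronological if the entries are.
function countBy(entries, keyOf) {
  const counts = new Map();
  for (const entry of entries) {
    const key = keyOf(entry);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Requests per minute: key is "DD HH:MM".
function requestsPerMinute(entries) {
  return countBy(
    entries,
    (e) => `${e.datetime.day} ${e.datetime.hour}:${e.datetime.minute}`
  );
}

// Distribution of HTTP methods (GET, POST, HEAD, ...).
function methodDistribution(entries) {
  return countBy(entries, (e) => e.request.method);
}

// Distribution of response status codes (200, 404, 302, ...).
function statusDistribution(entries) {
  return countBy(entries, (e) => e.responseCode);
}

// Size histogram for responses with status 200 and size < 1000 B.
// Each key is the lower bound of its bucket (0, 100, 200, ...).
function smallOkSizes(entries, bucketWidth = 100) {
  const small = entries.filter(
    (e) => e.responseCode === 200 && e.documentSize < 1000
  );
  return countBy(
    small,
    (e) => Math.floor(e.documentSize / bucketWidth) * bucketWidth
  );
}

// Convert a Map of counts into the data object shape used by Chart.js.
function toChartData(counts, label) {
  return {
    labels: [...counts.keys()].map(String),
    datasets: [{ label, data: [...counts.values()] }],
  };
}
```

In the browser, such an object could be passed to Chart.js, for example `new Chart(canvas, { type: "line", data: toChartData(requestsPerMinute(entries), "Requests per minute") })`. A line chart fits the time series, while bar or pie charts may suit the categorical distributions better.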