Analyze Apache Logs With R
I am trying to turn over a new leaf and learn more about statistics. To demonstrate my new abilities, I want to share how to analyze your Apache web logs using R.
Download R
I am going to be using R because it is free, open source and has a large community backing it. Download the latest version of R from http://cran.r-project.org.
Get Your Apache Log File
Log into your web server and download your Apache access log file(s) to your local computer, or wherever you have R installed. I recommend merging your Apache log files together so that the analysis covers all of your traffic rather than a slice of it. Merge your virtual host log files too if you serve CSS and images from separate virtual hosts.
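If you would rather do the merge in R itself, here is a minimal sketch. The `merge_logs` helper, the file names, and the sample log lines are all made up for illustration, and it assumes every file uses the same log format. The merged lines are not sorted by time, which does not matter for simple counts.

```r
# Hypothetical helper: concatenate several log files into one file, line by line.
merge_logs <- function(log_files, merged_file) {
  all_lines <- as.character(unlist(lapply(log_files, readLines)))
  writeLines(all_lines, merged_file)
  invisible(length(all_lines))
}

# Demo with two tiny made-up log files in a temporary directory.
log_dir <- tempfile("logs")
dir.create(log_dir)
writeLines(
  'example.com:80 192.0.2.1 - - [10/Oct/2012:13:55:36 +0000] "GET / HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
  file.path(log_dir, "access.log")
)
writeLines(
  'static.example.com:80 192.0.2.1 - - [10/Oct/2012:13:55:37 +0000] "GET /site.css HTTP/1.1" 200 840 "-" "Mozilla/5.0"',
  file.path(log_dir, "static_access.log")
)

# Pick up every file whose name ends in access.log and merge them.
log_files <- list.files(log_dir, pattern = "access\\.log$", full.names = TRUE)
merged_file <- file.path(tempdir(), "merged_access.log")
merge_logs(log_files, merged_file)
```

With real logs you would point `log_dir` at the folder holding your downloaded files and then pass `merged_file` to `read.table`.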
Parse Apache Log Files With R
I am using a standard Ubuntu Apache 2 configuration. Let's first examine the access log file.
access_log <- read.table(file="C:\\Users\\windoze\\Documents\\R\\data\\other_vhosts_access.log")
access_log[1,] # Display the first row to see the columns of the access_log data frame
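This is probably the easiest way to parse the log file into a data frame for analysis, because read.table splits each line on whitespace and keeps quoted fields, such as the request line, together in one column.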
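To make the columns easier to work with, you can give them names. The sketch below is built on assumptions: the two log lines are made up, and the column names are my own labels based on Ubuntu's default `vhost_combined` log format. If your LogFormat differs, the number and order of columns will differ too. Setting `quote` and `comment.char` is a precaution so that apostrophes and `#` characters in user agents or URLs are less likely to confuse the parser.

```r
# Made-up lines in the shape of Ubuntu's other_vhosts_access.log.
sample_lines <- c(
  'example.com:80 192.0.2.1 - - [10/Oct/2012:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
  'example.com:80 192.0.2.2 - - [10/Oct/2012:13:56:01 +0000] "GET /missing HTTP/1.1" 404 512 "-" "Mozilla/5.0"'
)

# Only double quotes delimit fields; '#' is not treated as a comment marker.
access_log <- read.table(text = sample_lines, quote = "\"",
                         comment.char = "", stringsAsFactors = FALSE)

# Hypothetical column labels; the timestamp splits into two columns
# because it contains a space.
names(access_log) <- c("vhost", "client", "ident", "user", "time", "tz",
                       "request", "status", "bytes", "referer", "agent")

access_log[1, ]           # First row, now with readable column names
table(access_log$status)  # Same as table(access_log[,8])
```

With real data, replace `text = sample_lines` with `file = ` and the path to your log.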
R Bar Chart Of Apache HTTP Codes
Here is a nice visual breakdown of the HTTP codes your application is serving.
table(access_log[,8]) # Gives a nice text breakdown of the HTTP codes served.
barplot(table(access_log[,8])) # Gives a nice bar plot of the HTTP codes served.
Later, I will demonstrate how to extract more information from your access log, such as the number of unique users visiting your site and the busiest time of the week for your web site.
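As a small extension of the HTTP code bar chart, here is a sketch that turns the counts into percentages and groups the codes by class (2xx, 3xx, 4xx, 5xx). The `status` vector is made-up data standing in for `access_log[,8]`, and the variable names are my own.

```r
# Made-up status codes standing in for access_log[,8].
status <- c(200, 200, 304, 404, 200, 500, 301, 404)

# Count each status code, then express it as a percentage of all requests.
status_counts <- table(status)
status_share <- round(100 * prop.table(status_counts), 1)

# Group codes into classes by their first digit, e.g. 404 becomes "4xx".
status_class <- paste0(status %/% 100, "xx")
class_counts <- table(status_class)

barplot(class_counts,
        main = "Requests by HTTP status class",
        xlab = "Status class", ylab = "Requests")
```

Grouping by class makes it easier to spot at a glance whether client errors (4xx) or server errors (5xx) make up a worrying share of your traffic.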
