First things first: if your web hosting provider doesn't let you download the raw log file, find another host. You need the raw log file to study and improve your web site's performance.
What's A Log File?
Every time someone visits a page on your site, a record is added to the log file, which is stored on your server. The log file holds some interesting and useful information about your visitors.
Though log file formats vary, here I discuss the common elements.
Here are the contents of a single line of the log file from this site:
220.127.116.11 - - [03/Jul/2003:06:39:23 -0400] "GET / HTTP/1.1" 200 15549 "http://www.working-at-home-business.com" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; http://www.working-at-home-business.com)"
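If you want to pull these fields out of many lines at once, a regular expression is a simple starting point. The sketch below assumes the server writes Apache's "combined" format, which is what this line looks like. In that format, the two dashes after the IP address are the identity and user-name fields, which are usually empty. The regex is illustrative and does not handle escaped quotes inside fields.

```python
import re

# Apache "combined" log format: IP, identity, user, [time], "request",
# status, bytes, "referrer", "user agent". Assumes well-formed lines.
COMBINED_RE = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of named fields, or None if the line doesn't match."""
    match = COMBINED_RE.match(line)
    return match.groupdict() if match else None

sample = ('220.127.116.11 - - [03/Jul/2003:06:39:23 -0400] "GET / HTTP/1.1" '
          '200 15549 "http://www.working-at-home-business.com" '
          '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; '
          'http://www.working-at-home-business.com)"')

entry = parse_line(sample)
for name, value in entry.items():
    print(f"{name:9} {value}")
```

All values come back as strings. Converting them to numbers or dates is left to the steps that need them.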
Let's see what's inside one by one.
User IP Address
This is the IP address of the visitor to our site. It tells you roughly where the visitor is coming from, or at least which network they use. If you do a reverse DNS lookup on this IP address at DNS Stuff, the result is bbcache-9.singnet.com.sg, which belongs to "Singapore Telecommunications Pte Ltd". You really can't go further than that to identify a particular person. Otherwise, the Internet would be too dangerous. ;)
Yeah, that visitor was myself.
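You can also do the same reverse DNS lookup from a script instead of a web tool. The minimal sketch below uses Python's standard `socket.gethostbyaddr`. Many addresses have no reverse (PTR) record, so the function returns None in that case. The answer for any given IP depends on current DNS records, so today's result for the address above may differ from what this article reports.

```python
import socket

def reverse_dns(ip):
    """Return the host name registered for an IP address, or None.

    Uses the system resolver; socket.herror and socket.gaierror are
    both subclasses of OSError, so one except clause covers lookup failures.
    """
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:
        return None

# The output depends on your system's resolver configuration.
print(reverse_dns("127.0.0.1"))
```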
Date And Time
The exact time of the visit. Combined with the IP address, it lets you follow a particular visitor from page to page on your site, in order. More on this later.
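Time Zone
The -0400 after the time is the server's offset from Greenwich Mean Time (GMT), in hours and minutes. In our example the server's clock runs 4 hours behind GMT, so 06:39:23 local time is 10:39:23 GMT.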
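When you compare logs from servers in different time zones, it helps to convert every timestamp to GMT (UTC) first. Here is a short Python sketch using only the standard `datetime` module. It assumes the timestamp is written exactly as in the sample line. The `%z` directive reads the "-0400" offset.

```python
from datetime import datetime, timezone

# Timestamp layout used in the sample line, e.g. "03/Jul/2003:06:39:23 -0400".
LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

def to_utc(log_time):
    """Parse a log timestamp (with offset) and convert it to UTC/GMT."""
    local = datetime.strptime(log_time, LOG_TIME_FORMAT)
    return local.astimezone(timezone.utc)

stamp = to_utc("03/Jul/2003:06:39:23 -0400")
print(stamp.isoformat())  # 2003-07-03T10:39:23+00:00
```

Note that converting a timestamp can change the date as well as the time. For example, 23:00 at -0400 is 03:00 GMT on the following day.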
"GET / HTTP/1.1"
The first word is the request method, usually GET or POST. Apart from form submissions handled by CGI programs, it will typically be GET: that is, get a web page or an image that goes on that page.
This line records a request from my own browser to GET the web page at the root directory (notice the slash "/" after GET), using the HTTP/1.1 protocol. This is the index page of our web site.
"GET /web-promotion/index.shtml HTTP/1.1"
It records a request for the page /web-promotion/index.shtml on our site.
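Each request field therefore has three space-separated parts: the method, the path and the protocol. A minimal Python sketch for splitting them follows. It assumes a well-formed request line. Anything with a different number of parts, such as junk sent by scanners, returns None.

```python
def split_request(request):
    """Split a request line like 'GET / HTTP/1.1' into (method, path, protocol).

    Returns None for malformed lines that don't have exactly three parts.
    """
    parts = request.split()
    if len(parts) != 3:
        return None
    method, path, protocol = parts
    return method, path, protocol

print(split_request("GET /web-promotion/index.shtml HTTP/1.1"))
# ('GET', '/web-promotion/index.shtml', 'HTTP/1.1')
```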
Status Code
The next item tells you whether the request succeeded. Our example has a status code of 200, which means "OK": the page was sent successfully. You've probably seen the dreaded 404 "Not Found" error when the page you were looking for wasn't at that URL, so these codes aren't entirely new to you.
Other common status codes include:
400 - Bad Request
401 - Authorization Required
403 - Forbidden
500 - Internal Server Error
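A quick way to find problems is to count how often each status code appears. A burst of 404s usually means a broken link somewhere. The Python sketch below tallies codes with `collections.Counter`. The list of codes in it is made up for illustration. In practice you would feed in the status field of each parsed log line.

```python
from collections import Counter

# Names for the status codes mentioned in this article.
STATUS_NAMES = {
    200: "OK", 400: "Bad Request", 401: "Authorization Required",
    403: "Forbidden", 404: "Not Found", 500: "Internal Server Error",
}

def status_report(statuses):
    """Return (code, name, count) tuples, most frequent code first."""
    counts = Counter(statuses)
    return [(code, STATUS_NAMES.get(code, "other"), n)
            for code, n in counts.most_common()]

# Hypothetical status codes, not taken from a real log.
sample = [200, 200, 404, 200, 304, 404, 500]
for code, name, n in status_report(sample):
    print(code, name, n)
```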
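Bytes Sent
This is the size of the response sent to the visitor, in this case 15549 bytes. In Apache's standard formats, it counts only the page content and excludes the HTTP headers.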
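Adding up this field gives you a rough measure of the bandwidth your site uses. One detail matters here: Apache writes "-" instead of 0 when no content was sent, for example on a 304 response, so the field cannot always be converted to a number directly. A small Python sketch follows. The size values in it are hypothetical.

```python
def bytes_sent(field):
    """Convert the size field to an int; Apache logs '-' when no body was sent."""
    return 0 if field == "-" else int(field)

# Hypothetical size fields from three log lines.
sizes = ["15549", "-", "2048"]
total = sum(bytes_sent(s) for s in sizes)
print(f"{total} bytes ({total / 1024:.1f} KB)")  # 17597 bytes (17.2 KB)
```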
Referrer
This tells you the web page the visitor came from. In our example that is http://www.working-at-home-business.com, which we also run.
You will find another extremely important piece of information here: the keywords by which your visitors found you. For example:
It tells you someone found this page at Yahoo, using the keywords "web hosting singapore".
By studying referrer information, you can learn how many visitors each search engine sends you, what they were looking for when they found your site, and which link partners are most valuable. Then you will know how to spend your advertising dollars wisely.
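Search keywords appear in the query string of the referrer URL, so you can extract them automatically. Below is a minimal Python sketch using `urllib.parse`. It rests on two assumptions that you should check against your own log. First, it assumes Yahoo puts the keywords in a `p` parameter and Google puts them in `q`; engines change their URL formats over time. Second, the referrer URL in the example is made up for illustration and is not taken from this site's log.

```python
from urllib.parse import urlsplit, parse_qs

# ASSUMPTION: query parameter that carries the keywords for each engine.
# Verify against real referrers in your own log before relying on it.
KEYWORD_PARAMS = {
    "search.yahoo.com": "p",
    "www.google.com": "q",
}

def search_keywords(referrer):
    """Return (engine_host, keywords) for a known search engine, else None.

    Non-search referrers and the empty referrer "-" also return None.
    """
    parts = urlsplit(referrer)
    param = KEYWORD_PARAMS.get(parts.hostname)
    if param is None:
        return None
    values = parse_qs(parts.query).get(param)  # parse_qs turns '+' into spaces
    return (parts.hostname, values[0]) if values else None

# Hypothetical referrer URL, made up for illustration.
ref = "http://search.yahoo.com/search?p=web+hosting+singapore"
print(search_keywords(ref))  # ('search.yahoo.com', 'web hosting singapore')
```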
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; http://www.working-at-home-business.com)"
The final field in the log file tells you what web browser and operating system the visitor is using.
Mozilla is a code name indicating that the browser is Netscape-compatible. In this case, the visitor was using Internet Explorer 6.0 ("MSIE 6.0") on Windows NT 5.0, better known as Windows 2000.
Why is my URL at the end again? Just a little fun: I customized my browser a bit. You won't see it anywhere else unless I've visited your web site. ;)
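To get a rough breakdown of browsers and operating systems, you can search the user-agent string for known tokens. The Python sketch below is deliberately crude. It only recognizes Internet Explorer's "MSIE x.y" token and the "Windows NT x.y" token, and it maps a few NT version numbers to their familiar names. Real user-agent strings vary widely, so treat the result as an approximation.

```python
import re

# Partial map of Windows NT version tokens to familiar names.
WINDOWS_NAMES = {"4.0": "Windows NT 4.0", "5.0": "Windows 2000", "5.1": "Windows XP"}

def describe_agent(agent):
    """Return (browser, operating_system) guessed from a user-agent string.

    Only Internet Explorer and Windows NT tokens are recognised;
    anything else is reported as "unknown".
    """
    ie = re.search(r"MSIE (\d+\.\d+)", agent)
    nt = re.search(r"Windows NT (\d+\.\d+)", agent)
    browser = f"Internet Explorer {ie.group(1)}" if ie else "unknown"
    if nt:
        os_name = WINDOWS_NAMES.get(nt.group(1), f"Windows NT {nt.group(1)}")
    else:
        os_name = "unknown"
    return browser, os_name

agent = ("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; "
         "http://www.working-at-home-business.com)")
print(describe_agent(agent))  # ('Internet Explorer 6.0', 'Windows 2000')
```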
Tracing A Visitor
Here comes the more interesting part. Let's take a closer look at the log file and see how a visitor moves through your site. I'll abbreviate the log entries to keep things simple.
03/Jul/2003:06:39:23 GET /
03/Jul/2003:06:39:52 GET /web-hosting/index.shtml
03/Jul/2003:06:40:36 GET /newsletter/index.shtml
03/Jul/2003:06:41:04 POST /cgi-bin/followup/auto_followup.pl
03/Jul/2003:06:41:05 GET /newsletter/subscribed.shtml
03/Jul/2003:06:41:27 GET /support/index.shtml
03/Jul/2003:06:41:38 GET /support/log.shtml
First, the visitor went to the home page of Singapore Web Hosting, then to the web hosting section to find out more about our web hosting packages. Next, the visitor looked at the newsletter page and filled in the subscription form. Our CGI script processed the form (the POST request) and redirected the visitor to the "thank you for subscribing" page. The visitor then went on to read some articles in the support section.
I've skipped all the requests for images.
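On a busy site you won't want to do this tracing by eye. The following minimal Python sketch groups requests by IP address, sorts them by time and skips image files. It also reports the seconds between requests, which gives a rough idea of time spent on each page; the time on the last page cannot be measured this way. The entries are the abbreviated ones above, with three assumptions added. First, the IP address from the sample log line at the top of this article is attached to every entry, because the abbreviated log omits it. Second, one image request is made up to show the filtering. Third, treating one IP as one visitor is an approximation, because proxies and caches can hide many people behind a single address.

```python
from collections import defaultdict
from datetime import datetime

IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".ico")
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S"  # abbreviated timestamps, no time zone

def trace_visitors(entries):
    """Group (ip, timestamp, method, path) tuples by IP, in time order, minus images."""
    visits = defaultdict(list)
    for ip, stamp, method, path in entries:
        if path.lower().endswith(IMAGE_EXTENSIONS):
            continue  # images are loaded by pages, not chosen by the visitor
        visits[ip].append((datetime.strptime(stamp, TIME_FORMAT), method, path))
    for requests in visits.values():
        requests.sort()
    return dict(visits)

def seconds_between(requests):
    """Seconds between consecutive requests: a rough time-on-page measure."""
    return [int((later[0] - earlier[0]).total_seconds())
            for earlier, later in zip(requests, requests[1:])]

IP = "220.127.116.11"  # assumption: the abbreviated log omits the IP address
log = [
    (IP, "03/Jul/2003:06:39:23", "GET", "/"),
    (IP, "03/Jul/2003:06:39:24", "GET", "/images/logo.gif"),  # hypothetical image
    (IP, "03/Jul/2003:06:39:52", "GET", "/web-hosting/index.shtml"),
    (IP, "03/Jul/2003:06:40:36", "GET", "/newsletter/index.shtml"),
    (IP, "03/Jul/2003:06:41:04", "POST", "/cgi-bin/followup/auto_followup.pl"),
    (IP, "03/Jul/2003:06:41:05", "GET", "/newsletter/subscribed.shtml"),
    (IP, "03/Jul/2003:06:41:27", "GET", "/support/index.shtml"),
    (IP, "03/Jul/2003:06:41:38", "GET", "/support/log.shtml"),
]

path = trace_visitors(log)[IP]
for (when, method, page), gap in zip(path, seconds_between(path) + [None]):
    note = f"next request after {gap}s" if gap is not None else "last page seen"
    print(when.time(), method, page, "|", note)
```

The one-second gap between the POST and the "subscribed" page reflects the redirect performed by the CGI script. It does not mean the visitor spent one second reading the form.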
Why should you analyze a visitor's path? Because only then do you begin to discover how a visitor uses your site: which door she comes in by and where she came from, what interests her most, and where she leaves. Many small, careful observations add up to an accurate picture of what a visitor actually does on your site.
That information is priceless if your goal is to optimize the experience, and lead your visitor to the most important parts of your site.