In my Web Analytics class, we’re beginning to analyze Apache log files to extract analytics data. Today, I pulled down a raw access log from this site to see what I could learn. I also have AWStats running to build reports on server access. As I’ve been digging through my access log, I’ve noticed that comment spammers account for a large portion of the requests to my server.
I have found that a comment spambot will hit a page on my blog, scrape the page for the comment form, and then post spam comments to the form's target. According to AWStats, close to 50% of the hits on my site come from unknown operating systems. This leads me to believe that about 50% of my access log data is pollution from spambots.
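
To check that hunch against the raw log rather than the AWStats summary, here is a rough sketch of how the fetch-then-post pattern could be picked out of an access log. It assumes the log is in Apache's common or combined format, and the form target `/wp-comments-post.php` is only a placeholder I chose for illustration; substitute whatever URL your own comment form posts to. The function names are mine, too. Keep in mind that a real commenter also loads a page and then posts, so this flags candidates rather than proving anything.

```python
import re

# Apache common/combined log format:
# host ident user [time] "request" status bytes ["referer" "user-agent"]
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not parse."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def suspected_spambots(lines, form_target="/wp-comments-post.php"):
    """Return hosts that made a GET and later a POST to form_target.

    form_target is a placeholder, not something taken from my log.
    """
    seen_get = set()
    suspects = set()
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip lines that are not in the expected format
        host = entry["host"]
        if entry["method"] == "GET":
            seen_get.add(host)
        elif (entry["method"] == "POST"
              and entry["path"].startswith(form_target)
              and host in seen_get):
            suspects.add(host)
    return suspects


def spam_share(lines, form_target="/wp-comments-post.php"):
    """Fraction of parsed requests that came from suspected hosts."""
    lines = list(lines)  # the log is scanned twice
    suspects = suspected_spambots(lines, form_target)
    entries = [e for e in map(parse_line, lines) if e is not None]
    if not entries:
        return 0.0
    flagged = sum(1 for e in entries if e["host"] in suspects)
    return flagged / len(entries)
```

Passing an open log file to `spam_share` gives a number to compare with the roughly 50% that AWStats reports for unknown operating systems. Grouping by host is a simplification: spambots that rotate IP addresses will be undercounted, and visitors behind a shared proxy may be lumped together.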