Log analysis with GoAccess
Now that we’re serving data from our Raspberry Pi, let’s see if anyone is actually consuming that data by analyzing our web server logs.
Getting GoAccess #
We’ll use GoAccess - Visual Web Log Analyzer to generate an HTML report from our nginx logs. There’s no official docker image for the Raspberry Pi, so we will need to build the image ourselves:
git clone https://github.com/allinurl/goaccess.git
cd goaccess
docker build . -t allinurl/goaccess
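Once the build finishes, a quick sanity check confirms the image works by printing the GoAccess version from inside the container:

docker run --rm allinurl/goaccess --version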
Filtering the logs #
My nginx server is running as a docker container, so we can view the logs with this docker command:
docker logs blog-prod 2> /dev/null
I’ve customized my logs a bit so that the first three fields are the origin IP address, then the Cloudflare IP address, and finally the reverse proxy IP address (10.0.0.1). We can apply a filter to make sure that we analyze only traffic coming from the reverse proxy. This will exclude, for example, any testing where we make requests from the local network.
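For reference, a log line in this layout might look like the following (the addresses and fields here are made up for illustration):

203.0.113.7 172.68.10.4 10.0.0.1 - - [12/Mar/2023:14:02:11 -0500] "GET / HTTP/1.1" 200 5123 "-" "Mozilla/5.0"

The awk filter below keeps only the lines whose third field is the proxy address: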
docker logs blog-prod 2> /dev/null \
| awk '$1 ~ /[0-9]/ && $2 ~ /[0-9]/ && $3 ~ /10\.0\.0\.1/'
We can do further filtering to remove some common bots as well as our uptime monitoring:
docker logs blog-prod 2> /dev/null \
| awk '$1 ~ /[0-9]/ && $2 ~ /[0-9]/ && $3 ~ /10\.0\.0\.1/' \
| grep -v "UptimeRobot" \
| grep -v "Googlebot"
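If the bot list grows, the two greps can be collapsed into a single extended-regex filter:

grep -vE "UptimeRobot|Googlebot"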
Finally, if we just want to look at page views from our tracking pixel, we can use this command:
docker logs blog-prod 2> /dev/null \
| awk '$1 ~ /[0-9]/ && $2 ~ /[0-9]/ && $3 ~ /10\.0\.0\.1/' \
| grep "interwebs.gif" \
| node ~/src/pixel-parser/index.js
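The pixel-parser script itself isn’t reproduced here, but as a rough sketch of the idea (my assumption, not the actual Node code), an awk filter like this could rewrite each pixel hit so the request path becomes the referring page, which is presumably what we want to count:

# Illustrative only: swap the interwebs.gif request path for the referrer path.
awk -F'"' -v OFS='"' '{
  split($2, req, " ")                    # request line: method, path, protocol
  page = $4                              # referrer field of a combined-style log
  sub(/^https?:\/\/[^\/]+/, "", page)    # strip the scheme and host
  if (page != "-") $2 = req[1] " " page " " req[3]
  print
}'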
Generating an HTML report #
We can pipe our logs through GoAccess to generate a nice HTML report from our log data. Based on the filters above, I’m going to generate three different reports:
- All traffic coming from the reverse proxy
- All traffic except bots
- Tracking pixel only
I’ve set up a script that can generate all three files:
#!/bin/bash
if [ -z "$DEPLOY_DIR" ]; then
echo "ERROR: DEPLOY_DIR not set"
exit 1
fi
docker logs blog-prod 2> /dev/null \
| awk '$1 ~ /[0-9]/ && $2 ~ /[0-9]/ && $3 ~ /10\.0\.0\.1/' \
| docker run \
--rm \
-i \
-e TZ="America/New_York" \
allinurl/goaccess \
-d \
-o html \
--log-format COMBINED \
- > $DEPLOY_DIR/_site/goaccess-all-traffic.html
docker logs blog-prod 2> /dev/null \
| awk '$1 ~ /[0-9]/ && $2 ~ /[0-9]/ && $3 ~ /10\.0\.0\.1/' \
| grep -v "UptimeRobot" \
| grep -v "Googlebot" \
| docker run \
--rm \
-i \
-e TZ="America/New_York" \
allinurl/goaccess \
-d \
-o html \
--ignore-crawlers \
--log-format COMBINED \
- > $DEPLOY_DIR/_site/goaccess-no-bots-traffic.html
docker logs blog-prod 2> /dev/null \
| awk '$1 ~ /[0-9]/ && $2 ~ /[0-9]/ && $3 ~ /10\.0\.0\.1/' \
| grep "interwebs.gif" \
| node ~/src/pixel-parser/index.js \
| docker run \
--rm \
-i \
-e TZ="America/New_York" \
allinurl/goaccess \
-d \
-o html \
--ignore-crawlers \
--log-format COMBINED \
- > $DEPLOY_DIR/_site/goaccess-tracked-pages.html
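Assuming the script is saved as ~/generate-analytics-reports.sh, remember to make it executable:

chmod +x ~/generate-analytics-reports.sh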
Let’s add a cron job for these so they get refreshed periodically. Start your crontab editor with:
crontab -e
And add a job that runs every hour:
33 * * * * ~/generate-analytics-reports.sh
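You can double-check that the entry was saved with:

crontab -l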
Serve the reports #
Finally, I’ve set up another very simple nginx container to serve the files. Here’s the default.conf file:
server {
    listen 80;
    server_name localhost;
    root /usr/share/nginx/html;

    location / {
        autoindex on;
    }
}
And start the container with:
docker run \
-v $DEPLOY_DIR/_site:/usr/share/nginx/html:ro \
-v $DEPLOY_DIR/default.conf:/etc/nginx/conf.d/default.conf:ro \
-p 8082:80 \
--name analytics-webserver \
--restart always \
-d \
nginx
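Once the container is up, a quick request from the Pi itself should confirm the reports are being served (assuming the script above has already generated them):

curl -I http://localhost:8082/goaccess-all-traffic.html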
Now we can browse to http://interwebs.local:8082 and view each of the reports.