Docker monitoring with cAdvisor
In the last post we set up monitoring at the machine level, but that doesn't give us a good picture of what is happening inside Docker. Let's set up cAdvisor to collect stats for each container running in Docker.
Start cAdvisor on the Pi and in the cloud
Google provides a prebuilt image that we can run in the cloud. We can pull it directly from Google’s container registry. Run these commands on the cloud server:
# quoting 'EOF' keeps $VERSION literal in the script, so it is expanded when the script runs
cat << 'EOF' > docker-run-cadvisor.sh
#!/bin/bash
VERSION=v0.35.0 # use the latest release version from https://github.com/google/cadvisor/releases
# (newer releases are published as gcr.io/cadvisor/cadvisor)
docker run \
  -v /:/rootfs:ro \
  -v /var/run:/var/run:ro \
  -v /sys:/sys:ro \
  -v /var/lib/docker/:/var/lib/docker:ro \
  -v /dev/disk/:/dev/disk:ro \
  -p 10.0.0.1:9101:8080 \
  --name cadvisor \
  --restart always \
  -d \
  gcr.io/google-containers/cadvisor:$VERSION
EOF
chmod +x docker-run-cadvisor.sh
./docker-run-cadvisor.sh
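To check that cAdvisor is up, we can fetch its Prometheus endpoint and list the containers it reports. The helper below is a small sketch (list_containers is a name made up for this post, not part of cAdvisor): it reads cAdvisor's /metrics text on stdin and prints the name label of every container_cpu_user_seconds_total series, skipping series with an empty name (cgroups that aren't Docker containers).

```shell
#!/bin/bash
# Hypothetical helper: print the distinct container names found in
# cAdvisor's Prometheus metrics text (read from stdin).
list_containers() {
  grep '^container_cpu_user_seconds_total{' |
    grep -o '[{,]name="[^"]*"' |   # keep only the name="..." label
    sed 's/^.name="//; s/"$//' |   # strip the delimiter, the key and the quotes
    grep -v '^$' |                 # drop cgroups that aren't containers
    sort -u
}
```

On the cloud server this would be used as `curl -s http://10.0.0.1:9101/metrics | list_containers`; every container you expect to monitor should show up in the output.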
Google doesn't (yet?) provide an image for the Raspberry Pi, but Docker Hub has some examples of how to build one. Build your own, or, if you're feeling lucky, just grab one of those prebuilt images:
cat << 'EOF' > docker-run-cadvisor.sh
#!/bin/bash
# --network metrics: join the same Docker network as Prometheus, so it can
# scrape this container by name (cadvisor:8080)
docker run \
  -v /:/rootfs:ro \
  -v /var/run:/var/run:ro \
  -v /sys:/sys:ro \
  -v /var/lib/docker/:/var/lib/docker:ro \
  -v /dev/disk/:/dev/disk:ro \
  -p 9101:8080 \
  --name cadvisor \
  --network metrics \
  --restart always \
  -d \
  zcube/cadvisor:latest
EOF
chmod +x docker-run-cadvisor.sh
./docker-run-cadvisor.sh
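Since the Pi and the cloud server need different images, one option is to pick the image from the CPU architecture that uname -m reports. This is a hypothetical sketch (cadvisor_image is our own name), and the ARM branch assumes the community zcube/cadvisor image actually supports your Pi's architecture, so check its Docker Hub tags before relying on it.

```shell
#!/bin/bash
# Hypothetical helper: choose a cAdvisor image for a CPU architecture
# (defaults to this machine's `uname -m`). The ARM mapping is an assumption:
# verify that zcube/cadvisor publishes an image for your Pi.
cadvisor_image() {
  local arch="${1:-$(uname -m)}"
  case "$arch" in
    x86_64|amd64)
      echo "gcr.io/google-containers/cadvisor:v0.35.0" ;;
    armv6l|armv7l|aarch64|arm64)
      echo "zcube/cadvisor:latest" ;;
    *)
      echo "no known cAdvisor image for $arch" >&2
      return 1 ;;
  esac
}
```

In a run script this could replace the hard-coded image, e.g. `image=$(cadvisor_image) || exit 1` followed by `docker run ... "$image"`, so an unknown architecture stops the script instead of passing an empty image name to docker.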
Update the Prometheus config
We can add two new jobs to the scrape_configs section of the Prometheus config to tell it to pull metrics from our cAdvisor instances:
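# the jobs are appended to the end of the file, so this assumes
# scrape_configs is the last section in prometheus.yaml
cat << EOF >> prometheus.yaml
  - job_name: 'iwbz00_docker'
    static_configs:
      - targets: ['cadvisor:8080']
  - job_name: 'cloud_docker'
    static_configs:
      - targets: ['10.0.0.1:9101']
EOF
# restart prometheus so it picks up the new jobs
docker restart prometheus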
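Because the heredoc is appended as-is, a wrong indent level is an easy way to break the YAML. A small generator function keeps the indentation consistent. scrape_job is a hypothetical helper, and its two-space layout assumes the existing jobs under scrape_configs are indented the same way.

```shell
#!/bin/bash
# Hypothetical helper: print one Prometheus scrape job as YAML, indented to
# sit under an existing top-level "scrape_configs:" key (two-space layout).
# Usage: scrape_job JOB_NAME HOST:PORT
scrape_job() {
  printf "  - job_name: '%s'\n" "$1"
  printf '    static_configs:\n'
  printf "      - targets: ['%s']\n" "$2"
}
```

The two jobs above could then be appended with `{ scrape_job iwbz00_docker cadvisor:8080; scrape_job cloud_docker 10.0.0.1:9101; } >> prometheus.yaml`.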
Add a Grafana dashboard
As a starting point we can use the Docker monitoring dashboard for Grafana from Grafana Labs, and add a dashboard variable named $job that lists every value of the job label, using the following query:
label_values(job)
Then add job="$job" to the label selector of each panel's query, for example:
rate(container_cpu_user_seconds_total{image!="",job="$job"}[5m]) * 100
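Before building the panels, it can help to confirm that Prometheus actually has both new jobs. Its HTTP API lists the values of a label at /api/v1/label/<name>/values, which is essentially the list the label_values(job) variable offers. The sketch below assumes Prometheus is reachable at localhost:9090 (adjust to however you published it) and parses the compact JSON with sed and tr so it doesn't need jq; job_values is a name made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: turn the JSON from Prometheus's
# /api/v1/label/<name>/values endpoint (read from stdin) into one value per line.
# Assumes compact JSON with no commas or quotes inside the values.
job_values() {
  sed 's/.*"data":\[\(.*\)].*/\1/' |  # keep what is inside "data":[...]
    tr ',' '\n' |                     # one quoted value per line
    tr -d '"'                         # drop the quotes
}

# Usage (assumes Prometheus is published on localhost:9090):
# curl -s http://localhost:9090/api/v1/label/job/values | job_values
```

If iwbz00_docker or cloud_docker is missing from the output, the scrape config or the restart is the first place to look, before touching the dashboard.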