Documentation / Graphite

Graphite

What is Graphite #

Graphite store numeric time-series data and you can use that to store the metrics sitespeed.io collects. We provide an easy integration and you can use our pre-made Docker container and our dashboard setup or use your current Graphite setup.

Before you start #

If you are a new user of Graphite you need to read Etsys writeup about Graphite so you have an understanding of how the data is stored and how to configure the metrics. Also read Graphite docs how to get metrics into graphite for a better understanding of the metrics structure.

In Graphite you configure for how long time you want to store metrics and at what precision, so it’s good to check our default configuration (this and this) before you start so you see that it matches your needs.

Configure Graphite #

We provide an example Graphite Docker container and when you put that into production, you need to change the configuration. The configuration depends on how often you want to run your tests. How long you want to keep the result, and how much disk space you want to use.

Starting from sitespeed.io version 4 we send a moderated number of metrics per URL but you can change that yourself.

When you store metrics for a URL in Graphite, you decide from the beginning how long and how often you want to store the data, in storage-schemas.conf. In our example Graphite setup, every key under sitespeed_io is caught by the configuration in storage-schemas.conf that looks like:

[sitespeed]
pattern = ^sitespeed_io\.
retentions = 10m:60d,30m:90d

Every metric that is sent to Graphite following the pattern (the namespace starting with sitespeed_io), Graphite prepares storage for it every ten minutes the first 60 days; after that Graphite uses the configuration in storage-aggregation.conf to aggregate/downsize the metrics the next 90 days.

Depending on how often you run your analysis, you may want to change the storage-schemas.conf. With the current config, if you analyse the same URL within 10 minutes, one of the runs will be discarded. But if you know you only run once an hour, you could increase the setting. Etsy has some really good documentation on how to configure Graphite.

In this example with retention of 10 minutes, the metrics you will send will be in Graphite with a 10 minute interval. If you have a larger interval for example one hour and send annotations on specific seconds, you can then see a larger missmatch between the annotation and when the actual metric got into Graphite. If you are sending metrics to Graphite on a per iteration basis the subsequent runs may be discarded as they arrive within the timeframe. Below is an example taking in closer to real-time metrics and falling back to the default retentions.

[sitespeed_iterations]
pattern = run-\d+\.
retentions = 10s:6h,10m:60d,30m:90d

One thing to know if you change your Graphite configuration: “Any existing metrics created will not automatically adopt the new schema. You must use whisper-resize.py to modify the metrics to the new schema. The other option is to delete existing whisper files (/opt/graphite/storage/whisper) and restart carbon-cache.py for the files to get recreated again.”

Send metrics to Graphite #

To send metrics to Graphite you need to at least configure the Graphite host: --graphite.host.

If you don’t run Graphite on default port you can change that to by --graphite.port.

If your instance is behind authentication you can use --graphite.auth with the format user:password.

If you use a specifc port for the user interface (and where we send the annotations) you can change that with --graphite.httpPort.

If you use a different web host for Graphite than your default host, you can change that with --graphite.webHost. If you don’t use a specific web host, the default domain will be used.

You can choose the namespace under where sitespeed.io will publish the metrics. Default is sitespeed_io.default. Change it with --graphite.namespace. If you want all default dashboards to work, it need to be 2 steps.

Each URL is by default split into domain and the URL when we send it to Graphite. By default sitespeed.io remove query parameters from the URL bit if you need them you can change that by adding --graphite.includeQueryParams.

If you want metrics from each iteration you can use --graphite.experimental.perIteration. Using this will give raw metrics that are not aggregated (min, max, median, mean). The structure for those metrics is experimental and can change.

If you use Graphite < 1.0 you need to make sure the tags in the annotations follow the old format, you do that by adding --graphite.arrayTags.

Debug #

If you want to test and verify what the metrics looks like that you send to Graphite you can use tools/tcp-server.js to verify what it looks like.

  1. Start the server (you need to clone the sitespeed.io repo first): tools/tcp-server.js
  2. You will then get back the port for the server (60447 in this example): Server listening on :::60447
  3. Open another terminal and run sitespeed.io and send the metrics to the tcp-server (read how to reach localhost from Docker): docker run --shm-size=1g --rm -v "$(pwd)":/sitespeed.io sitespeedio/sitespeed.io:11.6.0 https://www.sitespeed.io/ --graphite.host 192.168.65.2 --graphite.port 60447 -n 1
  4. Check the terminal where you have the TCP server running and you will see something like:
sitespeed_io.default.pageSummary.www_sitespeed_io._.chrome.native.browsertime.statistics.timings.pageTimings.backEndTime.median 231 1532591185
sitespeed_io.default.pageSummary.www_sitespeed_io._.chrome.native.browsertime.statistics.timings.pageTimings.backEndTime.mean 231 1532591185
sitespeed_io.default.pageSummary.www_sitespeed_io._.chrome.native.browsertime.statistics.timings.pageTimings.backEndTime.mdev 0 1532591185
sitespeed_io.default.pageSummary.www_sitespeed_io._.chrome.native.browsertime.statistics.timings.pageTimings.backEndTime.min 231 1532591185
sitespeed_io.default.pageSummary.www_sitespeed_io._.chrome.native.browsertime.statistics.timings.pageTimings.backEndTime.p10 231 1532591185
sitespeed_io.default.pageSummary.www_sitespeed_io._.chrome.native.browsertime.statistics.timings.pageTimings.backEndTime.p90 231 1532591185
sitespeed_io.default.pageSummary.www_sitespeed_io._.chrome.native.browsertime.statistics.timings.pageTimings.backEndTime.p99 231 1532591185
sitespeed_io.default.pageSummary.www_sitespeed_io._.chrome.native.browsertime.statistics.timings.pageTimings.backEndTime.max 231 1532591185
...

Keys and metrics in Graphite #

You can read about the keys and the metrics that we send to Graphite in the metrics documentation.

Annotations #

You can send annotations to Graphite to mark when a run happens so you can go from the dashboard to any HTML-results page.

You do that by configuring the URL that will serve the HTML with the CLI param resultBaseURL (the base URL for your S3 or GCS bucket) and configure the HTTP Basic auth username/password used by Graphite. You can do that by setting --graphite.auth LOGIN:PASSWORD.

You can also modify the annotation and append our own text/HTML and add your own tags. Append a message to the annotation with --graphite.annotationMessage. That way you can add links to a specific branch or whatever you feel that can help you. If needed set a custom title with --graphite.annotationTitle instead of the default title that displays the number of runs of the test.

You can add extra tags with --graphite.annotationTag. For multiple tags, add the parameter multiple times. Just make sure that the tags doesn’t collide with our internal tags.

Annotations

You can also include a screenshot from the run in the annotation by adding --graphite.annotationScreenshot to your configuration.

Annotation with screenshots

Use Grafana annotations #

All default dashboards use Graphite annotations. But you can use Grafana built in annotations. That can be good if your organisation is already using them. Note that if you choose to do that, you need to update the dashboards to use Grafana annotations.

To use Grafana annotations, make sure you setup a resultBaseURL and add the host and port to Grafana: --grafana.host and --grafana.port.

Then setup your Grafana API token, follow the instructions at http://docs.grafana.org/http_api/auth/#authentication-api and use the bearer code you get with --grafana.auth. Then your annotations will be sent to Grafana instead of Graphite.

You need to create a new annotation setup in Grafana that matches the templates (the dropdowns) in your dashboard (the same way the default “run” Graphite annotation is setup). It will look something like this: Setup Grafana annotations

Dashboards #

We have pre-made Grafana dashboards that works with Graphite. They are generic and as long as your namespace consists of two parts, they will work. You can import them one by one or inject them using Docker.

Namespace #

The default namsespace when you send metrics to Graphite is sitespeed_io.default. You can change the namespace with --graphite.namespace. All premade dashboards are prepared to work with namespaces that starts with two parts: first.second. If you want more parts, the default dashboards will break.

When we use sitespeed.io we usually keep the first part (sitespeed_io) to separate metrics from other tools that sends metrics to Graphite. We then change the second part: sitespeed_io.desktop, sitespeed_io.emulatedMobile or sitespeed_io.desktopSweden. As long as your namespace has two parts, they will work with the default dashboards.

Warning: Crawling and Graphite #

If you crawl a site that is not static, you will pick up new pages each run or each day, which will make the Graphite database grow daily. When you add metrics to Graphite, it prepares space for those metrics ahead of time, depending on your storage configuration (in Graphite). If you configured Graphite to store individual metrics every 15 minutes for 60 days, Graphite will allocate storage for that URL: 4 (per hour) * 24 (hours per day) * 60 (days), even though you might only test that URL once.

You either need to make sure you have a massive amount of storage, or you should change the storage-schemas.conf so that you don’t keep the metrics for so long. You could do that by setting up another namespace (start of the key) and catch metrics that you only want to store for a shorter time.

The Graphite DB size is determined by the number of unique data points and the frequency of them within configured time periods, meaning you can easily optimize how much space you need. If the majority of the URLs you need to test are static and are tested often, you should find there’s a maximum DB size depending on your storage-schemas.conf settings.

Statsd #

If you are using statsd you can use it by adding --graphite.statsd (and send the metrics to statsd instead of directly to Graphite). You can also choose how many metrics you wanna send per request by configuring --graphite.bulkSize.

If you are a DataDog user you can use DogStatsD.

Secure your instance #

You probably want to make sure that only your sitespeed.io servers can post data to your Graphite instance. If you run on AWS you that with security groups. On Digital Ocean you can setup firewalls through the admin or you can use UFW on Ubuntu (just make sure to disable iptables for the Docker daemon --iptables=false read Viktors post).

Your Graphite server needs to open port 2003 and 8080 for TCP traffic for your servers running sitespeed.io.

If you are using AWS you always gives your servers a security group. The servers running sitespeed.io (collecting metrics) can all have the same group (allows outbound traffic and only allowing inbound for ssh).

The Graphite server can the open 2003 and 8080 only for that group (write the group name in the source/security group field). In this example we also run Grafana on port 3000 and have it open to the world.

Security group AWS

Make sure that when you send data between the server that you using the Private DNS/Private IP of the server (else they cannot reach each other). You find the private IP in the description section of your server in the admin.

Private IP AWS

If you are using Digital Ocean, you can setup the firewall rule in the admin. Here you add each instance that need to be able to send data (sitespeed.io-worker in this example). On this server we also Grafana for HTTP/HTTPS traffic.

Firewall setup Digital Ocean

Storing the data #

You probably gonna need to store the metrics in Graphite on another disk. If you are an AWS user, you can use and setup an EBS volume. If you use Digital Ocean you can follow their quick start guide.

When your volume is mounted on your server that runs Graphite, you need to make sure Graphite uses the. Map the Graphite volume to the new volume outside of Docker (both Whisper and graphite.db). Map them like this on your physical server (make sure to copy the empty graphite.db file):

  • /path/on/server/whisper:/opt/graphite/storage/whisper
  • /path/on/server/graphite.db:/opt/graphite/storage/graphite.db

If you use Grafana annotations, you should make sure grafana.db is outside of the container. Follow the documentation at grafana.org.

Graphite for production (important!) #

  1. Make sure you have configured storage-aggregation.conf in Graphite to fit your needs.
  2. Configure your storage-schemas.conf how long you wanna store your metrics.
  3. MAX_CREATES_PER_MINUTE is usually quite low in carbon.conf. That means you will not get all the metrics created for the first run, so you can increase it.
  4. Map the Graphite volume to a physical directory outside of Docker to have better control (both Whisper and graphite.db). Map them like this on your physical server (make sure to copy the empty graphite.db file):
    • /path/on/server/whisper:/opt/graphite/storage/whisper
    • /path/on/server/graphite.db:/opt/graphite/storage/graphite.db If you use Grafana annotations, you should make sure grafana.db is outside of the container. Follow the documentation at grafana.org.
  5. Run the latest version of Graphite and if you are using Docker, make sure you use a tagged version of the container (like graphiteapp/graphite-statsd:1.1.5-12) and never use the latest Docker tag.
  6. Secure your instance with a firewall/security groups so only your servers can send data to the instance.