Monitoring Spring Boot applications with Prometheus and Grafana

-

At my current project, we’ve been building three different applications. All three applications are based on Spring Boot, but have very different workloads. They’ve all reached their way to the production environment and have been running steadily for quite some time now. We do regular (weekly basis) deployments of our applications to production with bug fixes, new features, and technical improvements. The organisation has a traditional infrastructure workflow in the sense that deployments to the VM instances on acceptance and production happen via the (remote) hosting provider.

The hosting provider is responsible for the uptime of the applications and therefore they keep an eye on system metrics through the usage of their own monitoring system. As a team, we are able to look in the system, but it doesn’t say much about the internals of our application. In the past, we’ve asked to add some additional metrics to their system, but the system isn’t that easy to configure with additional metrics. To us as a team runtime statistics about our applications and the impact our changes have on the overall health are crucial to understanding the impact of our work. The rest of this post will give a short description of our journey and the reasons why we chose the resulting setup.

Spring Boot Actuator and Micrometer

If you’ve used Spring Boot before you’ve probably heard of Spring Boot Actuator. Actuator is a set of features that help you monitor and manage your application when it moves away from your local development environment and onto a test, staging or production environment. It helps expose operational information about the running application – health, metrics, audit entries, scheduled task, env settings, etc. You can query the information via either several HTTP endpoints or JMX beans. Being able to view the information is useful, but it’s hard to spot trends or see the behaviour over a period of time.

When we recently upgraded our projects to Spring Boot 2 my team was pretty excited that we were able to start using micrometer a (new) instrumentation library powering the delivery of application metrics. Micrometer is now the default metrics library in Spring Boot 2 and it doesn’t just give you metrics from your Spring application, but can also deliver JVM metrics (garbage collection and memory pools, etc) and also metrics from the application container. Micrometer has several different libraries that can be included to ship metrics to different backends and has support for Prometheus, Netflix Atlas, CloudWatch, Datadog, Graphite, Ganglia, JMX, Influx/Telegraf, New Relic, StatsD, SignalFx, and Wavefront.

Because we didn’t have a lot of control over the way our applications were deployed we looked at the several different backends supported by micrometer. Most of the above backends work by pushing data out to a remote (cloud) service. Since the organisation we work for doesn’t allow us to push this ‘sensitive’ data to a remote party we looked at self-hosted solutions. We did a quick scan and started with looking into Prometheus (and Grafana) and soon learned that it was really easy to get a monitoring system up and we had a running system within an hour.

To be able to use Spring Boot Actuator and Prometheus together you need to add two dependencies to your project:


    org.springframework.boot
    spring-boot-starter-actuator



    io.micrometer
    micrometer-registry-prometheus

Actuator has an endpoint available for prometheus to scrape but it’s not exposed by default, so you will need to enable the endpoint by means of configuration. In this case, I’ll do so via the application.properties.

management.endpoint.prometheus.enabled=true
management.endpoints.web.exposure.include=prometheus,info,health

Now if you browse to http(s)://host(:8080)/actuator/prometheus you will see the output that prometheus will scrape to get the information from your application. A small snippet of the information provided by the endpoint is shown below, but there is a lot more information that the prometheus endpoint will expose.

# HELP tomcat_global_sent_bytes_total  
# TYPE tomcat_global_sent_bytes_total counter
tomcat_global_sent_bytes_total{name="http-nio-8080",} 75776.0
tomcat_global_sent_bytes_total{name="http-nio-8443",} 1.0182049E8
# HELP tomcat_servlet_request_max_seconds  
# TYPE tomcat_servlet_request_max_seconds gauge
tomcat_servlet_request_max_seconds{name="default",} 0.0
tomcat_servlet_request_max_seconds{name="jsp",} 0.0
# HELP process_files_open The open file descriptor count
# TYPE process_files_open gauge
process_files_open 91.0
# HELP system_cpu_usage The "recent cpu usage" for the whole system
# TYPE system_cpu_usage gauge
system_cpu_usage 0.00427715996578272
# HELP jvm_memory_max_bytes The maximum amount of memory in bytes that can be used for memory management
# TYPE jvm_memory_max_bytes gauge
jvm_memory_max_bytes{area="nonheap",id="Code Cache",} 2.5165824E8
jvm_memory_max_bytes{area="nonheap",id="Metaspace",} -1.0
jvm_memory_max_bytes{area="nonheap",id="Compressed Class Space",} 1.073741824E9
jvm_memory_max_bytes{area="heap",id="PS Eden Space",} 1.77733632E8
jvm_memory_max_bytes{area="heap",id="PS Survivor Space",} 524288.0
jvm_memory_max_bytes{area="heap",id="PS Old Gen",} 3.58088704E8

Now that everything is configured from the application perspective, let’s move on to Prometheus itself.

Prometheus

Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloudand now part of the Cloud Native Computing Foundation. To get a better understanding of what prometheus really is let us take a look at an architectural diagram.

(Source: https://prometheus.io/docs/introduction/overview/)

The prometheus server contains of a set of 3 features:

  • A time series database
  • A retrieval component which scrapes its targets for information
  • An HTTP server which you can use to query information stored inside the time series database

To make it even more powerful there are some additional components which you can use if you want:

  • An alert manager, which you can use to send alerts via Pagerduty, Slack, etc.
  • push gateway in case you need to push information to prometheus instead of using the default pull mechanism
  • Grafana for visualizing data and creating dashboards

When looking at Prometheus the most appealing features for us were:

  • no reliance on distributed storage; single server nodes are autonomous
  • time series collection happens via a pull model over HTTP
  • targets are discovered via service discovery or static configuration
  • multiple modes of graphing and dashboarding support

To get up and running quickly you can configure prometheus to scrape some (existing) Spring Boot applications. For scraping targets, you will need to specify them within the prometheus configuration. Prometheus uses a file called prometheus.yml as its main configuration file. Within the configuration file, you can specify where it can find the targets it needs to monitor, specify recording rules and alerting rules.

The following example shows a configuration with a set of static targets for both prometheus itself and our spring boot application.

global:
  scrape_interval:   15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'bootifull-monitoring'

scrape_configs:
- job_name:       'monitoring-demo'

  # Override the global default and scrape targets from this job every 10 seconds.
  scrape_interval: 10s
  metrics_path: '/actuator/prometheus'

  static_configs:
  - targets: ['monitoring-demo:8080']
    labels:
      application: 'monitoring-demo'

- job_name: 'prometheus'

  scrape_interval: 5s

  static_configs:
  - targets: ['localhost:9090']

As you can see the configuration is pretty simple. You can add specific labels to the targets which can, later on, be used for querying, filtering and creating a dashboard based upon the information stored within prometheus.

If you want to get started quickly with Prometheus and have docker on your environment you can use the official docker prometheus image by running the following command and provide a custom configuration from your host machine by running:

docker run -p 9090:9090 -v /tmp/prometheus.yml:/etc/prometheus/prometheus.yml \
       prom/prometheus:v2.4.3

In the above example we bind-mount the main prometheus configuration file from the host system, so you can, for instance, use the above configuration. Prometheus itself has some basic graphing capabilities (as you can see in the following image), but they are more meant to be used when doing some ad-hoc queries.

For creating an application monitoring dashboard Grafana is much more suited.

Grafana

So what is Grafana and what role does it play in our monitoring stack?

Grafana allows you to query, visualize, alert on and understand your metrics no matter where they are stored. Create, explore, and share dashboards with your team and foster a data driven culture.

The cool thing about Grafana is (next to the beautiful UI) that it’s not tied to Prometheus as its single data source like for instance Kibana is tied to Elasticsearch. Grafana can have many different data sources like AWS Cloudwatch, Elasticsearch, InfluxDB, Prometheus, etc. This makes it a very good option for creating a monitoring dashboard. Grafana talks to prometheus by using the PromQL query language.

For Grafana there is also an official Docker image available for you to use. You can get Grafana up and running with a simple command.

docker run -p 3000:3000 grafana/grafana:5.2.4

Now if we connect Grafana with Prometheus as the datasource and install this excellent JVM Micrometer dashboard into Grafana we can instantly start monitoring our Spring Boot application. You will end up with a pretty mature dashboard that lets you switch between different instances of your application.

If you want to start everything all at once you can easily use docker-compose.

version: "3"
services:
  app:
    image: monitoring-demo:latest
    container_name: 'monitoring-demo'
    build:
      context: ./
      dockerfile: Dockerfile
    ports:
    - '8080:8080'
  prometheus:
    image: prom/prometheus:v2.4.3
    container_name: 'prometheus'
    volumes:
    - ./monitoring/prometheus/:/etc/prometheus/
    ports:
    - '9090:9090'
  grafana:
    image: grafana/grafana:5.2.4
    container_name: 'grafana'
    ports:
    - '3000:3000'
    volumes:
    - ./monitoring/grafana/provisioning/:/etc/grafana/provisioning/
    env_file:
    - ./monitoring/grafana/config.monitoring
    depends_on:
    - prometheus

I’ve put together a small demo project, containing a simple Spring Boot application and the above prometheus configuration, in a github repository for demo and experimentation purposes. Now if you want to generate some statistics run a small load test with JMeter or Apache Bench. Feel free to use/fork it!