In this course, we take a look at the Tsar of monitoring tools: Prometheus. Prometheus was the second project hosted by the Cloud Native Computing Foundation, right after the container orchestration software Kubernetes. Prometheus is an open-source systems monitoring and alerting toolkit with additional capabilities in service discovery.
If you have any feedback relating to this course, feel free to contact us at firstname.lastname@example.org.
- Understand and define the Prometheus monitoring tool
- Learn the core features of the tool
- Break down and understand the core components of the service
- Learn how to set up node exporters and a Prometheus monitor
- DevOps engineers, site reliability engineers, and cloud engineers
- Anyone looking to up their monitoring expertise with an open-source monitoring tool
To get the most out of this course, you should have some familiarity with monitoring tools. Experience using a Terminal, Git, Bash, or Shell would be beneficial but not essential.
Welcome back to an introduction to Prometheus. Without wasting any time, let's pick up right where we left off and, as promised, check out the core set of features that make up Prometheus.
First, Prometheus is a monitoring tool built around time series data. It identifies its metrics by a metric name and a set of key-value pairs, called labels. For example, we have go_info, which reports the Go version our Prometheus server was built with, for the server hosted on localhost at port 9090.
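To make the "metric name plus key-value pairs" idea concrete, here is a minimal Python sketch that renders a series identifier in the notation Prometheus displays. The helper name `format_series` and the label values are assumptions chosen for illustration; a real server reports its own labels.

```python
def format_series(name: str, labels: dict[str, str]) -> str:
    """Render a metric name and its labels as name{key="value",...}."""

    def escape(value: str) -> str:
        # Backslash, double quote, and newline are escaped inside label values.
        return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    if not labels:
        return name
    # Sort the labels so the same series always renders the same way.
    pairs = ",".join(f'{key}="{escape(val)}"' for key, val in sorted(labels.items()))
    return name + "{" + pairs + "}"


# Illustrative label values (assumed, not taken from a real server).
series = format_series("go_info", {"instance": "localhost:9090", "job": "prometheus"})
print(series)
```

Two series with the same metric name but different label values are distinct time series, which is why the labels are part of the identifier.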
Following the time series data identification, we have our functional query language, PromQL, which is custom-built for Prometheus. PromQL aggregates your metrics in real time. You can graph the results, and they can also be consumed by external systems via an HTTP API. If a query is long or expensive, you can pre-configure recording rules that compute its result ahead of time, in addition to using the query language through the Web UI and the API.
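As a rough sketch of how an external system might ask the HTTP API to evaluate a PromQL aggregation, the Python below only builds the request URL and does not send it. The helper name `instant_query_url` is hypothetical, the base URL assumes a server on localhost at port 9090, and the expression sums the built-in `up` metric per job.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for an instant query against the /api/v1/query endpoint."""
    # urlencode percent-encodes spaces and parentheses in the expression.
    query_string = urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query_string


# Assumed server address; adjust to wherever Prometheus is listening.
url = instant_query_url("http://localhost:9090", "sum by (job) (up)")
print(url)

# Decoding the query string recovers the original expression.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded)
```

Keeping URL construction separate from the network call makes the encoding easy to inspect before anything is sent.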
Let's take a look at the PromQL rules now. First, in our configuration file for Prometheus, recording rules are referenced via a file containing their configuration. So we have rule files, and then the location where our rule files exist. You can have multiple rule files for quick aggregation of certain metrics that you may be interested in. It looks like this in my VS Code editor. We have groups, the name of the group, and then the rules, each with a record name and the expression we want to evaluate for it. This is what the rule looks like if you view it in the Prometheus Web UI.
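The rule file shown on screen is not part of the transcript, so the following is only a sketch of its general shape: a Python helper that renders a `groups` / `name` / `rules` / `record` / `expr` structure as YAML text. The helper name `render_rule_file`, the group name, and the rule name `job:up:sum` are assumptions for illustration.

```python
import json


def render_rule_file(group: str, rules: list[tuple[str, str]]) -> str:
    """Render one rule group as YAML text, one record/expr pair per rule."""
    # json.dumps yields double-quoted strings, which are also valid YAML scalars.
    lines = ["groups:", f"  - name: {json.dumps(group)}", "    rules:"]
    for record, expr in rules:
        lines.append(f"      - record: {json.dumps(record)}")
        lines.append(f"        expr: {json.dumps(expr)}")
    return "\n".join(lines) + "\n"


# Hypothetical rule: precompute the per-job sum of the built-in `up` metric.
rule_text = render_rule_file("example", [("job:up:sum", "sum by (job) (up)")])
print(rule_text)
```

The main Prometheus configuration would then list the path to this file under its `rule_files` key, and the precomputed series can be queried by its record name.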
Prometheus is really strong in that it doesn't rely on a distributed storage system, which means we gain high throughput from local, on-node storage. This can be a hard disk drive or a solid-state drive. This also means that it's not meant as a long-term store for metrics. Ideally, you keep three to six months of data on average, but it's left up to the engineers to decide how they want to offload those metrics to a long-term storage solution such as Thanos.
Prometheus collects its metrics through HTTP pulls, which means that you have one URI per exporter, or per tool that is exporting those metrics, to rule them all. It scrapes these endpoints, and the endpoints can be either defined or discovered.
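To illustrate what a pull returns, here is a simplified Python sketch that parses the plain-text format an exporter serves at its metrics endpoint. It assumes one sample per line with no trailing timestamps; the function name `parse_exposition` and the sample body, including the Go version string, are made up for the example.

```python
def parse_exposition(text: str) -> dict[str, float]:
    """Map each series identifier in a scraped body to its sample value."""
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated field on the line.
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples


# Made-up response body, shaped like what an exporter would serve.
body = """# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 42
go_info{version="go1.21.0"} 1
"""

samples = parse_exposition(body)
print(samples)
```

A production scraper handles more cases, such as timestamps and histograms, so treat this as a reading aid rather than a full parser.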
Let's take a look at some defined endpoints and discovered endpoints now. So if our metrics are collected over HTTP pulls, we need to either define the endpoints, which can be thought of as statically configuring them in our YAML file, or discover them through Prometheus's service discovery, for example DNS-based discovery.
A statically configured scrape config looks like this: the job name, a scrape interval, a timeout, the path for the metrics, the scheme we'd like to scrape with, in this case HTTP, and then our targets. A discovered target, viewed through the Prometheus Web UI, looks like this, with its discovered labels and its target labels. Target labels are labels that we have specified ourselves, and discovered labels are labels that Prometheus has discovered.
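The static configuration fields listed above can be mirrored in a small Python sketch that expands a scrape job into the URLs Prometheus would pull from. The keys follow the names used in a Prometheus scrape config; the job name, interval values, and target addresses are assumptions, as is the helper `target_urls`.

```python
# Assumed example job; the values are illustrative, not from the lecture.
scrape_config = {
    "job_name": "node",
    "scrape_interval": "15s",
    "scrape_timeout": "10s",
    "metrics_path": "/metrics",
    "scheme": "http",
    "static_configs": [{"targets": ["localhost:9100", "localhost:9090"]}],
}


def target_urls(config: dict) -> list[str]:
    """Expand a statically configured job into one scrape URL per target."""
    # Fall back to http and /metrics when the job does not set them.
    scheme = config.get("scheme", "http")
    path = config.get("metrics_path", "/metrics")
    return [
        f"{scheme}://{target}{path}"
        for group in config["static_configs"]
        for target in group["targets"]
    ]


urls = target_urls(scrape_config)
print(urls)
```

Each target yields exactly one URL, which matches the one-URI-per-exporter model of pull-based collection.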
From the core set, we have pushing, which is interesting because Prometheus pulls all of its metrics. So why is pushing involved? Well, pushing is available through the Pushgateway, and it is only meant for non-scrapeable components, such as short-lived jobs. We're going to dive further into this in our components breakdown, but think of this as a brief introduction to the Pushgateway. The Pushgateway is also great at presenting metrics to Prometheus for pulling. You could think of it as an intermediary server for Prometheus to pull from.
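As a sketch of what a short-lived job might do before it exits, the Python below prepares, but does not send, a push to a Pushgateway. It assumes a gateway at localhost on port 9091; the job name, the metric name, and the helper `build_push_request` are hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request


def build_push_request(gateway: str, job: str, metric: str, value: int) -> Request:
    """Prepare an HTTP POST that pushes one sample under the given job name."""
    # Metrics are grouped by job in the URL path: /metrics/job/<job>.
    url = gateway.rstrip("/") + "/metrics/job/" + quote(job, safe="")
    # The body uses the text exposition format and must end with a newline.
    body = f"{metric} {value}\n".encode("utf-8")
    return Request(url, data=body, method="POST")


# Assumed gateway address and hypothetical job and metric names.
req = build_push_request(
    "http://localhost:9091",
    "nightly_backup",
    "backup_last_success_timestamp_seconds",
    1700000000,
)
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req), once a gateway is running.
```

Prometheus then scrapes the Pushgateway like any other target, so the pushed value remains available after the job itself has finished.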
From there, we have our flexible graphing and dashboarding, which is built on PromDash, a GUI-based builder with a SQL backend. And finally, we have our HTTP API, which is served primarily by the Prometheus server and exists at /api/v1.
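For a feel of what the HTTP API returns, here is a hedged Python sketch that unpacks the JSON envelope of an instant-query response into label and value pairs. The response text is a hand-written sample in the shape the API uses, not output captured from a real server, and `vector_samples` is a hypothetical helper.

```python
import json


def vector_samples(response_text: str) -> list[tuple[dict, float]]:
    """Extract (labels, value) pairs from an instant-vector query response."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError("query failed: " + str(payload.get("error", "unknown error")))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected a vector result, got " + data["resultType"])
    # Each value is a [timestamp, "string value"] pair; convert the string.
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


# Hand-written sample response (assumed values).
sample = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"__name__": "up", "job": "prometheus", "instance": "localhost:9090"},
                "value": [1700000000.0, "1"],
            }
        ],
    },
})

results = vector_samples(sample)
print(results)
```

Sample values arrive as strings so that special values such as NaN can be represented, which is why the sketch converts them with `float`.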
Before we move on, I want you to keep this diagram in mind as we jump into the components breakdown. Each of these orange squares is something that we're going to dissect and then view in greater detail. It's also the main idea and architecture for how you would set up a Prometheus environment should you need to. When you're ready, move on to the next lecture.
Jonathan Lewey is a DevOps Content Creator at Cloud Academy. With experience in the Networking and Operations of the traditional Information Technology industry, he has also led the creation of applications for corporate integrations, and served as a Cloud Engineer supporting developer teams. Jonathan has a number of specialities, including Cisco Certified Network Associate (R&S / Sec), AWS Developer Associate, and AWS Solutions Architect certifications, and he is certified in Project Management.