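Collect Docker metrics with Prometheus

Estimated reading time: 8 minutes

Prometheus is an open-source systems monitoring and alerting toolkit. You can configure Docker as a Prometheus target. This topic shows you how to configure Docker, set up Prometheus to run as a Docker container, and monitor your Docker instance using Prometheus.

Warning: The available metrics and the names of those metrics are in active development and may change at any time.

Currently, you can only monitor Docker itself. You cannot currently monitor your application using the Docker target.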

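Configure Docker

To configure the Docker daemon as a Prometheus target, you need to specify the metrics-addr setting. The best way to do this is through daemon.json, which is located at one of the following locations by default. If the file does not exist, create it.

  • Linux: /etc/docker/daemon.json
  • Windows Server: C:\ProgramData\docker\config\daemon.json
  • Docker Desktop for Mac / Docker Desktop for Windows: Click the Docker icon in the toolbar, select Preferences, then select Daemon. Click Advanced.

If the file is currently empty, paste the following: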

{
  "metrics-addr" : "127.0.0.1:9323",
  "experimental" : true
}

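If the file is not empty, add those two keys, making sure that the resulting file is valid JSON. Be careful that every key-value line inside the braces ends with a comma (,) except the last one.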
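Adding the keys by hand is where a stray or missing comma usually creeps in. The following is a minimal sketch, not an official Docker tool, of merging the two settings into existing daemon.json content with Python's json module, which also rejects invalid JSON. The function name merge_metrics_settings and the sample existing content are assumptions made for illustration.

```python
import json

# Keys and values taken from the daemon.json example above.
METRICS_SETTINGS = {"metrics-addr": "127.0.0.1:9323", "experimental": True}


def merge_metrics_settings(existing_text):
    """Return daemon.json text with the two metrics keys added.

    json.loads raises json.JSONDecodeError if the existing content
    is not valid JSON, so a broken file is caught before it is written.
    """
    config = json.loads(existing_text) if existing_text.strip() else {}
    if not isinstance(config, dict):
        raise ValueError("daemon.json must contain a JSON object")
    config.update(METRICS_SETTINGS)
    # json.dumps places the commas correctly between keys.
    return json.dumps(config, indent=2)


# Hypothetical existing content, for illustration only.
existing = '{"debug": true}'
print(merge_metrics_settings(existing))
```

The returned text can then be written back to the daemon.json location for your platform. An empty file is treated as an empty object, which yields the same two keys as the example above.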

Save the file, or in the case of Docker Desktop for Mac or Docker Desktop for Windows, save the configuration. Restart Docker.

Docker now exposes Prometheus-compatible metrics on port 9323.
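To confirm that the daemon is serving metrics before involving Prometheus, you can read the endpoint directly. The sketch below assumes the address 127.0.0.1:9323 from the example above and the default /metrics path. The helper names fetch_metrics and metric_names, and the sample text, are hypothetical. The parser is deliberately simple and only extracts metric names from the text exposition format.

```python
import urllib.request


def fetch_metrics(url="http://127.0.0.1:9323/metrics", timeout=5):
    """Download the raw metrics text from the Docker daemon."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def metric_names(metrics_text):
    """Return the sorted metric names found in exposition-format text."""
    names = set()
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The name ends at the first '{' (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        names.add(name)
    return sorted(names)


# Hypothetical sample shaped like the text exposition format;
# the label and values are made up for illustration.
sample = """\
# TYPE engine_daemon_network_actions_seconds histogram
engine_daemon_network_actions_seconds_count{action="allocate"} 3
engine_daemon_network_actions_seconds_sum{action="allocate"} 0.12
"""
print(metric_names(sample))
```

On a host where the daemon is configured as above, metric_names(fetch_metrics()) would list the names the daemon currently exposes. Because those names may change between releases, treat any hard-coded name as provisional.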

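Configure and run Prometheus

Prometheus runs as a Docker service on a Docker swarm.

Prerequisites

  1. One or more Docker engines are joined into a Docker swarm, using docker swarm init on one manager and docker swarm join on the other managers and worker nodes.

  2. You need an internet connection to pull the Prometheus image.

Copy one of the following configuration files and save it to /tmp/prometheus.yml (Linux or Mac) or C:\tmp\prometheus.yml (Windows). This is a stock Prometheus configuration file, except for the addition of the Docker job definition at the bottom of the file. Docker Desktop for Mac and Docker Desktop for Windows need a slightly different configuration.

Docker for Linux:

# my global config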
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
      monitor: 'codelab-monitor'

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first.rules"
  # - "second.rules"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'docker'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9323']

Docker Desktop for Mac:

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
      monitor: 'codelab-monitor'

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first.rules"
  # - "second.rules"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['host.docker.internal:9090'] # Only works on Docker Desktop for Mac

  - job_name: 'docker'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['docker.for.mac.host.internal:9323']

Docker Desktop for Windows:

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
      monitor: 'codelab-monitor'

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first.rules"
  # - "second.rules"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['host.docker.internal:9090'] # Only works on Docker Desktop for Windows

  - job_name: 'docker'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['192.168.65.1:9323']

Next, start a single-replica Prometheus service using this configuration.

Docker for Linux or Docker Desktop for Mac:

$ docker service create --replicas 1 --name my-prometheus \
    --mount type=bind,source=/tmp/prometheus.yml,destination=/etc/prometheus/prometheus.yml \
    --publish published=9090,target=9090,protocol=tcp \
    prom/prometheus

Docker Desktop for Windows (PowerShell):

PS C:\> docker service create --replicas 1 --name my-prometheus `
    --mount type=bind,source=C:/tmp/prometheus.yml,destination=/etc/prometheus/prometheus.yml `
    --publish published=9090,target=9090,protocol=tcp `
    prom/prometheus

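Verify that the Docker target is listed at http://localhost:9090/targets/.

(Screenshot: Prometheus targets page)

You cannot access the endpoint URLs directly if you use Docker Desktop for Mac or Docker Desktop for Windows.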
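The same check can be scripted against the Prometheus HTTP API, whose /api/v1/targets endpoint reports each target's labels and health. The sketch below assumes Prometheus is published on localhost:9090 as in the service above. The helper names and the sample payload are hypothetical, and the payload is trimmed to the fields the code reads.

```python
import json
import urllib.request


def fetch_targets(base_url="http://localhost:9090", timeout=5):
    """Fetch the target list from the Prometheus HTTP API."""
    url = base_url + "/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def job_health(targets_payload, job="docker"):
    """Return the health strings of all active targets for one job."""
    active = targets_payload.get("data", {}).get("activeTargets", [])
    return [
        target.get("health")
        for target in active
        if target.get("labels", {}).get("job") == job
    ]


# Hypothetical, trimmed response for illustration only.
sample_payload = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "docker", "instance": "localhost:9323"}, "health": "up"},
            {"labels": {"job": "prometheus", "instance": "localhost:9090"}, "health": "up"},
        ]
    },
}
print(job_health(sample_payload))
```

With the service running, job_health(fetch_targets()) returns an empty list when no target belongs to the docker job, which points to a problem in prometheus.yml rather than in the daemon.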

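Use Prometheus

Create a graph. Click the Graphs link in the Prometheus UI. Choose a metric from the combo box to the right of the Execute button, and click Execute. The screenshot below shows the graph for engine_daemon_network_actions_seconds_count.

(Screenshot: Prometheus engine_daemon_network_actions_seconds_count report)

The graph above shows a fairly idle Docker instance. Your graph might look different if you are running active workloads.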
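The value plotted in the UI can also be read through the instant-query endpoint of the Prometheus HTTP API, /api/v1/query. This is a rough sketch under the same localhost:9090 assumption. The helper names are hypothetical, and the sample payload, including its action label and value, is invented to show the response shape.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # port published by the service above


def build_query_url(expr, base_url=PROMETHEUS_URL):
    """Build an instant-query URL for the given PromQL expression."""
    return base_url + "/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def run_query(expr, base_url=PROMETHEUS_URL, timeout=5):
    """Run an instant query and return the decoded JSON response."""
    with urllib.request.urlopen(build_query_url(expr, base_url), timeout=timeout) as response:
        return json.load(response)


def instant_values(query_payload):
    """Return (labels, value) pairs from an instant-vector result."""
    return [
        (series["metric"], float(series["value"][1]))
        for series in query_payload["data"]["result"]
    ]


# Hypothetical response for illustration only.
sample_payload = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {
                    "__name__": "engine_daemon_network_actions_seconds_count",
                    "action": "allocate",
                },
                "value": [1700000000.0, "3"],
            }
        ],
    },
}
for labels, value in instant_values(sample_payload):
    print(labels.get("action"), value)
```

Sample values arrive as strings in the API response, which is why instant_values converts them with float.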

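To make the graph more interesting, create some network actions by starting a service with 10 tasks that just ping Docker non-stop (you can change the ping target to anything you like):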

$ docker service create \
  --replicas 10 \
  --name ping_service \
  alpine ping docker.com

Wait a few minutes (the scrape interval configured above is 15 seconds) and reload your graph.

(Screenshot: Prometheus engine_daemon_network_actions_seconds_count report)

When you are ready, stop and remove the ping_service service, so that you are not flooding a host with pings for no reason.

$ docker service remove ping_service

Wait a few minutes and you should see that the graph falls back to the idle level.

Next steps

Prometheus metrics