Adding Custom Monitoring with the Prometheus Operator

2019-08-21

Adding Custom Monitoring

Steps:

  • Step 1
    Create a ServiceMonitor object, which adds the scrape job to Prometheus

  • Step 2
    Associate the ServiceMonitor with a Service object that fronts the metrics endpoint

  • Step 3
    Make sure the Service object can correctly serve the metrics data

Example 1: Custom monitoring for etcd

etcd certificate configuration

etcd clusters are generally run with HTTPS certificate authentication enabled for security, so for Prometheus to access the etcd cluster's monitoring data it must present the corresponding certificates for verification.
Save the certificates etcd uses into the Kubernetes cluster as a Secret object:
kubectl -n monitoring create secret generic etcd-certs --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key --from-file=/etc/kubernetes/pki/etcd/ca.crt
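As a quick sanity check, describe the Secret to confirm all three files were captured as data keys:

kubectl -n monitoring describe secret etcd-certs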
Next, wire the etcd-certs Secret created above into the Prometheus resource object.
Edit prometheus-prometheus.yaml and add the secrets property; the complete manifest is:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    kubernetes.io/os: linux
  podMonitorSelector: {}
  replicas: 2
  secrets:
  - etcd-certs
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.11.0
kubectl apply -f prometheus-prometheus.yaml

You can exec into the container to verify that the certificates have been mounted:

kubectl -n monitoring exec -it prometheus-k8s-0 -- /bin/sh
ls /etc/prometheus/secrets/etcd-certs/
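Assuming the Secret keys kept the source file names (the default for --from-file), the listing should show:

ca.crt  healthcheck-client.crt  healthcheck-client.key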

Create the ServiceMonitor

vi prometheus-serviceMonitorEtcd.yaml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: etcd-k8s
  namespace: monitoring
  labels:
    k8s-app: etcd-k8s
spec:
  jobLabel: k8s-app
  endpoints:
  - port: port
    interval: 30s
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/secrets/etcd-certs/ca.crt
      certFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.crt
      keyFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.key
      insecureSkipVerify: true
  selector:
    matchLabels:
      k8s-app: etcd
  namespaceSelector:
    matchNames:
    - kube-system
kubectl apply -f prometheus-serviceMonitorEtcd.yaml

Above we created a ServiceMonitor named etcd-k8s in the monitoring namespace. It matches Services in the kube-system namespace that carry the label k8s-app=etcd, and jobLabel names the Service label whose value is used as the Prometheus job name. What differs from the earlier examples is the endpoints section, which is configured here with the certificates needed to access etcd; many other scrape parameters can also be set under endpoints, such as relabelings and proxyUrl (sketched below). tlsConfig configures TLS for the scrape endpoint; since the serverName may not match the names signed into the etcd certificates, insecureSkipVerify: true is added.
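For illustration only, such extra parameters sit directly under an endpoints entry; this hypothetical sketch (the cluster target label and proxy URL are made up) is not part of this setup:

  endpoints:
  - port: port
    interval: 30s
    # copy a Kubernetes Service label onto every scraped series
    relabelings:
    - sourceLabels: [__meta_kubernetes_service_label_k8s_app]
      targetLabel: cluster
    # route the scrape through an HTTP proxy if etcd is only reachable that way
    proxyUrl: http://proxy.example.com:3128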

Create the Service

The ServiceMonitor has been created, but it does not yet have a matching Service object. Because the etcd members are not backed by Pods that a label selector could find, the Service below is headless (clusterIP: None) with no selector, and its Endpoints object is filled in by hand with the etcd member IPs:

vi prometheus-etcdService.yaml

apiVersion: v1
kind: Service
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    k8s-app: etcd
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: port
    port: 2379
    protocol: TCP

---
apiVersion: v1
kind: Endpoints
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    k8s-app: etcd
subsets:
- addresses:
  - ip: 172.16.0.4
    nodeName: etc-k8s01
  - ip: 172.16.0.5
    nodeName: etc-k8s02
  - ip: 172.16.0.6
    nodeName: etc-k8s03
  ports:
  - name: port
    port: 2379
    protocol: TCP
kubectl apply -f prometheus-etcdService.yaml
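Once the Service and Endpoints are applied, Prometheus should pick up the three etcd members within a scrape interval or two. A quick check on the Graph page (the second query assumes etcd's standard server metrics are exposed):

up{job="etcd"}
etcd_server_has_leader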

Example 2: Custom monitoring for nginx

Create the nginx Deployment and Service

vi nginx.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-demo
  labels:
    app: nginx-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-demo
  template:
    metadata:
      labels:
        app: nginx-demo
    spec:
      containers:
      - name: nginx-demo
        image: billy98/nginx-prometheus-metrics:latest
        ports:
        - name: http-metrics
          containerPort: 9527
        - name: web
          containerPort: 80
        - name: test
          containerPort: 1314
        imagePullPolicy: IfNotPresent
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx-demo
  name: nginx-demo
  namespace: default
spec:
  ports:
  - name: http-metrics
    port: 9527
    protocol: TCP
    targetPort: 9527
  - name: web
    port: 80
    protocol: TCP
    targetPort: 80
  - name: test
    port: 1314
    protocol: TCP
    targetPort: 1314
  selector:
    app: nginx-demo
  type: ClusterIP
kubectl apply -f nginx.yaml
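Before wiring up the ServiceMonitor, confirm the Deployment is running and the Service exposes all three ports:

kubectl get pods -l app=nginx-demo
kubectl get svc nginx-demo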

Create the ServiceMonitor

vi nginx-servicemonitor.yaml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: nginx-demo
  name: nginx-demo
  namespace: monitoring
spec:
  endpoints:
  - interval: 15s
    port: http-metrics
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      app: nginx-demo
kubectl apply -f nginx-servicemonitor.yaml
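Note that this ServiceMonitor lives in monitoring while the workload lives in default; the cross-namespace match works because the Prometheus object shown earlier sets serviceMonitorNamespaceSelector: {} and serviceMonitorSelector: {}, which select ServiceMonitors from all namespaces with no label restrictions.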

Check the endpoints

[root@k8s03 ~]# kubectl get ep
NAME         ENDPOINTS                                                      AGE
kubernetes   172.16.0.4:6443,172.16.0.5:6443,172.16.0.6:6443                2d
nginx-demo   192.168.236.136:9527,192.168.236.136:80,192.168.236.136:1314   31m
[root@k8s03 ~]# curl 192.168.236.136
hello world
[root@k8s03 ~]# curl 192.168.236.136:9527/metrics
# HELP nginx_http_connections Number of HTTP connections
# TYPE nginx_http_connections gauge
nginx_http_connections{state="active"} 3
nginx_http_connections{state="reading"} 0
nginx_http_connections{state="waiting"} 2
nginx_http_connections{state="writing"} 1
# HELP nginx_http_request_bytes_sent Number of HTTP request bytes sent
# TYPE nginx_http_request_bytes_sent counter
nginx_http_request_bytes_sent{host="192.168.236.136"} 874567
nginx_http_request_bytes_sent{host="testservers"} 320
# HELP nginx_http_request_time HTTP request time
# TYPE nginx_http_request_time histogram
nginx_http_request_time_bucket{host="192.168.236.136",le="00.005"} 99
nginx_http_request_time_bucket{host="192.168.236.136",le="00.010"} 99
nginx_http_request_time_bucket{host="192.168.236.136",le="00.020"} 99
... ...

The newly added nginx-demo job now shows up on Prometheus's Targets page, and its metrics can be queried and graphed directly on the Graph page.
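For example, using the metric names from the /metrics output above (exact label sets depend on the exporter):

# bytes sent per second over the last 5 minutes, per host
sum by (host) (rate(nginx_http_request_bytes_sent[5m]))

# 95th-percentile request latency from the request-time histogram
histogram_quantile(0.95, sum by (le) (rate(nginx_http_request_time_bucket[5m])))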

Adding custom alerting rules to Alertmanager

The Config page of the Prometheus dashboard shows the Alertmanager-related configuration:

alerting:
  alert_relabel_configs:
  - separator: ;
    regex: prometheus_replica
    replacement: $1
    action: labeldrop
  alertmanagers:
  - kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names:
        - monitoring
    scheme: http
    path_prefix: /
    timeout: 10s
    relabel_configs:
    - source_labels: [__meta_kubernetes_service_name]
      separator: ;
      regex: alertmanager-main
      replacement: $1
      action: keep
    - source_labels: [__meta_kubernetes_endpoint_port_name]
      separator: ;
      regex: web
      replacement: $1
      action: keep
rule_files:
- /etc/prometheus/rules/prometheus-k8s-rulefiles-0/*.yaml
... ...

From the alertmanagers configuration above we can see that the Alertmanager instances are discovered via Kubernetes service discovery with the endpoints role, keeping only targets whose Service name is alertmanager-main and whose endpoint port name is web.
Let's take a look at the alertmanager-main Service:

[root@k8s03 manifests]# kubectl describe svc alertmanager-main -n monitoring
Name:                     alertmanager-main
Namespace:                monitoring
Labels:                   alertmanager=main
Annotations:              kubectl.kubernetes.io/last-applied-configuration:
                            {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"alertmanager":"main"},"name":"alertmanager-main","namespace":"...
Selector:                 alertmanager=main,app=alertmanager
Type:                     NodePort
IP:                       10.100.176.121
Port:                     web  9093/TCP
TargetPort:               web/TCP
NodePort:                 web  31568/TCP
Endpoints:                192.168.236.134:9093,192.168.73.69:9093,192.168.73.71:9093
Session Affinity:         ClientIP
External Traffic Policy:  Cluster
Events:                   <none>

The Service name is indeed alertmanager-main and its port is named web, matching the rules above, so the Prometheus and Alertmanager components are correctly linked. The corresponding alerting rule files are all the YAML files under the /etc/prometheus/rules/prometheus-k8s-rulefiles-0/ directory.
When we create a PrometheusRule resource object, a corresponding <namespace>-<name>.yaml file is automatically generated under that prometheus-k8s-rulefiles-0 directory.
The Prometheus resource object has a very important property, ruleSelector, a filter that decides which rules are loaded: it requires PrometheusRule objects to carry the labels prometheus=k8s and role=alert-rules.
So to define a custom alerting rule, all we need to do is create a PrometheusRule object carrying those two labels:

vi prometheus-etcdRules.yaml

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: etcd-rules
  namespace: monitoring
spec:
  groups:
  - name: etcd
    rules:
    - alert: EtcdClusterUnavailable
      annotations:
        summary: etcd cluster small
        description: If one more etcd peer goes down the cluster will be unavailable
      expr: |
        count(up{job="etcd"} == 0) > (count(up{job="etcd"}) / 2 - 1)
      for: 3m
      labels:
        severity: critical
kubectl apply -f prometheus-etcdRules.yaml
kubectl -n monitoring exec -it prometheus-k8s-0 -- /bin/sh
ls /etc/prometheus/rules/prometheus-k8s-rulefiles-0/
cat /etc/prometheus/rules/prometheus-k8s-rulefiles-0/monitoring-etcd-rules.yaml
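The alert expression is worth unpacking: for a three-member cluster, count(up{job="etcd"}) is 3, so the right-hand side evaluates to 3/2 - 1 = 0.5, and the rule fires once at least one member has been down for 3 minutes; at that point losing one more member would cost the cluster its quorum, which is exactly what the annotation warns about. The new EtcdClusterUnavailable rule should also appear on the Rules and Alerts pages of the Prometheus UI.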

Title: Adding Custom Monitoring with the Prometheus Operator
Author: fish2018
URL: http://devopser.org/articles/2019/08/21/1566379625905.html