Prometheus-Based Monitoring in Practice
As part of the underlying infrastructure, monitoring is indispensable for keeping production services stable. An online issue goes through discovery, localization, and resolution; monitoring and alerting effectively cover discovery and localization, and with mechanisms such as automatic failure recovery they can even reach resolution. They let developers and operators notice service anomalies promptly, and thus troubleshoot and resolve problems more efficiently. A typical kind of monitoring, white-box monitoring, focuses on the internal state of the target service, for example:
- the number of requests received per unit of time
- the success/failure rate of requests per unit of time
- the average request processing latency
Prometheus is well suited to this kind of monitoring. Its main advantages:
- It supports PromQL, a query language for flexibly aggregating metric data.
- It is simple to deploy: a single binary runs it, with no dependency on distributed storage.
- It is written in Go, so its components integrate easily into projects that are also written in Go.
- It ships with a native web UI that renders PromQL time series onto panels.
- It has a large ecosystem of components: Alertmanager, Pushgateway, Exporters, and more.
Prometheus's official best practices for metric naming include:
- Use base units (e.g. seconds rather than milliseconds).
- Prefix the metric name with its application namespace, e.g.:
  - process_cpu_seconds_total
  - http_request_duration_seconds
- Describe the unit with a suffix, e.g.:
  - http_request_duration_seconds
  - node_memory_usage_bytes
  - http_requests_total (an accumulating count with no unit)
  - process_cpu_seconds_total
  - foobar_build_info (a pseudo-metric carrying metadata about the running binary)
Prometheus provides the following metric types:
- Counter: a sample value that increases monotonically (it never decreases); typically used to count things such as a service's requests or errors.
- Gauge: a sample value that can change arbitrarily, going up or down; typically used for values such as a service's CPU usage or memory consumption.
- Histogram and Summary: represent sampled observations over a time window together with bucketed or quantile statistics; typically used for request latency or response size.
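As a concrete illustration, the text exposition format that a /metrics endpoint returns looks roughly like this for the three type families (the metric names and values here are made up):

```
# HELP http_requests_total Total number of HTTP requests served.
# TYPE http_requests_total counter
http_requests_total{code="200",path="/api"} 1027

# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 4.5e+07

# HELP http_request_duration_seconds Request latency.
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.1"} 850
http_request_duration_seconds_bucket{le="0.5"} 1020
http_request_duration_seconds_bucket{le="+Inf"} 1027
http_request_duration_seconds_sum 93.4
http_request_duration_seconds_count 1027
```

Note how a histogram is exposed as a set of cumulative `_bucket` counters plus `_sum` and `_count`; the quantile functions below operate on exactly these series.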
PromQL selects time series with a metric name and label matchers. For example, querying the current value of a counter:

http_requests{service="web",code="200",env="test"}

The result is an instant vector, one sample per matching series:

http_requests{host="host1",service="web",code="200",env="test"} 10
http_requests{host="host2",service="web",code="200",env="test"} 0
http_requests{host="host3",service="web",code="200",env="test"} 12

Adding a duration to the selector queries the series over a time window:

http_requests{service="web",code="200",env="test"}[5m]

and the result is a range vector, each series carrying its samples within the window:

http_requests{host="host1",service="web",code="200",env="test"} 0 4 6 8 10
http_requests{host="host2",service="web",code="200",env="test"} 0 0 0 0 0
http_requests{host="host3",service="web",code="200",env="test"} 0 2 5 9 12

With a range vector in hand, we can run aggregation functions over it. That is exactly what PromQL is for. For example, the per-second growth rate of requests over the last 5 minutes:

rate(http_requests{service="web",code="200",env="test"}[5m])

and the absolute increase in requests over the last 5 minutes:

increase(http_requests{service="web",code="200",env="test"}[5m])

For a histogram, histogram_quantile estimates quantiles from the bucket counters; for example, the 90th percentile over the past 10 minutes:

histogram_quantile(0.9, rate(employee_age_bucket_bucket[10m]))
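To make the bucket arithmetic concrete, here is a small Python sketch of how a quantile is estimated from cumulative buckets. It is not Prometheus code, just the same linear-interpolation idea, and the bucket values are made up:

```python
import math

def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    `buckets` is a sorted list of (upper_bound, cumulative_count) pairs
    ending with (math.inf, total), mirroring a histogram's `le` buckets.
    Linearly interpolates within the bucket where the target rank falls,
    as PromQL's histogram_quantile does.
    """
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: return the highest
                # finite bound, as PromQL does.
                return prev_bound
            return prev_bound + (bound - prev_bound) * \
                (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound

# 850 observations took <= 0.1s, 1020 took <= 0.5s, 1027 in total.
buckets = [(0.1, 850), (0.5, 1020), (math.inf, 1027)]
print(histogram_quantile(0.9, buckets))  # about 0.275 seconds
```

Because only cumulative counts are stored, the result is an interpolated estimate, not an exact percentile; bucket boundaries should be chosen near the latencies you care about.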
In Prometheus, a sample consists of a metric (a name plus a unique label set) and a (timestamp, value) pair. Prometheus keeps recent samples in memory and, every 2 hours by default, compresses them into a block that is persisted to disk. The more samples there are, the more memory Prometheus uses, so in practice it is best to avoid labels with very high cardinality, such as user IPs, IDs, or URL paths; otherwise the number of time series explodes (it grows with the product of label value counts). Besides keeping sample volume reasonable, you can lower storage.tsdb.min-block-duration to flush data to disk sooner, and raise the scrape interval to pull less often, both of which reduce Prometheus's memory footprint.

The targets Prometheus scrapes at runtime are declared under scrape_configs in its configuration file. Each target must expose an endpoint that Prometheus can poll. A standalone program that implements such an endpoint to feed monitoring samples to Prometheus is generally called an Exporter; for example, Node Exporter collects hardware and OS metrics from the host for Prometheus to scrape.

In a development environment, a single Prometheus instance is often enough to collect hundreds of thousands of metrics. In production, with many applications and service instances, one instance is usually not enough. A better approach is to run several Prometheus instances, each scraping only one partition of the targets. For example, the hashmod action in Prometheus's relabel configuration hashes the target address, and each instance keeps only the targets whose hash modulus matches its own ID:

relabel_configs:
- source_labels: [__address__]
  modulus: 3
  target_label: __tmp_hash
  action: hashmod
- source_labels: [__tmp_hash]
  regex: $(PROM_ID)
  action: keep
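Conceptually, hashmod assigns each target to a shard like the following Python sketch; this assumes (as in the Prometheus source) that the hash is the last 8 bytes of an MD5 digest read as a big-endian integer, and the target addresses here are hypothetical:

```python
import hashlib

def hashmod(value: str, modulus: int) -> int:
    """Shard a label value the way Prometheus's hashmod relabel action
    does: MD5 the value, take the last 8 bytes as a big-endian unsigned
    integer, and reduce modulo `modulus`."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[8:], "big") % modulus

# Six hypothetical node-exporter targets split across 3 Prometheus shards.
targets = [f"10.0.0.{i}:9100" for i in range(1, 7)]
for target in targets:
    # The instance whose PROM_ID equals this shard keeps the target.
    print(target, "->", hashmod(target, 3))
```

Because the hash is deterministic, every Prometheus instance computes the same shard for a given target, so together the instances cover all targets exactly once.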
Alternatively, if we want each Prometheus instance to scrape one cluster's metrics, relabeling works just as well; for example, keeping only the targets from the Consul datacenter dc1:

relabel_configs:
- source_labels: ["__meta_consul_dc"]
  regex: "dc1"
  action: keep
Now that each Prometheus instance holds its own slice of the data, how do we tie them together into one global view? The official answer is federation: Prometheus servers are arranged in a tree, where a node closer to the root queries the leaf Prometheus instances, aggregates their metrics, and serves the result. Clearly, though, federation does not fully solve the problem. The single point of failure remains: if the root node goes down, queries become unavailable, while configuring multiple parents introduces data redundancy and, because scrape timing differs, data inconsistency. When there are many leaf targets, the parent is also easily pushed to saturation and crashes, and managing rule configuration across the tree is a headache of its own.

Fortunately, the community produced a clustering solution for Prometheus: Thanos. It provides a global query view that can fetch and aggregate data from multiple Prometheus servers, all of it reachable from a single endpoint:
- When the Querier receives a request, it fans out to the relevant Sidecars and fetches time-series data from their Prometheus servers.
- It aggregates those responses together and evaluates the PromQL query against them. It can aggregate disjoint data as well as deduplicate data from Prometheus high-availability groups.
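Replica deduplication can be sketched as follows: series that are identical except for the replica label (the label name matches the --query.replica-label flag) are collapsed into one. This is a simplified Python illustration, not Thanos code; real Thanos merges overlapping samples more carefully:

```python
def deduplicate(series, replica_label="replica"):
    """Collapse series that differ only in the replica label.

    `series` is a list of (labels_dict, samples) pairs, e.g. the same
    metric scraped by two Prometheus replicas in an HA group. The first
    replica seen for a given label set wins here.
    """
    seen = {}
    for labels, samples in series:
        key = tuple(sorted((k, v) for k, v in labels.items()
                           if k != replica_label))
        if key not in seen:
            stripped = {k: v for k, v in labels.items() if k != replica_label}
            seen[key] = (stripped, samples)
    return list(seen.values())

# Two replicas of the same HA group report the same series.
ha_pair = [
    ({"__name__": "http_requests", "host": "host1", "replica": "prometheus-0"}, [10, 12]),
    ({"__name__": "http_requests", "host": "host1", "replica": "prometheus-1"}, [10, 12]),
]
print(deduplicate(ha_pair))  # one series remains, replica label stripped
```

This is why the Sidecar attaches a per-pod external_labels.replica below: the Querier needs a label that distinguishes HA replicas so it can strip it at query time.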
To deploy the Prometheus instances, we declare a Prometheus StatefulSet. Each Pod contains three containers: Prometheus itself, its bound Thanos Sidecar, and a watch container that monitors the Prometheus configuration file for changes, so that when the ConfigMap is modified it automatically calls Prometheus's reload API to load the new configuration. Following the data-partitioning scheme described earlier, an environment variable PROM_ID is derived before Prometheus starts, serving as the instance identifier matched against the hashmod result during relabeling, while POD_NAME is used by the Thanos Sidecar as the external_labels.replica it assigns to Prometheus:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  labels:
    app: prometheus
spec:
  serviceName: "prometheus"
  updateStrategy:
    type: RollingUpdate
  replicas: 3
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
        thanos-store-api: "true"
    spec:
      serviceAccountName: prometheus
      volumes:
      - name: prometheus-config
        configMap:
          name: prometheus-config
      - name: prometheus-data
        hostPath:
          path: /data/prometheus
      - name: prometheus-config-shared
        emptyDir: {}
      containers:
      - name: prometheus
        image: prom/prometheus:v2.11.1
        args:
        - --config.file=/etc/prometheus-shared/prometheus.yml
        - --web.enable-lifecycle
        - --storage.tsdb.path=/data/prometheus
        - --storage.tsdb.retention=2w
        - --storage.tsdb.min-block-duration=2h
        - --storage.tsdb.max-block-duration=2h
        - --web.enable-admin-api
        ports:
        - name: http
          containerPort: 9090
        volumeMounts:
        - name: prometheus-config-shared
          mountPath: /etc/prometheus-shared
        - name: prometheus-data
          mountPath: /data/prometheus
        livenessProbe:
          httpGet:
            path: /-/healthy
            port: http
      - name: watch
        image: watch
        args: ["-v", "-t", "-p=/etc/prometheus-shared", "curl", "-X", "POST", "--fail", "-o", "-", "-sS", "http://localhost:9090/-/reload"]
        volumeMounts:
        - name: prometheus-config-shared
          mountPath: /etc/prometheus-shared
      - name: thanos
        image: improbable/thanos:v0.6.0
        command: ["/bin/sh", "-c"]
        args:
        - PROM_ID=`echo $POD_NAME | rev | cut -d '-' -f1 | rev` /bin/thanos sidecar
          --prometheus.url=http://localhost:9090
          --reloader.config-file=/etc/prometheus/prometheus.yml.tmpl
          --reloader.config-envsubst-file=/etc/prometheus-shared/prometheus.yml
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        ports:
        - name: http-sidecar
          containerPort: 10902
        - name: grpc
          containerPort: 10901
        volumeMounts:
        - name: prometheus-config
          mountPath: /etc/prometheus
        - name: prometheus-config-shared
          mountPath: /etc/prometheus-shared
Because Prometheus cannot access cluster resources in Kubernetes by default, it needs RBAC permissions assigned:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: prometheus
  labels:
    app: prometheus
rules:
- apiGroups: [""]
  resources: ["services", "pods", "nodes", "nodes/proxy", "endpoints"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["create"]
- apiGroups: [""]
  resources: ["configmaps"]
  resourceNames: ["prometheus-config"]
  verbs: ["get", "update", "delete"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: prometheus
  labels:
    app: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
roleRef:
  kind: ClusterRole
  name: prometheus
  apiGroup: "rbac.authorization.k8s.io"
Next, deploying the Thanos Querier is fairly simple: at startup, the --store flag is set to dnssrv+thanos-store-gateway.default.svc so that Sidecars are discovered through DNS SRV lookups against a headless Service:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: thanos-query
  name: thanos-query
spec:
  replicas: 2
  selector:
    matchLabels:
      app: thanos-query
  minReadySeconds: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: thanos-query
    spec:
      containers:
      - name: thanos-query
        image: improbable/thanos:v0.6.0
        args:
        - query
        - --log.level=debug
        - --query.timeout=2m
        - --query.max-concurrent=20
        - --query.replica-label=replica
        - --query.auto-downsampling
        - --store=dnssrv+thanos-store-gateway.default.svc
        - --store.sd-dns-interval=30s
        ports:
        - containerPort: 10902
          name: http
        - containerPort: 10901
          name: grpc
        livenessProbe:
          httpGet:
            path: /-/healthy
            port: http
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: thanos-query
  name: thanos-query
spec:
  type: LoadBalancer
  ports:
  - name: http
    port: 10901
    targetPort: http
  selector:
    app: thanos-query
---
apiVersion: v1
kind: Service
metadata:
  labels:
    thanos-store-api: "true"
  name: thanos-store-gateway
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: grpc
    port: 10901
    targetPort: grpc
  selector:
    thanos-store-api: "true"
Deploy the Thanos Ruler:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: thanos-rule
  name: thanos-rule
spec:
  replicas: 1
  selector:
    matchLabels:
      app: thanos-rule
  template:
    metadata:
      labels:
        app: thanos-rule
    spec:
      containers:
      - name: thanos-rule
        image: improbable/thanos:v0.6.0
        args:
        - rule
        - --web.route-prefix=/rule
        - --web.external-prefix=/rule
        - --log.level=debug
        - --eval-interval=15s
        - --rule-file=/etc/rules/thanos-rule.yml
        - --query=dnssrv+thanos-query.default.svc
        - --alertmanagers.url=dns+http://alertmanager.default
        ports:
        - containerPort: 10902
          name: http
        volumeMounts:
        - name: thanos-rule-config
          mountPath: /etc/rules
      volumes:
      - name: thanos-rule-config
        configMap:
          name: thanos-rule-config
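The thanos-rule-config ConfigMap mounted at /etc/rules is not shown above; a minimal thanos-rule.yml might look like the following, where the alert name, expression, and threshold are made up for illustration:

```yaml
groups:
- name: example-alerts
  rules:
  - alert: HighRequestLatency
    # Fires when the estimated 90th-percentile latency stays above 0.5s.
    expr: histogram_quantile(0.9, rate(http_request_duration_seconds_bucket[5m])) > 0.5
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "90th percentile request latency above 500ms"
```

The Ruler evaluates these expressions against the Querier (via --query), so rules see the same global, deduplicated view as dashboards do.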
Deploy the Pushgateway:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: pushgateway
  name: pushgateway
spec:
  replicas: 15
  selector:
    matchLabels:
      app: pushgateway
  template:
    metadata:
      labels:
        app: pushgateway
    spec:
      containers:
      - name: pushgateway
        image: prom/pushgateway:v1.0.0
        ports:
        - containerPort: 9091
          name: http
        resources:
          limits:
            memory: 1Gi
          requests:
            memory: 512Mi
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: pushgateway
  name: pushgateway
spec:
  type: LoadBalancer
  ports:
  - name: http
    port: 9091
    targetPort: http
  selector:
    app: pushgateway
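Clients push to the Pushgateway over plain HTTP: a POST or PUT to /metrics/job/&lt;job&gt;, optionally followed by grouping-label pairs in the path, with the metric in text exposition format as the body. A small Python sketch of building such a request (the job and metric names are hypothetical):

```python
def build_push_request(job, grouping_labels, name, value, metric_type="gauge"):
    """Build the URL path and text-format body for a Pushgateway push.

    Returns (path, body): POSTing `body` to http://<pushgateway>:9091<path>
    makes the metric available for Prometheus to scrape via the gateway.
    """
    path = f"/metrics/job/{job}"
    for label, label_value in grouping_labels.items():
        path += f"/{label}/{label_value}"
    body = (
        f"# TYPE {name} {metric_type}\n"
        f"{name} {value}\n"
    )
    return path, body

path, body = build_push_request("nightly_backup", {"instance": "host1"},
                                "backup_last_duration_seconds", 42.5)
print(path)  # /metrics/job/nightly_backup/instance/host1
print(body)
```

The upstream-hash-by annotation on the Ingress below matters here: hashing on the request URI keeps pushes for the same job/instance group landing on the same Pushgateway replica.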
Deploy the Alertmanager:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: alertmanager
spec:
  replicas: 3
  selector:
    matchLabels:
      app: alertmanager
  template:
    metadata:
      name: alertmanager
      labels:
        app: alertmanager
    spec:
      containers:
      - name: alertmanager
        image: prom/alertmanager:latest
        args:
        - --web.route-prefix=/alertmanager
        - --config.file=/etc/alertmanager/config.yml
        - --storage.path=/alertmanager
        - --cluster.listen-address=0.0.0.0:8001
        - --cluster.peer=alertmanager-peers.default:8001
        ports:
        - name: alertmanager
          containerPort: 9093
        volumeMounts:
        - name: alertmanager-config
          mountPath: /etc/alertmanager
        - name: alertmanager
          mountPath: /alertmanager
      volumes:
      - name: alertmanager-config
        configMap:
          name: alertmanager-config
      - name: alertmanager
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    name: alertmanager-peers
  name: alertmanager-peers
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: alertmanager
  ports:
  - name: alertmanager
    protocol: TCP
    port: 9093
    targetPort: 9093
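The alertmanager-config ConfigMap that provides config.yml is not shown above; a minimal configuration, with a hypothetical webhook receiver, might be:

```yaml
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'default'
receivers:
- name: 'default'
  # Hypothetical endpoint; replace with a real webhook, email, or chat integration.
  webhook_configs:
  - url: 'http://alert-webhook.default.svc:8080/notify'
```

With three replicas gossiping over --cluster.peer, each alert is delivered once even though every replica receives it from the Ruler.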
Finally, deploy the Ingress resources, and everything is in place:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: pushgateway-ingress
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/upstream-hash-by: "$request_uri"
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
spec:
  rules:
  - host: $(DOMAIN)
    http:
      paths:
      - backend:
          serviceName: pushgateway
          servicePort: 9091
        path: /metrics
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: prometheus-ingress
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
  - host: $(DOMAIN)
    http:
      paths:
      - backend:
          serviceName: thanos-query
          servicePort: 10901
        path: /
      - backend:
          serviceName: alertmanager
          servicePort: 9093
        path: /alertmanager
      - backend:
          serviceName: thanos-rule
          servicePort: 10902
        path: /rule
      - backend:
          serviceName: grafana
          servicePort: 3000
        path: /grafana

Visit the Prometheus address and confirm that the monitored nodes report a healthy status.

Source: https://zhuanlan.zhihu.com/p/101184971