Monitoring ZooKeeper 3.4 with Prometheus
ZooKeeper 3.6.0 and later can natively expose a metrics endpoint for Prometheus to scrape.
Enable it in zoo.cfg:
metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
metricsProvider.httpPort=7000
metricsProvider.exportJvmInfo=true
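Once the provider is enabled, the server should serve its metrics in the Prometheus text format on port 7000. The following is a minimal Python sketch for checking that endpoint by hand. The `/metrics` path, the host, and the helper names `parse_exposition` and `fetch_metrics` are assumptions made for illustration, and the parser only handles plain `name value` and `name{labels} value` lines.

```python
import urllib.request


def parse_exposition(text):
    """Parse simple Prometheus text-format lines into {series: float}.

    Comment lines (# HELP / # TYPE) and blank lines are skipped. Optional
    trailing timestamps are not handled; this is a sketch, not a full parser.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field on the line.
        series, _, value = line.rpartition(" ")
        if not series:
            continue
        try:
            samples[series.strip()] = float(value)
        except ValueError:
            continue  # ignore lines this sketch cannot interpret
    return samples


def fetch_metrics(url="http://127.0.0.1:7000/metrics", timeout=5):
    """Fetch and parse the metrics page (the URL is an assumed default)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_exposition(resp.read().decode("utf-8"))
```

Calling `fetch_metrics()` on the ZooKeeper host and printing a few entries is a quick way to confirm the port is open before pointing Prometheus at it.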
For versions below 3.6.0, use zookeeper_exporter to collect the metrics:
https://github.com/jiankunking/zookeeper_exporter
https://github.com/carlpett/zookeeper_exporter/releases/download/v1.0.2/zookeeper_exporter
The exporter works with ZooKeeper 3.4 and later.
Grafana dashboard: Zookeeper Exporter Overview
https://grafana.com/dashboards/9236
The exporter collects its metrics through ZooKeeper's four-letter-word commands (here, mntr).
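To make the mechanism concrete, here is a small Python sketch of what such a collection step might look like: it sends a four-letter-word command over a plain TCP connection and parses the tab-separated `mntr` reply. The function names and default host and port are assumptions for illustration; this is not the exporter's actual code.

```python
import socket


def send_four_letter_word(cmd, host="127.0.0.1", port=2181, timeout=5.0):
    """Send a four-letter-word command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(reply):
    """Turn `mntr` output (one `key<TAB>value` pair per line) into a dict."""
    metrics = {}
    for line in reply.splitlines():
        key, sep, value = line.partition("\t")
        if not sep:
            continue  # not a key/value line, e.g. an error message
        value = value.strip()
        try:
            metrics[key.strip()] = float(value)
        except ValueError:
            metrics[key.strip()] = value  # e.g. zk_version, zk_server_state
    return metrics
```

A typical call would be `parse_mntr(send_four_letter_word("mntr"))`. An empty dict suggests the server answered with something other than metrics.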
The exporter may fail to scrape with this error:
mntr is not executed because it is not in the whitelist
telnet 127.0.0.1 2181
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
mntr
mntr is not executed because it is not in the whitelist.
Fix: whitelist the commands in zoo.cfg, then restart the server:
4lw.commands.whitelist=*
./zkServer.sh restart
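Before restarting, it can help to confirm that the edited file really allows the command. The sketch below reads the text of a zoo.cfg and reports whether `4lw.commands.whitelist` permits a given command. The function name is hypothetical, and the sketch assumes the property holds either `*` or a comma-separated list of command names.

```python
def four_letter_word_allowed(cfg_text, command="mntr"):
    """Return True/False if the whitelist property decides it, else None.

    None means `4lw.commands.whitelist` is not set in the file, in which
    case the server's own default applies.
    """
    whitelist = None
    for line in cfg_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "4lw.commands.whitelist":
            whitelist = [item.strip() for item in value.split(",")]
    if whitelist is None:
        return None
    return "*" in whitelist or command in whitelist
```

For example, `four_letter_word_allowed(open("conf/zoo.cfg").read())` (the path is an assumption) should return `True` after the change above.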
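Key metrics to watch
zk_outstanding_requests: number of queued (outstanding) requests
zk_pending_syncs: number of sync operations still pending
zk_avg_latency: average response latency
zk_open_file_descriptor_count: number of open file descriptors
zk_max_file_descriptor_count: maximum number of file descriptors
zk_up: 1 while the server is reachable, 0 when it is down
zk_server_state: server role (leader or follower)
zk_num_alive_connections: number of alive connections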
Reference alerting rules
groups:
  - name: zookeeperStatsAlert
    rules:
      - alert: ZooKeeperOutstandingRequestsHigh
        expr: avg(zk_outstanding_requests) by (instance) > 10
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }}"
          description: "Too many outstanding (queued) requests"
      - alert: ZooKeeperPendingSyncsHigh
        expr: avg(zk_pending_syncs) by (instance) > 10
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }}"
          description: "Too many pending sync operations"
      - alert: ZooKeeperAvgLatencyHigh
        expr: avg(zk_avg_latency) by (instance) > 10
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }}"
          description: "Average response latency is too high"
      - alert: ZooKeeperOpenFileDescriptorsHigh
        expr: zk_open_file_descriptor_count > zk_max_file_descriptor_count * 0.85
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }}"
          description: "Open file descriptors exceed 85% of the maximum"
      - alert: ZooKeeperDown
        expr: zk_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }}"
          description: "ZooKeeper server is down"
      - alert: ZooKeeperLeaderMissing
        # absent() yields a value only when no series matches, so no
        # comparison is needed; its result carries no instance label.
        expr: absent(zk_server_state{state="leader"})
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "No ZooKeeper leader reported"
          description: "The ZooKeeper leader is missing"
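The thresholds in these rules can also be tried against a single snapshot of metric values, for instance while tuning them. The Python sketch below mirrors the per-instance comparisons only. The function and parameter names are hypothetical, and it does not model the `for: 1m` hold time or the leader-absence rule.

```python
def check_thresholds(metrics, max_queue=10, max_syncs=10,
                     max_latency=10, fd_ratio=0.85):
    """Return the names of the threshold conditions that currently hold.

    `metrics` maps metric names to numbers for one server. Missing
    metrics are treated as healthy.
    """
    problems = []
    if metrics.get("zk_outstanding_requests", 0) > max_queue:
        problems.append("outstanding_requests")
    if metrics.get("zk_pending_syncs", 0) > max_syncs:
        problems.append("pending_syncs")
    if metrics.get("zk_avg_latency", 0) > max_latency:
        problems.append("avg_latency")
    open_fds = metrics.get("zk_open_file_descriptor_count")
    max_fds = metrics.get("zk_max_file_descriptor_count")
    if open_fds is not None and max_fds is not None:
        # Same comparison as the rule: open > max * 0.85
        if open_fds > max_fds * fd_ratio:
            problems.append("file_descriptors")
    if metrics.get("zk_up", 1) == 0:
        problems.append("down")
    return problems
```

Passing a dict of current values returns a list such as `["avg_latency"]`, which makes it easy to see whether a threshold of 10 is too tight for a given cluster.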