Prometheus alerting rules
Collection of Prometheus alerting rules
https://github.com/samber/awesome-prometheus-alerts
https://github.com/samber/awesome-prometheus-alerts/tree/master/dist/rules
Alert thresholds depend on the nature of the application.
Building an efficient, battle-tested monitoring platform takes time.
https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/
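Prometheus alerting rules are written in YAML, and the condition that triggers each alert is a PromQL expression.
Prometheus evaluates the rules periodically, and when a rule's condition is met, it fires an alert.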
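An alerting rule has five parts:
Name (alert): the name of the alert.
Condition (expr): a PromQL expression that defines when the alert fires.
Duration (for): how long the condition must stay true before the alert actually fires.
Labels (labels): extra labels attached to the alert, used to locate, filter, and route it.
Annotations (annotations): a human-readable description of the alert, such as a summary and details.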
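The five parts map one-to-one onto the keys of a rule entry. The skeleton below is a minimal sketch for illustration only; the group name `example`, the alert name `InstanceDown` and the job name `my-app` are hypothetical placeholders.

```yaml
groups:
  - name: example                   # rule group name (hypothetical)
    rules:
      - alert: InstanceDown         # Name: the alert's name (hypothetical)
        expr: up{job="my-app"} == 0 # Condition: PromQL expression; "my-app" is a hypothetical job
        for: 5m                     # Duration: condition must hold 5m before Pending becomes Firing
        labels:
          severity: critical        # Labels: attached to the alert, usable for routing and filtering
        annotations:                # Annotations: descriptive text; Go templating is allowed
          summary: "Instance {{ $labels.instance }} is down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
```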
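Prometheus sends alerts to Alertmanager, which handles them further (deduplication, grouping, silencing, routing) and delivers them to different receivers such as email, SMS, or Pushover.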
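For Prometheus to evaluate a rules file and forward the resulting alerts, `prometheus.yml` has to load the file and point at an Alertmanager. A minimal sketch follows; the path `rules/*.yml` and the address `localhost:9093` (Alertmanager's default port) are assumptions to adapt to your setup.

```yaml
# prometheus.yml (excerpt; paths and addresses are assumptions)
global:
  evaluation_interval: 15s     # how often alerting rules are evaluated

rule_files:
  - "rules/*.yml"              # files containing the `groups:` rule definitions

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]   # Alertmanager instance(s) that receive alerts
```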
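An alert moves through three states: Inactive (the condition is not met), Pending (the condition is met, but not yet for the `for` duration), and Firing (the condition has been met for at least the `for` duration).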
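You can view the alerting rules and their current states on the Alerts page of the Prometheus web UI. Notification policies are configured in Alertmanager, which can deliver alerts through channels such as webhooks, DingTalk (usually via a webhook adapter), or WeChat.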
Example: alerting rules for ZooKeeper:
```yaml
groups:
  - name: zk-alert-example
    rules:
      - alert: ZooKeeper server is down
        # Scoped to the ZooKeeper scrape job; a bare `up == 0` fires for ANY down target.
        # The job name "zookeeper" is an assumption: use the job name from your prometheus.yml.
        expr: up{job="zookeeper"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} ZooKeeper server is down"
          description: "{{ $labels.instance }} of job {{$labels.job}} ZooKeeper server is down: [{{ $value }}]."
      - alert: create too many znodes
        expr: znode_count > 1000000
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Instance {{ $labels.instance }} create too many znodes"
          description: "{{ $labels.instance }} of job {{$labels.job}} create too many znodes: [{{ $value }}]."
      - alert: create too many connections
        expr: num_alive_connections > 50 # suppose we use the default maxClientCnxns: 60
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Instance {{ $labels.instance }} create too many connections"
          description: "{{ $labels.instance }} of job {{$labels.job}} create too many connections: [{{ $value }}]."
```
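The `severity` labels in the rules above are what Alertmanager typically routes on. The sketch below sends `critical` alerts to a webhook and everything else to email. All receiver names, addresses and URLs are hypothetical; the webhook URL stands in for something like a DingTalk adapter and is not a real endpoint.

```yaml
# alertmanager.yml (sketch; names, addresses and URLs are hypothetical)
global:
  smtp_smarthost: "smtp.example.com:587"
  smtp_from: "alertmanager@example.com"

route:
  receiver: team-email                   # default receiver (e.g. severity=warning)
  group_by: ["alertname", "instance"]    # batch related alerts into one notification
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - matchers:
        - severity="critical"            # matches the label set in the rule
      receiver: oncall-webhook

receivers:
  - name: team-email
    email_configs:
      - to: "ops@example.com"
  - name: oncall-webhook
    webhook_configs:
      - url: "http://webhook-adapter.example.com/send"   # hypothetical adapter endpoint
```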