Common Linux System Monitoring Metrics
Category: linux
Published 2019-03-31, last modified 2019-03-31
System monitoring
CPU metrics
Calculation: the values are derived by sampling /proc/stat; the statistics printed by the sar command are a helpful reference for interpreting them.
==cpu.idle==: Percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.
==cpu.busy==: The complement of cpu.idle; its value is 100 minus cpu.idle.
cpu.guest: Percentage of time spent by the CPU or CPUs to run a virtual processor.
cpu.iowait: Percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
cpu.irq: Percentage of time spent by the CPU or CPUs to service hardware interrupts.
cpu.softirq: Percentage of time spent by the CPU or CPUs to service software interrupts.
cpu.nice: Percentage of CPU utilization that occurred while executing at the user level with nice priority.
cpu.steal: Percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.
cpu.system: Percentage of CPU utilization that occurred while executing at the system level (kernel).
cpu.user: Percentage of CPU utilization that occurred while executing at the user level (application).
cpu.cnt: Number of CPU cores.
cpu.switches: Number of CPU context switches; counter type.
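
The percentages above come from the difference between two samples of /proc/stat. The following is a minimal sketch of that calculation, not the collector's actual code: the function names are hypothetical, and leaving guest time out of the total is an assumption made here because the kernel already counts it inside user and nice.

```python
import re

# Column order of the aggregate "cpu" line in /proc/stat.
FIELDS = ["user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice"]


def parse_cpu_line(stat_text):
    """Return {field: ticks} for the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            return dict(zip(FIELDS, (int(v) for v in parts[1:])))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percentages(prev, curr):
    """Turn two samples into cpu.* percentages for the interval between them."""
    delta = {name: curr[name] - prev.get(name, 0) for name in curr}
    # Assumption: guest and guest_nice are left out of the total because
    # the kernel already includes them in user and nice.
    total = sum(v for name, v in delta.items()
                if name not in ("guest", "guest_nice"))
    if total <= 0:
        return {}
    pct = {"cpu." + name: 100.0 * v / total for name, v in delta.items()}
    pct["cpu.busy"] = 100.0 - pct["cpu.idle"]
    return pct


def cpu_count(stat_text):
    """cpu.cnt: number of per-CPU lines (cpu0, cpu1, ...)."""
    return sum(1 for line in stat_text.splitlines()
               if re.match(r"cpu\d+\s", line))


def context_switches(stat_text):
    """cpu.switches: cumulative counter taken from the 'ctxt' line."""
    for line in stat_text.splitlines():
        if line.startswith("ctxt "):
            return int(line.split()[1])
    raise ValueError("no 'ctxt' line found")
```

To use it on a live system, read /proc/stat twice a few seconds apart and pass both parsed samples to cpu_percentages.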

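Disk metrics
Calculation: first read /proc/mounts to get all mount points, then use syscall.Statfs_t to obtain block and inode usage.
Each metric carries a set of tags such as mount=$mount,fstype=$fstype,
where $mount is the mount point (for example /home) and $fstype is the filesystem type (for example ext4).

df.bytes.free: Available disk space, int64
df.bytes.free.percent: Available disk space as a percentage of the total, float64, for example 32.1
df.bytes.total: Total disk size, int64
df.bytes.used: Used disk space, int64
df.bytes.used.percent: Used disk space as a percentage of the total, float64
df.inodes.total: Total number of inodes, int64
df.inodes.free: Number of free inodes, int64
df.inodes.free.percent: Percentage of inodes that are free, float64
df.inodes.used: Number of inodes in use, int64
df.inodes.used.percent: Percentage of inodes in use, float64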
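
As a rough illustration of the two steps described above, the sketch below parses /proc/mounts and derives the df.* values from statvfs counters. It uses Python's os.statvfs in place of Go's syscall.Statfs_t, the function names are hypothetical, and computing the percentage as used / (used + available), the way the df command does, is an assumption rather than something the article states.

```python
import os


def parse_mounts(mounts_text):
    """Return [(mount_point, fstype)] from the text of /proc/mounts."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:  # device, mount point, fstype, options...
            result.append((fields[1], fields[2]))
    return result


def df_metrics(blocks, bfree, bavail, frsize, files, ffree):
    """Derive df.* metrics from raw statvfs counters."""
    total = blocks * frsize
    used = (blocks - bfree) * frsize
    free = bavail * frsize  # space available to unprivileged users
    # Assumption: percentage follows the df convention used/(used+free).
    usable = used + free
    used_pct = 100.0 * used / usable if usable else 0.0
    inodes_used = files - ffree
    inodes_used_pct = 100.0 * inodes_used / files if files else 0.0
    return {
        "df.bytes.total": total,
        "df.bytes.used": used,
        "df.bytes.free": free,
        "df.bytes.used.percent": used_pct,
        "df.bytes.free.percent": 100.0 - used_pct if usable else 0.0,
        "df.inodes.total": files,
        "df.inodes.used": inodes_used,
        "df.inodes.free": ffree,
        "df.inodes.used.percent": inodes_used_pct,
        "df.inodes.free.percent": 100.0 - inodes_used_pct if files else 0.0,
    }


def collect(mount_point):
    """Query one mount point on a live system."""
    st = os.statvfs(mount_point)
    return df_metrics(st.f_blocks, st.f_bfree, st.f_bavail,
                      st.f_frsize, st.f_files, st.f_ffree)
```

Calling collect for each mount point returned by parse_mounts yields one metric set per mount=$mount,fstype=$fstype tag pair.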

IO metrics
Calculation: /proc/diskstats is sampled once per second and the difference between consecutive samples is computed; all of these are counter-type metrics.
Each metric carries a tag of the form device=$device that identifies the device, for example sda1 or sdb.
The iostat documentation explains what each metric means.

disk.io.ios_in_progress: Number of actual I/O requests currently in flight.
disk.io.msec_read: Total number of ms spent by all reads.
disk.io.msec_total: Amount of time during which ios_in_progress >= 1.
disk.io.msec_weighted_total: Measure of recent I/O completion time and backlog.
disk.io.msec_write: Total number of ms spent by all writes.
disk.io.read_merged: Adjacent read requests merged in a single req.
disk.io.read_requests: Total number of reads completed successfully.
disk.io.read_sectors: Total number of sectors read successfully.
disk.io.write_merged: Adjacent write requests merged in a single req.
disk.io.write_requests: Total number of writes completed successfully.
disk.io.write_sectors: Total number of sectors written successfully.
disk.io.read_bytes: Amount of data read, in bytes
disk.io.write_bytes: Amount of data written, in bytes
disk.io.avgrq_sz: This and the following values are the ones shown by iostat -x 1
disk.io.avgqu-sz
disk.io.await
disk.io.svctm
disk.io.util: Disk I/O utilization as a percentage; for example, 56.43 means 56.43%.
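
The sketch below shows one way the per-interval differences and the iostat-style values could be derived from two /proc/diskstats samples. The formulas follow the usual iostat definitions and are an assumption about how the collector works; the function names are hypothetical.

```python
# Stat columns of /proc/diskstats after major, minor and device name.
FIELDS = ["read_requests", "read_merged", "read_sectors", "msec_read",
          "write_requests", "write_merged", "write_sectors", "msec_write",
          "ios_in_progress", "msec_total", "msec_weighted_total"]
SECTOR_BYTES = 512  # sectors in /proc/diskstats are 512-byte units


def parse_diskstats(text):
    """Return {device: {field: value}} from the text of /proc/diskstats."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3 + len(FIELDS):
            # Newer kernels append extra columns; zip ignores them.
            stats[parts[2]] = dict(zip(FIELDS, map(int, parts[3:])))
    return stats


def io_metrics(prev, curr, interval_ms):
    """Compute disk.io.* values for one device over one interval."""
    d = {k: curr[k] - prev[k] for k in FIELDS}
    ios = d["read_requests"] + d["write_requests"]
    sectors = d["read_sectors"] + d["write_sectors"]
    m = {"disk.io." + k: v for k, v in d.items()}
    # ios_in_progress is an instantaneous value, so report it as-is.
    m["disk.io.ios_in_progress"] = curr["ios_in_progress"]
    m["disk.io.read_bytes"] = d["read_sectors"] * SECTOR_BYTES
    m["disk.io.write_bytes"] = d["write_sectors"] * SECTOR_BYTES
    m["disk.io.avgrq_sz"] = sectors / ios if ios else 0.0
    m["disk.io.avgqu-sz"] = d["msec_weighted_total"] / interval_ms
    m["disk.io.await"] = (d["msec_read"] + d["msec_write"]) / ios if ios else 0.0
    m["disk.io.svctm"] = d["msec_total"] / ios if ios else 0.0
    m["disk.io.util"] = min(100.0, 100.0 * d["msec_total"] / interval_ms)
    return m
```

With a one-second sampling interval, interval_ms is 1000 and disk.io.util is simply the busy milliseconds divided by ten.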

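Machine load metrics
Calculation: read /proc/loadavg; all of these are raw values rather than counters:
load.1min
load.5min
load.15min

These are the average load of the run queue over the past 1, 5, and 15 minutes, respectively.

Specifically, picture a single-core CPU as a one-lane bridge:
A value between 0.00 and 1.00 means traffic is flowing well: there is no congestion and vehicles cross without any delay.

1.00 means the road is still just about normal, but conditions could deteriorate and cause congestion. The system has no spare capacity left, and the administrator should start optimizing.

Anything above 1.00 means traffic is getting bad. At 2.00, as many vehicles are waiting to get on as are already on the bridge. At this point you must investigate.

2. Multi-core CPU, multiple lanes: the system is healthy when load / number of CPU cores is between 0.00 and 1.00
On a multi-core CPU, full load corresponds to "1.00 * number of CPU cores": 2.00 for a dual-core CPU and 4.00 for a quad-core CPU.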
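
The per-core rule can be expressed in a few lines. This is only a sketch: the function names and the "ok" / "full" / "overloaded" labels are hypothetical and simply mirror the thresholds described above.

```python
def parse_loadavg(text):
    """Return the three load averages from the text of /proc/loadavg."""
    one, five, fifteen = (float(v) for v in text.split()[:3])
    return {"load.1min": one, "load.5min": five, "load.15min": fifteen}


def load_per_core(load, cpu_count):
    """Normalize a load average by the number of CPU cores."""
    return load / cpu_count


def describe(load, cpu_count):
    """Classify a load value using the 1.00-per-core threshold."""
    ratio = load_per_core(load, cpu_count)
    if ratio < 1.0:
        return "ok"          # spare capacity remains
    if ratio == 1.0:
        return "full"        # every core is fully used
    return "overloaded"      # work is queuing up
```

For example, a load of 2.00 is "ok" on a quad-core machine, "full" on a dual-core machine, and "overloaded" on a single core.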

Memory metrics
Calculation: read the contents of /proc/meminfo. Here mem.memfree is free + buffers + cached, and
mem.memused = mem.memtotal - mem.memfree. The output and documentation of the free command explain what each metric means.

mem.memtotal: Total memory
mem.memused: Amount of memory in use
==mem.memused.percent==: Percentage of memory in use
mem.memfree
==mem.memfree.percent==
mem.swaptotal: Total swap
mem.swapused: Amount of swap in use
==mem.swapused.percent==: Percentage of swap in use
mem.swapfree
mem.swapfree.percent
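
The formulas above translate directly into code. The sketch below is an illustration with hypothetical function names; it keeps values in kB, the unit /proc/meminfo reports, which is an assumption because the article does not say which unit the metrics use.

```python
def parse_meminfo(text):
    """Return {key: value} from the text of /proc/meminfo (values in kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and rest.split():
            info[key.strip()] = int(rest.split()[0])
    return info


def percent(part, whole):
    """Percentage helper that tolerates a zero denominator (e.g. no swap)."""
    return 100.0 * part / whole if whole else 0.0


def mem_metrics(info):
    """Compute mem.* metrics using memfree = free + buffers + cached."""
    total = info["MemTotal"]
    free = info["MemFree"] + info["Buffers"] + info["Cached"]
    used = total - free
    swap_total = info["SwapTotal"]
    swap_free = info["SwapFree"]
    swap_used = swap_total - swap_free
    return {
        "mem.memtotal": total,
        "mem.memfree": free,
        "mem.memused": used,
        "mem.memfree.percent": percent(free, total),
        "mem.memused.percent": percent(used, total),
        "mem.swaptotal": swap_total,
        "mem.swapfree": swap_free,
        "mem.swapused": swap_used,
        "mem.swapfree.percent": percent(swap_free, swap_total),
        "mem.swapused.percent": percent(swap_used, swap_total),
    }
```

Because buffers and cached are counted as free, mem.memused here is lower than the raw MemTotal minus MemFree figure.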

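Network metrics
Calculation: read the contents of /proc/net/dev. Each metric carries a tag
of the form iface=$iface that identifies the interface, for example eth0.
Metrics containing "in" describe inbound traffic, "out" describes outbound traffic, and "total" is the sum in + out. The supported metrics are: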

net.if.in.bytes 
net.if.in.compressed 
net.if.in.dropped 
net.if.in.errors 
net.if.in.fifo.errs 
net.if.in.frame.errs 
net.if.in.multicast 
net.if.in.packets 
==net.if.out.bytes==: Amount of data the interface sends per second
net.if.out.carrier.errs 
net.if.out.collisions 
net.if.out.compressed 
net.if.out.dropped 
net.if.out.errors 
net.if.out.fifo.errs 
net.if.out.packets 
==net.if.total.bytes==: Amount of data the interface sends and receives per second
net.if.total.dropped 
net.if.total.errors 
net.if.total.packets
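
The metric names above map onto the columns of /proc/net/dev. The sketch below shows one possible mapping plus a per-second rate helper; the function names are hypothetical and the column-to-name mapping is inferred from the file's header row rather than taken from the collector's source.

```python
# Receive and transmit columns of /proc/net/dev, in file order.
IN_FIELDS = ["bytes", "packets", "errors", "dropped",
             "fifo.errs", "frame.errs", "compressed", "multicast"]
OUT_FIELDS = ["bytes", "packets", "errors", "dropped",
              "fifo.errs", "collisions", "carrier.errs", "compressed"]
TOTAL_FIELDS = ["bytes", "packets", "errors", "dropped"]


def parse_net_dev(text):
    """Return {iface: {metric: counter}} from the text of /proc/net/dev."""
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        iface, sep, rest = line.partition(":")
        values = [int(v) for v in rest.split()]
        if not sep or len(values) < 16:
            continue
        m = {}
        for name, value in zip(IN_FIELDS, values[:8]):
            m["net.if.in." + name] = value
        for name, value in zip(OUT_FIELDS, values[8:16]):
            m["net.if.out." + name] = value
        for name in TOTAL_FIELDS:
            m["net.if.total." + name] = (m["net.if.in." + name]
                                         + m["net.if.out." + name])
        stats[iface.strip()] = m
    return stats


def per_second(prev, curr, interval_s):
    """Convert two counter samples for one interface into per-second rates."""
    return {name: (curr[name] - prev[name]) / interval_s for name in curr}
```

Sampling the file twice and passing one interface's two dictionaries to per_second gives rate values such as net.if.out.bytes per second.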
