gmond's XML introduction
Published: 2019-06-27


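This post walks through the XML that gmond exports on its listening port, to make that output easier to understand.
Unless you have changed it, the listening port defaults to 8649.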
[root@db-172-16-3-221 ganglia-web]# netstat -anp|grep gmond
tcp        0      0 0.0.0.0:8649                0.0.0.0:*                   LISTEN      4462/gmond
udp        0      0 239.2.11.71:8649            0.0.0.0:*                               4462/gmond
udp        0      0 172.16.3.221:28390          239.2.11.71:8649            ESTABLISHED 4462/gmond
Use telnet to dump the XML:
[root@db-172-16-3-221 ganglia-web]# telnet 127.0.0.1 8649
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
# Document structure
]>
# Cluster information, corresponding to the cluster section of gmond.conf.
# Host information, corresponding to the host and globals sections of gmond.conf.
# A note on TN, TMAX and DMAX:
# TN is a variable: the number of seconds between the time the host was last heard from and now. See the code at the end of this post. This is why the TN value is different on every telnet dump.
# TMAX corresponds to the host_tmax setting.
# DMAX corresponds to the host_dmax setting.
# Their meanings, taken from man gmond.conf:
       The host_dmax value is an integer with units in seconds. When set to zero (0), gmond will never delete a host
       from its list even when a remote host has stopped reporting. If host_dmax is set to a positive number then
       gmond will flush a host after it has not heard from it for host_dmax seconds. By the way, dmax means "delete
       max".
       The host_tmax value is an integer with units in seconds. This value represents the maximum amount of time that
       gmond should wait between updates from a host. As messages may get lost in the network, gmond will consider the
       host as being down if it has not received any messages from it after 4 times this value. For example, if
       host_tmax is set to 20, the host will appear as down after 80 seconds with no messages from it. By the way,
       tmax means "timeout max".
# When tn exceeds dmax, a delete is triggered. gmond waits for a while before it actually deletes; that wait is set by cleanup_threshold.
       The cleanup_threshold is the minimum amount of time before gmond will cleanup any hosts or metrics where tn >
       dmax a.k.a. expired data.
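The man page wording above can be turned into a small check. The following is a sketch in Python, not gmond's own logic: `host_state` and the labels "up", "down" and "expired" are names made up for this example, and the comparisons (strictly greater than) are one reading of "after 4 times this value" and "tn > dmax".

```python
def host_state(tn, tmax, dmax):
    """Classify a host from its TN, TMAX and DMAX attributes (all in seconds).

    Hypothetical helper: it only restates the man gmond.conf text.
    """
    if dmax > 0 and tn > dmax:
        # Expired data: eligible for removal once cleanup_threshold allows it.
        return "expired"
    if tn > 4 * tmax:
        # No message for more than 4 * host_tmax seconds.
        return "down"
    return "up"
```

With host_tmax = 20 and host_dmax = 86400, a host whose TN is 81 would be reported as "down" by this sketch, and one whose TN is above 86400 as "expired". With host_dmax = 0 the host never expires.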
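# What follows corresponds to the collection_group sections of gmond.conf and contains all the metrics.
# A metric has these attributes: name, value, type, units, tn, tmax, dmax, slope.
# First, the gmetric data structure:
lib/gm_protocol.h
struct Ganglia_25metric {
  int key;
  char *name;
  int tmax;
  Ganglia_value_types type;
  char *units;
  char *slope;
  char *fmt;
  int msg_size;
  char *desc;
  int *metadata;
};
# A brief explanation of tn, tmax, dmax and slope.
# Note that tmax and dmax here are not the same as host_tmax and host_dmax in the HOST section.
# TN is again a variable: the number of seconds between the time the metric was last heard and now. See the code at the end of this post. This is why the TN value is different on every telnet dump.
# TMAX is the interval at which all metrics of the metric's group are sent to the send channel; it corresponds to time_threshold in the collection_group.
# DMAX: the metric data structure has no such field, so it is probably there only because of the XML structure. DMAX on a metric can be treated as if it did not exist.
# slope describes how the sampled value may change: never changing, only increasing, only decreasing, or both increasing and decreasing.
# From man gmetric:
       -s, --slope=STRING  Either zero|positive|negative|both (default=‘both’)
       Specifies the slope/derivative type of the metric being submitted. The
       possible values are zero for a zero slope metric, positive for an
       increment-only metric, negative for a decrement-only metric, and
       both for an arbitrarily changing metric. The default value is both.
       Using the value positive for the slope of a new metric will cause
       the corresponding RRD file to be generated as a COUNTER, with delta
       values being displayed instead of the actual metric values.
# There are three more important settings.
# In collection_group:
  collect_once = yes   # the metrics in this group are collected only once
  collect_every = 20   # the metrics in this group are collected every 20 seconds
# In metric:
  value_threshold = "1.0"   # when the current sample differs from the previous sample by 1.0, all metrics in the group are sent to the send channel
# The output also includes the group, description and title.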
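To make the interplay of time_threshold and value_threshold concrete, here is a simplified sketch in Python of the send decision described above. It is an illustration only: `should_send` and its parameters are hypothetical names, the comparison operators are assumptions, and real gmond behaviour may differ in detail.

```python
def should_send(now, last_sent_at, previous_value, value,
                time_threshold, value_threshold=None):
    """Return True when the group containing this metric is due to be sent.

    Simplified model (an assumption, not gmond's source):
      * send when time_threshold seconds have passed since the last send, or
      * send when the sample moved by at least value_threshold since the
        previous sample.
    """
    if now - last_sent_at >= time_threshold:
        return True
    if value_threshold is not None and abs(value - previous_value) >= value_threshold:
        return True
    return False
```

For the load group in the configuration shown later (time_threshold = 90, value_threshold = "1.0"), a change from 3.0 to 4.5 would trigger a send right away in this model, while a change from 3.0 to 3.2 would wait until 90 seconds have passed.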
Connection closed by foreign host.
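The same dump can be fetched without telnet. Below is a minimal sketch in Python, assuming gmond listens on the tcp_accept_channel port 8649 and closes the connection once the XML has been sent, as the telnet session above shows. `read_gmond_xml` is a name made up for this example, and the bytes are returned undecoded so that no assumption about the encoding is needed.

```python
import socket


def read_gmond_xml(host="127.0.0.1", port=8649, timeout=5.0):
    """Connect to a gmond tcp_accept_channel and return the raw XML bytes."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                # The peer closed the connection: the dump is complete.
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling `read_gmond_xml()` on the gmond host should return the same content that telnet prints, without the telnet banner lines.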
The configuration file that produced the XML dump above:
[root@db-172-16-3-221 ganglia-web]# cat /opt/ganglia-core-3.6.0/etc/gmond.conf
/* This configuration is as close to 2.5.x default behavior as possible
   The values closely match ./gmond/metric.h definitions in 2.5.x */
globals {
  daemonize = yes
  setuid = yes
  user = nobody
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  allow_extra_data = yes
  host_dmax = 86400 /*secs. Expires (removes from web interface) hosts in 1 day */
  host_tmax = 20 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no
  # By default gmond will use reverse DNS resolution when displaying your hostname
  # Uncommeting following value will override that value.
  # override_hostname = "mywebserver.domain.com"
  # If you are not using multicast this value should be set to something other than 0.
  # Otherwise if you restart aggregator gmond you will get empty graphs. 60 seconds is reasonable
  send_metadata_interval = 0 /*secs */
}

/*
 * The cluster attributes specified will be used as part of the <CLUSTER>
 * tag that will wrap all hosts collected by this instance.
 */
cluster {
  name = "test"
  owner = "digoal"
  latlong = "111 122"
  url = "http://dba.sky-mobi.com"
}

/* The host section describes attributes of the host, like the location */
host {
  location = "1,2,3"
}

/* Feel free to specify as many udp_send_channels as you like. Gmond
   used to only support having a single channel */
udp_send_channel {
  #bind_hostname = yes # Highly recommended, soon to be default.
                       # This option tells gmond to use a source address
                       # that resolves to the machine's hostname. Without
                       # this, the metrics may appear to come from any
                       # interface and the DNS names associated with
                       # those IPs will be used to create the RRDs.
  mcast_join = 239.2.11.71
  port = 8649
  ttl = 10
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8649
  bind = 239.2.11.71
  retry_bind = true
  # Size of the UDP buffer. If you are handling lots of metrics you really
  # should bump it up to e.g. 10MB or even higher.
  buffer = 10485760
}

/* You can specify as many tcp_accept_channels as you like to share
   an xml description of the state of the cluster */
tcp_accept_channel {
  port = 8649
  # If you want to gzip XML output
  gzip_output = no
}

/* Channel to receive sFlow datagrams */
#udp_recv_channel {
#  port = 6343
#}

/* Optional sFlow settings */
#sflow {
# udp_port = 6343
# accept_vm_metrics = yes
# accept_jvm_metrics = yes
# multiple_jvm_instances = no
# accept_http_metrics = yes
# multiple_http_instances = no
# accept_memcache_metrics = yes
# multiple_memcache_instances = no
#}

/* Each metrics module that is referenced by gmond must be specified and
   loaded. If the module has been statically linked with gmond, it does
   not require a load path. However all dynamically loadable modules must
   include a load path. */
modules {
  module {
    name = "core_metrics"
  }
  module {
    name = "cpu_module"
    path = "modcpu.so"
  }
  module {
    name = "disk_module"
    path = "moddisk.so"
  }
  module {
    name = "load_module"
    path = "modload.so"
  }
  module {
    name = "mem_module"
    path = "modmem.so"
  }
  module {
    name = "net_module"
    path = "modnet.so"
  }
  module {
    name = "proc_module"
    path = "modproc.so"
  }
  module {
    name = "sys_module"
    path = "modsys.so"
  }
}

/* The old internal 2.5.x metric array has been replaced by the following
   collection_group directives. What follows is the default behavior for
   collecting and sending metrics that is as close to 2.5.x behavior as
   possible. */

/* This collection group will cause a heartbeat (or beacon) to be sent every
   20 seconds. In the heartbeat is the GMOND_STARTED data which expresses
   the age of the running gmond. */
collection_group {
  collect_once = yes
  time_threshold = 20
  metric {
    name = "heartbeat"
  }
}

/* This collection group will send general info about this host every
   1200 secs.
   This information doesn't change between reboots and is only collected
   once. */
collection_group {
  collect_once = yes
  time_threshold = 1200
  metric {
    name = "cpu_num"
    title = "CPU Count"
  }
  metric {
    name = "cpu_speed"
    title = "CPU Speed"
  }
  metric {
    name = "mem_total"
    title = "Memory Total"
  }
  /* Should this be here? Swap can be added/removed between reboots. */
  metric {
    name = "swap_total"
    title = "Swap Space Total"
  }
  metric {
    name = "boottime"
    title = "Last Boot Time"
  }
  metric {
    name = "machine_type"
    title = "Machine Type"
  }
  metric {
    name = "os_name"
    title = "Operating System"
  }
  metric {
    name = "os_release"
    title = "Operating System Release"
  }
  metric {
    name = "location"
    title = "Location"
  }
}

/* This collection group will send the status of gexecd for this host
   every 300 secs.*/
/* Unlike 2.5.x the default behavior is to report gexecd OFF. */
collection_group {
  collect_once = yes
  time_threshold = 300
  metric {
    name = "gexec"
    title = "Gexec Status"
  }
}

/* This collection group will collect the CPU status info every 20 secs.
   The time threshold is set to 90 seconds. In honesty, this
   time_threshold could be set significantly higher to reduce
   unneccessary network chatter. */
collection_group {
  collect_every = 20
  time_threshold = 90
  /* CPU status */
  metric {
    name = "cpu_user"
    value_threshold = "1.0"
    title = "CPU User"
  }
  metric {
    name = "cpu_system"
    value_threshold = "1.0"
    title = "CPU System"
  }
  metric {
    name = "cpu_idle"
    value_threshold = "5.0"
    title = "CPU Idle"
  }
  metric {
    name = "cpu_nice"
    value_threshold = "1.0"
    title = "CPU Nice"
  }
  metric {
    name = "cpu_aidle"
    value_threshold = "5.0"
    title = "CPU aidle"
  }
  metric {
    name = "cpu_wio"
    value_threshold = "1.0"
    title = "CPU wio"
  }
  metric {
    name = "cpu_steal"
    value_threshold = "1.0"
    title = "CPU steal"
  }
  /* The next two metrics are optional if you want more detail...
     ... since they are accounted for in cpu_system.
  metric {
    name = "cpu_intr"
    value_threshold = "1.0"
    title = "CPU intr"
  }
  metric {
    name = "cpu_sintr"
    value_threshold = "1.0"
    title = "CPU sintr"
  }
  */
}

collection_group {
  collect_every = 20
  time_threshold = 90
  /* Load Averages */
  metric {
    name = "load_one"
    value_threshold = "1.0"
    title = "One Minute Load Average"
  }
  metric {
    name = "load_five"
    value_threshold = "1.0"
    title = "Five Minute Load Average"
  }
  metric {
    name = "load_fifteen"
    value_threshold = "1.0"
    title = "Fifteen Minute Load Average"
  }
}

/* This group collects the number of running and total processes */
collection_group {
  collect_every = 80
  time_threshold = 950
  metric {
    name = "proc_run"
    value_threshold = "1.0"
    title = "Total Running Processes"
  }
  metric {
    name = "proc_total"
    value_threshold = "1.0"
    title = "Total Processes"
  }
}

/* This collection group grabs the volatile memory metrics every 40 secs and
   sends them at least every 180 secs. This time_threshold can be increased
   significantly to reduce unneeded network traffic. */
collection_group {
  collect_every = 40
  time_threshold = 180
  metric {
    name = "mem_free"
    value_threshold = "1024.0"
    title = "Free Memory"
  }
  metric {
    name = "mem_shared"
    value_threshold = "1024.0"
    title = "Shared Memory"
  }
  metric {
    name = "mem_buffers"
    value_threshold = "1024.0"
    title = "Memory Buffers"
  }
  metric {
    name = "mem_cached"
    value_threshold = "1024.0"
    title = "Cached Memory"
  }
  metric {
    name = "swap_free"
    value_threshold = "1024.0"
    title = "Free Swap Space"
  }
}

collection_group {
  collect_every = 40
  time_threshold = 300
  metric {
    name = "bytes_out"
    value_threshold = 4096
    title = "Bytes Sent"
  }
  metric {
    name = "bytes_in"
    value_threshold = 4096
    title = "Bytes Received"
  }
  metric {
    name = "pkts_in"
    value_threshold = 256
    title = "Packets Received"
  }
  metric {
    name = "pkts_out"
    value_threshold = 256
    title = "Packets Sent"
  }
}

/* Different than 2.5.x default since the old config made no sense */
collection_group {
  collect_every = 1800
  time_threshold = 3600
  metric {
    name = "disk_total"
    value_threshold = 1.0
    title = "Total Disk Space"
  }
}

collection_group {
  collect_every = 40
  time_threshold = 180
  metric {
    name = "disk_free"
    value_threshold = 1.0
    title = "Disk Space Available"
  }
  metric {
    name = "part_max_used"
    value_threshold = 1.0
    title = "Maximum Disk Space Used"
  }
}

include ("/opt/ganglia-core-3.6.0/etc/conf.d/*.conf")
The code that computes the host TN is as follows:
gmond.c
print_host_start( apr_socket_t *client, Ganglia_host *hostinfo)
{
  apr_size_t len;
  char hostxml[1024]; /* for <HOST></HOST> */
  apr_time_t now = apr_time_now();
  int tn = (now - hostinfo->last_heard_from) / APR_USEC_PER_SEC;

  len = apr_snprintf(hostxml, 1024,
           "<HOST NAME=\"%s\" IP=\"%s\" TAGS=\"%s\" REPORTED=\"%d\" TN=\"%d\" TMAX=\"%d\" DMAX=\"%d\" LOCATION=\"%s\" GMOND_STARTED=\"%d\">\n",
                     hostinfo->hostname,
                     hostinfo->ip,
                     tags ? tags : "",
                     (int)(hostinfo->last_heard_from / APR_USEC_PER_SEC),
                     tn,
                     host_tmax,
                     host_dmax,
                     hostinfo->location? hostinfo->location: "unspecified",
                     hostinfo->gmond_started);

  return socket_send(client, hostxml, &len);
}
The APR_USEC_PER_SEC constant referenced here:
Apache Portable Runtime
apr_time.h
/** number of microseconds per second */
#define APR_USEC_PER_SEC APR_TIME_C(1000000)
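The TN arithmetic is easy to reproduce. The sketch below mirrors it in Python, assuming both timestamps are integers in microseconds, as apr_time_t values are. `tn_seconds` and `USEC_PER_SEC` are names chosen for this example.

```python
USEC_PER_SEC = 1_000_000  # same value as APR_USEC_PER_SEC


def tn_seconds(now_usec, last_heard_from_usec):
    """Seconds elapsed since the host was last heard from.

    Integer division drops the fractional second; for a non-negative
    difference this matches the integer division in the C code.
    """
    return (now_usec - last_heard_from_usec) // USEC_PER_SEC
```

Because `now` is read each time the XML is generated, two dumps taken a few seconds apart report different TN values even when no new message has arrived from the host.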
Note that EXTRA_ELEMENT and EXTRA_DATA carry the group, description and title information.
Setting the allow_extra_data attribute to false turns off the EXTRA_ELEMENT and EXTRA_DATA output.
man gmond.conf
       The allow_extra_data attribute is a boolean.  When false, gmond will not send out the EXTRA_ELEMENT and
       EXTRA_DATA parts of the XML.  This might be useful if you are using your own frontend to the metric data and
       will like to save some bandwith.

       The host_dmax value is an integer with units in seconds.  When set to zero (0), gmond will never delete a host
       from its list even when a remote host has stopped reporting.  If host_dmax is set to a positive number then
       gmond will flush a host after it has not heard from it for host_dmax seconds.  By the way, dmax means "delete
       max".

       The host_tmax value is an integer with units in seconds. This value represents the maximum amount of time that
       gmond should wait between updates from a host. As messages may get lost in the network, gmond will consider the
       host as being down if it has not received any messages from it after 4 times this value. For example, if
       host_tmax is set to 20, the host will appear as down after 80 seconds with no messages from it. By the way,
       tmax means "timeout max".

       The cleanup_threshold is the minimum amount of time before gmond will cleanup any hosts or metrics where tn >
       dmax a.k.a. expired data.
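To see where the group, description and title end up, the sketch below parses a small hand-written sample with Python's standard xml.etree.ElementTree. The sample is illustrative only: it is not the real dump from this host, its attribute values are invented, and `metric_extras` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of a gmond dump (values are made up).
SAMPLE = """<GANGLIA_XML VERSION="3.6.0" SOURCE="gmond">
<CLUSTER NAME="test" OWNER="digoal" LATLONG="111 122" URL="http://dba.sky-mobi.com">
<HOST NAME="db-172-16-3-221" IP="172.16.3.221" TN="3" TMAX="20" DMAX="86400">
<METRIC NAME="load_one" VAL="0.05" TYPE="float" UNITS=" " TN="12" TMAX="90" DMAX="0" SLOPE="both">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="GROUP" VAL="load"/>
<EXTRA_ELEMENT NAME="DESC" VAL="One minute load average"/>
<EXTRA_ELEMENT NAME="TITLE" VAL="One Minute Load Average"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="cpu_num" VAL="8" TYPE="uint16" UNITS="CPUs" TN="100" TMAX="1200" DMAX="0" SLOPE="zero"/>
</HOST>
</CLUSTER>
</GANGLIA_XML>"""


def metric_extras(xml_text):
    """Map each metric name to a dict of its EXTRA_ELEMENT NAME/VAL pairs."""
    root = ET.fromstring(xml_text)
    result = {}
    for metric in root.iter("METRIC"):
        # A metric without EXTRA_DATA simply yields an empty dict.
        extras = {e.get("NAME"): e.get("VAL") for e in metric.iter("EXTRA_ELEMENT")}
        result[metric.get("NAME")] = extras
    return result
```

In this sample the second metric has no EXTRA_DATA child, which is roughly what every metric would look like with allow_extra_data set to false; a frontend that relies on the title or group then has to supply them itself.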
[References]
1. gmond.c
2. man gmond.conf
3. man gmetric

Reposted from: http://lobcl.baihongyu.com/
