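windows_exporter makes it easy to add Windows Server monitoring to Prometheus.

In most cases the default configuration is enough to monitor CPU, memory, network, and services. In some environments, though, for example a server running SafeDog (安全狗) security software, certain settings can prevent the exporter from reading the status of some services. A custom configuration is then needed, such as one that monitors only selected services.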

windows_exporter configuration notes

Source

https://github.com/prometheus-community/windows_exporter


Description

A Prometheus exporter for Windows machines.


Compatibility

windows_exporter supports Windows Server 2008R2 and later, and desktop Windows 7 and later.


Deployment

Download the exporter:

https://github.com/prometheus-community/windows_exporter/releases/download/v0.16.0/windows_exporter-0.16.0-amd64.exe


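You can run the .exe file directly or start it with custom options. Running it directly uses the default configuration.


Custom configuration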

Flags:
  -h, --help                     Show context-sensitive help (also try
                                 --help-long and --help-man).
      --collectors.dfsr.sources-enabled="connection,folder,volume"
                                 Comma-seperated list of DFSR Perflib sources to
                                 use.
      --collectors.exchange.list
                                 List the collectors along with their perflib
                                 object name/ids
      --collectors.exchange.enabled=""
                                 Comma-separated list of collectors to use.
                                 Defaults to all, if not specified.
      --collector.iis.site-whitelist=".+"
                                 Regexp of sites to whitelist. Site name must
                                 both match whitelist and not match blacklist to
                                 be included.
      --collector.iis.site-blacklist=COLLECTOR.IIS.SITE-BLACKLIST
                                 Regexp of sites to blacklist. Site name must
                                 both match whitelist and not match blacklist to
                                 be included.
      --collector.iis.app-whitelist=".+"
                                 Regexp of apps to whitelist. App name must both
                                 match whitelist and not match blacklist to be
                                 included.
      --collector.iis.app-blacklist=COLLECTOR.IIS.APP-BLACKLIST
                                 Regexp of apps to blacklist. App name must both
                                 match whitelist and not match blacklist to be
                                 included.
      --collector.logical_disk.volume-whitelist=".+"
                                 Regexp of volumes to whitelist. Volume name
                                 must both match whitelist and not match
                                 blacklist to be included.
      --collector.logical_disk.volume-blacklist=""
                                 Regexp of volumes to blacklist. Volume name
                                 must both match whitelist and not match
                                 blacklist to be included.
      --collector.msmq.msmq-where=COLLECTOR.MSMQ.MSMQ-WHERE
                                 WQL 'where' clause to use in WMI metrics query.
                                 Limits the response to the msmqs you specify
                                 and reduces the size of the response.
      --collectors.mssql.classes-enabled="accessmethods,availreplica,bufman,databases,dbreplica,genstats,locks,memmgr,sqlstats,sqlerrors,transactions"
                                 Comma-separated list of mssql WMI classes to
                                 use.
      --collectors.mssql.class-print
                                 If true, print available mssql WMI classes and
                                 exit. Only displays if the mssql collector is
                                 enabled.
      --collector.net.nic-whitelist=".+"
                                 Regexp of NIC:s to whitelist. NIC name must
                                 both match whitelist and not match blacklist to
                                 be included.
      --collector.net.nic-blacklist=""
                                 Regexp of NIC:s to blacklist. NIC name must
                                 both match whitelist and not match blacklist to
                                 be included.
      --collector.process.whitelist=".*"
                                 Regexp of processes to include. Process name
                                 must both match whitelist and not match
                                 blacklist to be included.
      --collector.process.blacklist=""
                                 Regexp of processes to exclude. Process name
                                 must both match whitelist and not match
                                 blacklist to be included.
      --collector.service.services-where=""
                                 WQL 'where' clause to use in WMI metrics query.
                                 Limits the response to the services you specify
                                 and reduces the size of the response.
      --collector.smtp.server-whitelist=".+"
                                 Regexp of virtual servers to whitelist. Server
                                 name must both match whitelist and not match
                                 blacklist to be included.
      --collector.smtp.server-blacklist=COLLECTOR.SMTP.SERVER-BLACKLIST
                                 Regexp of virtual servers to blacklist. Server
                                 name must both match whitelist and not match
                                 blacklist to be included.
      --collector.textfile.directory="C:\\Program Files\\windows_exporter\\textfile_inputs"
                                 Directory to read text files with metrics from.
      --config.file=CONFIG.FILE  YAML configuration file to use. Values set in
                                 this file will be overriden by CLI flags.
      --web.config.file=""       [EXPERIMENTAL] Path to configuration file that
                                 can enable TLS or authentication.
      --telemetry.addr=":9182"   host:port for exporter.
      --telemetry.path="/metrics"
                                 URL path for surfacing collected metrics.
      --telemetry.max-requests=5
                                 Maximum number of concurrent requests. 0 to
                                 disable.
      --collectors.enabled="cpu,cs,logical_disk,net,os,service,system,textfile"
                                 Comma-separated list of collectors to use. Use
                                 '[defaults]' as a placeholder for all the
                                 collectors enabled by default.
      --collectors.print         If true, print available collectors and exit.
      --scrape.timeout-margin=0.5
                                 Seconds to subtract from the timeout allowed by
                                 the client. Tune to allow for overhead or high
                                 loads.
      --log.level="info"         Only log messages with the given severity or
                                 above. Valid levels: [debug, info, warn, error,
                                 fatal]
      --log.format="logger:stderr"
                                 Set the log target and format. Example:
                                 "logger:syslog?appname=bob&local=7" or
                                 "logger:stdout?json=true"
      --version                  Show application version.
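
The defaults listed above (--telemetry.addr=":9182" and --telemetry.path="/metrics") are what Prometheus needs to know in order to scrape the exporter. A minimal sketch of a matching scrape job follows; the job name and host names are hypothetical and not taken from the original setup.

```yaml
# prometheus.yml (fragment); job name and host names are hypothetical
scrape_configs:
  - job_name: windows
    metrics_path: /metrics          # matches the exporter's --telemetry.path default
    static_configs:
      - targets:
          - "win-server-01:9182"    # 9182 is the exporter's --telemetry.addr default
          - "win-server-02:9182"
```

If the exporter is started with a different --telemetry.addr or --telemetry.path, the target port and metrics_path would need to change accordingly.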

Using a configuration file

A YAML configuration file can be specified with the --config.file flag. For example:

.\windows_exporter.exe --config.file=config.yml

The format of config.yml is shown below; adjust the contents according to the configuration documentation:


collectors:
  enabled: cpu,cs,net,service
collector:
  service:
    services-where: "Name='windows_exporter'"
log:
  level: warn
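
Note that the configuration above enables only the cpu, cs, net, and service collectors, while the memory and disk alert rules in the rules section rely on metrics from the os and logical_disk collectors. The variant below is a sketch that keeps those collectors enabled and restricts the service collector to two services. W3SVC is only an illustrative service name, and combining conditions with OR assumes ordinary WQL syntax in the where clause.

```yaml
# config.yml (sketch); the service names are examples, substitute your own
collectors:
  # os and logical_disk supply the memory and disk metrics used by the alert rules
  enabled: cpu,cs,logical_disk,net,os,service
collector:
  service:
    # WQL filter: only these two services are queried
    services-where: "Name='windows_exporter' OR Name='W3SVC'"
log:
  level: warn
```

Filtering at the WMI query level keeps the exporter from touching services it cannot read, which is the situation described at the start of this article.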

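Reference rules configuration

The rules below cover CPU usage above 90%, memory usage above 90%, disk usage above 90%, windows_exporter collector failures, and service status. As noted at the start, when nothing is configured the exporter monitors every service; in many cases monitoring only specific services is enough.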

groups:
- name: WindowsServer
  rules:
  - alert: WindowsServerCpuUsage
    expr: 100 - (avg by (instance) (rate(windows_cpu_time_total{mode="idle"}[2m])) * 100) > 90
    for: 0m
    labels:
      severity: warning
    annotations:
      summary: Windows Server CPU Usage (instance {{ $labels.instance }})
      description: "CPU Usage is more than 90%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: WindowsServerMemoryUsage
    expr: 100 - ((windows_os_physical_memory_free_bytes / windows_cs_physical_memory_bytes) * 100) > 90
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Windows Server memory Usage (instance {{ $labels.instance }})
      description: "Memory usage is more than 90%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: WindowsServerDiskSpaceUsage
    expr: 100.0 - 100 * ((windows_logical_disk_free_bytes / 1024 / 1024 ) / (windows_logical_disk_size_bytes / 1024 / 1024)) > 90
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: Windows Server disk Space Usage (instance {{ $labels.instance }})
      description: "Disk usage is more than 90%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: WindowsServerCollectorError
    expr: windows_exporter_collector_success == 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: Windows Server collector Error (instance {{ $labels.instance }})
      description: "Collector {{ $labels.collector }} was not successful\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: WindowsServerServiceStatus
    expr: windows_service_status{status="ok"} != 1
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: Windows Server service Status (instance {{ $labels.instance }})
      description: "Windows Service state is not OK\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
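
When only one specific service matters, an alert can target it by name instead of checking every service. The following is a sketch of a separate rule file, assuming the service collector exposes the windows_service_state metric with name and state labels (as it does in the 0.16 series). The group name and alert name are made up for illustration.

```yaml
# service_rules.yml (sketch); group and alert names are hypothetical
groups:
- name: WindowsServiceExample
  rules:
  - alert: WindowsExporterServiceNotRunning
    # windows_service_state is 1 for the state the service is currently in, 0 otherwise
    expr: windows_service_state{name="windows_exporter", state="running"} == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: Service {{ $labels.name }} is not running (instance {{ $labels.instance }})
```

For this rule to return data, the service named in the expression must also pass the exporter's services-where filter.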

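With Prometheus it is easy to set up monitoring for web server clusters and database clusters. This monitoring shows the state of the cluster in real time, and the collected data can also guide server tuning, especially database parameter tuning. 月萌API will share more articles on this topic in the future.


Reference: https://blog.csdn.net/qq_43021786/article/details/118809772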