Alert Messages

Last updated: 2024-08-20 19:36:26

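The observability service provides queries for real-time and historical alert messages and supports retrieving alerts at the platform or project level. Typical uses include fault analysis during incidents and integration with third-party alerting systems.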

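Prerequisites

  • Authentication uses a Token: supply a Project-level Token generated by authenticating as a project user;
  • Service address: emla.openstack.svc.cluster.local (the examples use the default root domain openstack.svc.cluster.local).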
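
As a minimal sketch of the prerequisites above, the following Python snippet prepares (without sending) a request that carries the Project-level Token in the X-Auth-Token header, the same header the curl examples in this document use. The helper name build_request and the placeholder token are illustrative assumptions, not part of the service's API; the http scheme is taken from the curl examples.

```python
import urllib.request

# Service address from the prerequisites (default root domain).
SERVICE_HOST = "emla.openstack.svc.cluster.local"


def build_request(project_id: str, token: str) -> urllib.request.Request:
    """Prepare a GET request for the alerts endpoint (hypothetical helper)."""
    url = f"http://{SERVICE_HOST}/apis/alerting/v1/projects/{project_id}/alerts"
    # The Project-level Token is passed in the X-Auth-Token header.
    return urllib.request.Request(url, headers={"X-Auth-Token": token}, method="GET")


# "<project-token>" is a placeholder; substitute a real Project-level Token.
req = build_request("562a6eea71eb40e0be7d79d4d87ce94a", "<project-token>")
# Sending is left to the caller, e.g. urllib.request.urlopen(req), from a host
# that can resolve the cluster-internal service address.
```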

Querying alert messages

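This API backs the real-time and historical alert messages shown on the alert messages page. It returns each alert's content, status, severity, category and component, project and department, and group and rule. It can query alerts for the whole platform or for a single project, and supports filtering by category (digital native engine / cloud product / user workload), status (firing / silenced / resolved), severity (critical / warning / info), and time range.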

URI

GET /apis/alerting/v1/projects/<project_id>/alerts

Request parameters

Name         In     Type            Required  Description
project_id   Path   string                    Project ID
all_tenants  Query  boolean                   Whether to retrieve alert messages for all projects
categories   Query  string                    Category (allowed values: digital native engine platform, cloud product cloudproduct, user workload userload)
states       Query  string                    Status (allowed values: firing, silenced, resolved)
severities   Query  string                    Severity (allowed values: critical, warning, info)
start        Query  unix_timestamp            Start time
end          Query  unix_timestamp            End time
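  • If the optional parameters above are omitted, all current alert messages are returned by default;
  • all_tenants is available only to the cloud administrator's admin project and queries alert messages for the whole platform;
  • categories, states, and severities accept multiple comma-separated values; for example, states=firing,silenced queries real-time alert messages;
  • start and end specify the time range in which alert messages existed; start must be less than end.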
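
The parameter rules above can be sketched as a small URL builder. This is an illustrative Python helper, not an official client: the name build_alerts_url and its client-side validation are assumptions, and the base URL reuses the scheme and host from the curl examples in this document. Multi-valued filters are joined with commas, and a start that is not less than end is rejected before any request is made.

```python
from urllib.parse import urlencode

BASE_URL = "http://emla.openstack.svc.cluster.local/apis/alerting/v1"

# Allowed filter values, as listed in the request parameter table.
ALLOWED = {
    "categories": {"platform", "cloudproduct", "userload"},
    "states": {"firing", "silenced", "resolved"},
    "severities": {"critical", "warning", "info"},
}


def build_alerts_url(project_id, all_tenants=False, categories=None,
                     states=None, severities=None, start=None, end=None):
    """Build the alerts query URL (hypothetical helper)."""
    if start is not None and end is not None and start >= end:
        raise ValueError("start must be less than end")
    query = {}
    if all_tenants:
        # Only meaningful for the cloud administrator's admin project.
        query["all_tenants"] = "true"
    filters = {"categories": categories, "states": states, "severities": severities}
    for name, values in filters.items():
        if not values:
            continue  # omitted filters are simply not sent
        unknown = set(values) - ALLOWED[name]
        if unknown:
            raise ValueError(f"unsupported {name}: {sorted(unknown)}")
        query[name] = ",".join(values)  # multiple values are comma-separated
    if start is not None:
        query["start"] = int(start)
    if end is not None:
        query["end"] = int(end)
    url = f"{BASE_URL}/projects/{project_id}/alerts"
    # safe="," keeps the commas literal instead of percent-encoding them.
    return f"{url}?{urlencode(query, safe=',')}" if query else url


admin_url = build_alerts_url(
    "1100e312c9df4567a23806000ebee655",
    all_tenants=True,
    categories=["platform", "userload"],
    states=["firing", "silenced"],
)
```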

Request examples

  • A cloud administrator queries real-time alert messages in the platform's digital native engine and user workload categories

    curl -H 'X-Auth-Token: gAAAAABl7ng2_pQQGur8_EMHlV2rw2pBx_xn7FOXa4BncLHwouEKmC55Aqqavq8puUgjiIoPqp7GFRMz4qP7mhnHqA7VSh3dAp7PXb3dEe3IPgM51b-T2gCczrM4UYkS3qGtXiJBG7M8TE_Ti9qVT6tghF5fb_kQlg' 'http://emla.openstack.svc.cluster.local/apis/alerting/v1/projects/1100e312c9df4567a23806000ebee655/alerts?all_tenants=true&categories=platform,userload&states=firing,silenced'
  • A regular user queries historical alert messages that existed in a project between 2024-03-01 00:00:00 and 2024-03-10 00:00:00 (UTC+8)

    curl -H 'X-Auth-Token: gAAAAABl7q6QHukMZINDI4At_LRwXQ7gSTdERzKvcDNmRD7187vXwlGRXCqoSkzTvpkhxbu_r2VNejLky8CWy0e9Wgu8-MqseVEyIf3F9JL2eWIeFiZQdqSQATQ-wo1fd3qEO_kISuJyefoDL5JhEPzfSEF1_4RFwQ' 'http://emla.openstack.svc.cluster.local/apis/alerting/v1/projects/562a6eea71eb40e0be7d79d4d87ce94a/alerts?states=resolved&start=1709222400&end=1710000000'
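
The start and end values in the historical query above match the stated dates when those dates are read as UTC+8. The sketch below shows that conversion in Python; the helper name to_unix is hypothetical, and whether the service itself applies any time zone is not stated in this document, so the offset is an assumption derived from the example values.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the wall-clock times in the example are UTC+8.
UTC8 = timezone(timedelta(hours=8))


def to_unix(year, month, day, hour=0, minute=0, second=0, tz=UTC8) -> int:
    """Convert a wall-clock time in the given zone to a Unix timestamp."""
    moment = datetime(year, month, day, hour, minute, second, tzinfo=tz)
    return int(moment.timestamp())


start = to_unix(2024, 3, 1)   # 2024-03-01 00:00:00 UTC+8
end = to_unix(2024, 3, 10)    # 2024-03-10 00:00:00 UTC+8
print(start, end)  # 1709222400 1710000000
```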

Response parameters

Name                                      Type     Description
code                                      int      Status code
error                                     string   Error message
data.statistics                           dict     Alert message statistics
data.statistics.total                     int      Total number of messages
data.statistics.critical                  int      Number of critical-severity messages
data.statistics.warning                   int      Number of warning-severity messages
data.statistics.info                      int      Number of info-severity messages
data.items                                list     List of alert messages
data.items[$i].id                         string   Message ID
data.items[$i].alertNameCN                string   Message name (Chinese)
data.items[$i].alertNameEN                string   Message name (English)
data.items[$i].status                     string   Status
data.items[$i].severity                   string   Severity
data.items[$i].startsAt                   string   Start time
data.items[$i].updatedAt                  string   Update time
data.items[$i].endsAt                     string   End time
data.items[$i].fingerprint                string   Message identifier
data.items[$i].silencedBy                 list     List of silence IDs
data.items[$i].silenceStartsAt            string   Silence start time
data.items[$i].silenceEndsAt              string   Silence end time
data.items[$i].silencedByRule             boolean  Whether the alert rule has been silenced
data.items[$i].domainID                   string   ID of the owning department (present only with cloud administrator permissions)
data.items[$i].domainName                 string   Name of the owning department (present only with cloud administrator permissions)
data.items[$i].projectID                  string   ID of the owning project (present only with cloud administrator permissions)
data.items[$i].projectName                string   Name of the owning project (present only with cloud administrator permissions)
data.items[$i].group                      dict     Information about the owning alert group
data.items[$i].group.groupID              string   Group ID
data.items[$i].group.groupName            string   Group name
data.items[$i].rule                       dict     Information about the owning alert rule
data.items[$i].rule.ruleID                string   Rule ID
data.items[$i].rule.ruleNameCN            string   Rule name (Chinese)
data.items[$i].rule.ruleNameEN            string   Rule name (English)
data.items[$i].category                   string   Category
data.items[$i].component                  string   Component
data.items[$i].labels                     dict     Label dictionary
data.items[$i].annotations                dict     Annotations
data.items[$i].annotations.summaryCN      string   Alert details (Chinese)
data.items[$i].annotations.summaryEN      string   Alert details (English)
data.items[$i].annotations.descriptionCN  string   Alert overview (Chinese)
data.items[$i].annotations.descriptionEN  string   Alert overview (English)
data.items[$i].annotations.solutionCN     string   Solution (Chinese)
data.items[$i].annotations.solutionEN     string   Solution (English)
data.items[$i].annotations.expr           string   Query expression for the monitoring data
data.items[$i].annotations.legendFormat   string   Legend for the monitoring data
data.items[$i].annotations.thresholds     string   Thresholds for the monitoring data
data.items[$i].annotations.unit           string   Unit of the monitoring data
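
A consumer of these fields might first check code and error, then tally the items by severity. The Python sketch below illustrates that under two assumptions that this document does not confirm: that code 200 signals success (as in the response example), and that data.statistics describes the same items that are returned. The function names and the shortened sample payload are illustrative only.

```python
from collections import Counter


def extract_items(response: dict) -> list:
    """Return data.items, raising if the call did not succeed."""
    # Assumption: code 200 signals success, as in the response example.
    if response.get("code") != 200:
        raise RuntimeError(
            f"alert query failed: {response.get('code')} {response.get('error')}"
        )
    return response.get("data", {}).get("items", [])


def count_by_severity(items: list) -> dict:
    """Tally items in the same shape as data.statistics."""
    counts = Counter(item["severity"] for item in items)
    return {
        "total": len(items),
        "critical": counts["critical"],
        "warning": counts["warning"],
        "info": counts["info"],
    }


# Shortened, hand-written payload with only the fields used here.
sample = {
    "code": 200,
    "error": "",
    "data": {
        "statistics": {"critical": 0, "info": 0, "total": 2, "warning": 2},
        "items": [
            {"id": "e66f94883c9abfa64e30aea27e44bcd1", "severity": "warning"},
            {"id": "92dcb7c537c3fb8b5a354726212cd3cd", "severity": "warning"},
        ],
    },
}

tally = count_by_severity(extract_items(sample))
```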

Response example

{
    "code": 200,
    "error": "",
    "data": {
        "statistics": {
            "critical": 0,
            "info": 0,
            "total": 2,
            "warning": 2
        },
        "items": [
            {
                "id": "e66f94883c9abfa64e30aea27e44bcd1",
                "alertNameCN": "节点内存使用率大于70%",
                "alertNameEN": "The memory utilization of a node is greater than 70%",
                "status": "firing",
                "severity": "warning",
                "startsAt": "2024-03-10T12:06:43.791Z",
                "endsAt": "0001-01-01T00:00:00.000Z",
                "updatedAt": "2024-03-11T02:53:43.904Z",
                "fingerprint": "f31ae4758d642870",
                "labels": {
                    "alertname": "节点内存使用率大于70%",
                    "category": "platform",
                    "company": "nanjing_3_12",
                    "group_id": "adfbede9fe42a3c1d3aaab12e78af4be",
                    "host_ip": "10.10.1.5",
                    "node_name": "node-2",
                    "project": "nanjing_3_12",
                    "public_vip": "100.100.4.10",
                    "role": "controller_all",
                    "rule_id": "f0dc35909cecd9d17ee9a127915c2308",
                    "rule_ns": "openstack",
                    "rule_resource": "escl.rules",
                    "severity": "warning",
                    "state": "disabled"
                },
                "annotations": {
                    "descriptionCN": "节点 node-2:10.10.1.5 内存使用率大于70%且小于90%,持续5分钟告警。",
                    "descriptionEN": "node-2:10.10.1.5 - The memory utilization of this node is greater than 70% and less than 90%, and this situation continues for 5 minutes.",
                    "solutionCN": "请降低您的云主机业务负载、迁移云主机到其他节点,或计划扩容云环境。",
                    "solutionEN": "Please lower the workload of your instances, or migrate instances in this node to other nodes, or plan expansion of this cloud platform.",
                    "summaryCN": "节点 node-2:10.10.1.5 内存使用率大于70%,其中云主机内存使用率为0.00%。",
                    "summaryEN": "node-2:10.10.1.5 - The memory utilization of this node is greater than 70%, including the memory utilization of instances of this node is 0.00%.",
                    "expr": "node_instance_memory_utilization * on(host_ip,node_name) group_left(role) ecms_node_role{role=~\"controller_all|compute_osd|compute\"} * on(host_ip,node_name) group_left(state) ecms_node_dpdk_state{state=\"disabled\"} * on(host_ip,node_name) group_left() count by(node_name, host_ip)(((node_memory_MemTotal_bytes{instance=~\".+\"} - node_memory_MemFree_bytes{instance=~\".+\"} - node_memory_Buffers_bytes{instance=~\".+\"} - node_memory_Slab_bytes{instance=~\".+\"} - node_memory_Cached_bytes{instance=~\".+\"}) / node_memory_MemTotal_bytes{instance=~\".+\"} * 100))",
                    "legendFormat": "<node_name> memory utilization",
                    "thresholds": "70,yellow,dashed,Warning;90,red,dashed,Critical",
                    "unit": "%"
                },
                "group": {
                    "groupID": "adfbede9fe42a3c1d3aaab12e78af4be",
                    "groupName": "node.rules"
                },
                "rule": {
                    "ruleID": "f0dc35909cecd9d17ee9a127915c2308",
                    "ruleNameCN": "节点内存使用率大于70%",
                    "ruleNameEN": "The memory utilization of a node is greater than 70%"
                },
                "domainID": "default",
                "domainName": "Default",
                "projectID": "admin",
                "projectName": "admin",
                "category": "platform",
                "component": "ESCL"
            },
            {
                "id": "92dcb7c537c3fb8b5a354726212cd3cd",
                "alertNameCN": "Etcd磁盘同步持续时间过长",
                "alertNameEN": "Etcd disk fync duration is too long",
                "status": "resolved",
                "severity": "warning",
                "startsAt": "2024-02-15T18:22:00.551Z",
                "endsAt": "2024-02-15T18:26:00.551Z",
                "updatedAt": "2024-02-15T18:26:00.551Z",
                "fingerprint": "e054585f3994c467",
                "labels": {
                    "alertname": "Etcd磁盘同步持续时间过长",
                    "category": "platform",
                    "company": "nanjing_3_12",
                    "endpoint": "metrics",
                    "group_id": "d6e557c8abe593ee4226930dad94403d",
                    "host_ip": "10.10.1.4",
                    "instance": "10.10.1.4:2379",
                    "job": "etcd",
                    "namespace": "kube-system",
                    "node_name": "node-1",
                    "project": "nanjing_3_12",
                    "public_vip": "100.100.4.10",
                    "rule_id": "5441717e39309f2a5de057e97d408233",
                    "rule_ns": "openstack",
                    "rule_resource": "eks-managed.rules",
                    "service": "etcd",
                    "severity": "warning"
                },
                "annotations": {
                    "descriptionCN": "节点 node-1:10.10.1.4 Etcd磁盘WAL同步持续时间过长,磁盘IO性能不足,持续10分钟告警。",
                    "descriptionEN": "node-1:10.10.1.4 - Etcd disk WAL fsync duration is too long and disk IO performance is insufficient, and this situation continues for 10 minutes.",
                    "solutionCN": "请联系您的软件服务提供商,进行问题排查。",
                    "solutionEN": "Please contact your software service provider for problem checking.",
                    "summaryCN": "节点 node-1:10.10.1.4 Etcd磁盘WAL同步持续时间过长,磁盘IO性能不足,当前99%的持续时间为452ms。",
                    "summaryEN": "node-1:10.10.1.4 - Etcd disk WAL fsync duration is too long and disk IO performance is insufficient. The current 99th percentile fsync durations are 452ms.",
                    "expr": "histogram_quantile(0.99, rate(ecms_etcd_disk_wal_fsync_duration_seconds_bucket[5m])) * 1000",
                    "legendFormat": "<node_name> fsync duration",
                    "thresholds": "250,yellow,dashed,Too Long",
                    "unit": "ms"
                },
                "group": {
                    "groupID": "d6e557c8abe593ee4226930dad94403d",
                    "groupName": "eks-managed.rules"
                },
                "rule": {
                    "ruleID": "5441717e39309f2a5de057e97d408233",
                    "ruleNameCN": "Etcd磁盘同步持续时间过长",
                    "ruleNameEN": "Etcd disk fync duration is too long"
                },
                "domainID": "default",
                "domainName": "Default",
                "projectID": "admin",
                "projectName": "admin",
                "category": "platform",
                "component": "EKS-Managed"
            }
        ]
    }
}
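
Two values in the example above need a little post-processing before they are useful to a client: the thresholds string and the startsAt/endsAt pair. The Python sketch below is based only on what the sample shows and is not a documented contract. It assumes that thresholds entries are separated by semicolons with four comma-separated fields each (the field names value, color, style, and label are guesses), and that an endsAt of 0001-01-01T00:00:00.000Z means the alert has not ended.

```python
from datetime import datetime

# Timestamp layout used by startsAt / endsAt in the response example.
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_thresholds(text: str) -> list:
    """Split 'value,color,style,label;...' into dicts (field names are guesses)."""
    result = []
    for entry in text.split(";"):
        if not entry:
            continue
        # Split on the first three commas only, so the label keeps any spaces.
        value, color, style, label = entry.split(",", 3)
        result.append(
            {"value": float(value), "color": color, "style": style, "label": label}
        )
    return result


def duration_seconds(item: dict):
    """Seconds between startsAt and endsAt, or None while endsAt is the zero time."""
    # Assumption: the year-0001 zero time marks an alert that has not ended.
    if item["endsAt"].startswith("0001-01-01"):
        return None
    starts = datetime.strptime(item["startsAt"], TIME_FORMAT)
    ends = datetime.strptime(item["endsAt"], TIME_FORMAT)
    return (ends - starts).total_seconds()


memory_thresholds = parse_thresholds("70,yellow,dashed,Warning;90,red,dashed,Critical")
etcd_duration = duration_seconds(
    {"startsAt": "2024-02-15T18:22:00.551Z", "endsAt": "2024-02-15T18:26:00.551Z"}
)
```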