IoT Edge Cluster: An Example Implementation of Kubernetes Events Alert Notifications
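Background
The edge cluster (Raspberry Pi + K3S) needs basic alerting.
Edge cluster constraints
- CPU, memory, and storage are tight. The cluster cannot support a full Prometheus-based monitoring stack, which needs at least 2 GB of memory and a large amount of storage (even a Prometheus Agent setup is too heavy), so extra storage and compute consumption must be avoided.
- Network conditions cannot support a monitoring stack either, because such a stack typically ships data every minute (or continuously) and the data volume is not small.
- Some sites sit behind a metered 5G network: each destination address has to be explicitly opened, traffic is billed by volume, and the 5G link has limited throughput and is unstable (it may go offline for a while).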
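Key requirements
In summary, the key requirements are:
- Alerting: raise timely alerts on edge cluster anomalies, so that it is known what is going wrong in the cluster right now;
- Network: conditions are poor, traffic must stay low, only a handful of destination addresses can be opened, and periods of instability (going offline for a while) must be tolerated;
- Resources: avoid extra storage and compute consumption as far as possible.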
Solution
Given the above, the following solution is adopted:
Alert notifications based on Kubernetes Events
Architecture diagram
Technical plan
- Collect Events from the various Kubernetes resources, such as:
  - pod
  - node
  - kubelet
  - crd
  - ...
- Use the kubernetes-event-exporter component to collect the Kubernetes Events;
- Forward only Warning-level Events as alert notifications (the conditions can be refined later);
- Send alerts through messaging tools such as a Feishu webhook (more channels can be added later).
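Before deploying anything, it can help to see which Warning events the cluster already produces, since each of them would become one notification over the metered link. The following is a minimal sketch, assuming kubectl access and the default column order of kubectl get events -A --no-headers (namespace, last seen, type, reason, object, message). The function names are made up for illustration and are not part of kubernetes-event-exporter.

```shell
#!/bin/sh
# Hypothetical helpers for previewing the Warning-only filter by hand.

# Reads `kubectl get events -A --no-headers` style lines on stdin and prints
# "<count> <reason>" for Warning events only, most frequent reason first.
# Assumes column 3 is TYPE and column 4 is REASON.
count_warning_reasons() {
  awk '$3 == "Warning" { n[$4]++ } END { for (r in n) print n[r], r }' |
    sort -k1,1nr -k2
}

# Asks the API server for Warning events only, then summarizes them.
preview_warning_events() {
  kubectl get events --all-namespaces --no-headers \
    --field-selector type=Warning | count_warning_reasons
}
```

Running preview_warning_events on a cluster gives a rough idea of how noisy the alert channel would be, and which reasons might be worth excluding later.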
Implementation steps
Manual deployment:
Run the following on the edge cluster:
1. Create the roles
As follows:
cat << '_EOF_' | kubectl apply -f -
---
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
# Extra read access to nodes, which the built-in "view" ClusterRole does not grant.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: event-exporter-extra
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch
---
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: monitoring
  name: event-exporter
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: event-exporter
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
  - kind: ServiceAccount
    namespace: monitoring
    name: event-exporter
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: event-exporter-extra
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: event-exporter-extra
subjects:
  - kind: ServiceAccount
    # Must be the namespace the ServiceAccount above lives in.
    namespace: monitoring
    name: event-exporter
_EOF_
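Once the roles are applied, the bindings can be checked by impersonating the ServiceAccount. This is a sketch under the assumption that the current kubectl user is allowed to impersonate (cluster admins usually are); the helper names are hypothetical. If a binding's subject namespace does not match the namespace of the ServiceAccount, the corresponding check answers no.

```shell
#!/bin/sh
# Hypothetical helpers for checking the exporter's RBAC.

# Builds the user name the API server uses for a ServiceAccount.
# $1: namespace, $2: ServiceAccount name
sa_user() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Prints yes/no for each permission the exporter relies on.
# events and pods should come from the built-in "view" ClusterRole,
# nodes from the event-exporter-extra ClusterRole.
check_event_exporter_rbac() {
  user=$(sa_user monitoring event-exporter)
  for resource in events pods nodes; do
    printf 'list %s: ' "$resource"
    # can-i exits non-zero on "no"; keep going so every answer is shown.
    kubectl auth can-i list "$resource" --all-namespaces --as="$user" || true
  done
}
```

All three lines are expected to read yes before moving on to the next step.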
2. Create the kubernetes-event-exporter config
As follows:
cat << '_EOF_' | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: event-exporter-cfg
  namespace: monitoring
data:
  config.yaml: |
    logLevel: error
    logFormat: json
    route:
      routes:
        # Route 1: dump every event to stdout.
        - match:
            - receiver: "dump"
        # Route 2: drop Normal events, send what is left to Feishu.
        - drop:
            - type: "Normal"
          match:
            - receiver: "feishu"
    receivers:
      - name: "dump"
        stdout: {}
      - name: "feishu"
        webhook:
          endpoint: "https://open.feishu.cn/open-apis/bot/v2/hook/..."
          headers:
            Content-Type: application/json
          layout:
            msg_type: interactive
            card:
              config:
                wide_screen_mode: true
                enable_forward: true
              header:
                title:
                  tag: plain_text
                  content: XXX IoT K3S 集群告警
                template: red
              elements:
                - tag: div
                  text:
                    tag: lark_md
                    content: "**EventType:** {{ .Type }}\n**EventKind:** {{ .InvolvedObject.Kind }}\n**EventReason:** {{ .Reason }}\n**EventTime:** {{ .LastTimestamp }}\n**EventMessage:** {{ .Message }}"
_EOF_
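Note:
- endpoint: "https://open.feishu.cn/open-apis/bot/v2/hook/...": change this to your own webhook endpoint, and never publish it!
- content: XXX IoT K3S 集群告警 ("XXX IoT K3S cluster alert"): change this to a name that is quick to recognize, for example "家里测试 K3S 集群告警" ("Home test K3S cluster alert").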
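The webhook can be tried on its own before the exporter is involved, which separates network or permission problems on the edge link from exporter problems. This is a minimal sketch that posts a plain text message to a Feishu custom bot. FEISHU_WEBHOOK_URL is a hypothetical environment variable holding your real endpoint, and the function names are made up for illustration. The payload builder does no JSON escaping, so keep the test message free of quotes and backslashes.

```shell
#!/bin/sh
# Hypothetical helpers for a manual webhook test.

# Builds a Feishu text-message payload. No JSON escaping is done,
# so $1 must not contain double quotes or backslashes.
feishu_text_payload() {
  printf '{"msg_type":"text","content":{"text":"%s"}}\n' "$1"
}

# Posts the message to the webhook taken from FEISHU_WEBHOOK_URL
# (an assumed variable name); aborts with a message if it is unset.
send_feishu_test() {
  curl -sS -X POST \
    -H 'Content-Type: application/json' \
    --max-time 10 \
    -d "$(feishu_text_payload "$1")" \
    "${FEISHU_WEBHOOK_URL:?set FEISHU_WEBHOOK_URL first}"
}
```

For example, send_feishu_test "k3s edge test" should make a message appear in the target chat. Reading the URL from the environment keeps it out of shell history and scripts.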
3. Create the Deployment
cat << '_EOF_' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: event-exporter
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: event-exporter
      version: v1
  template:
    metadata:
      labels:
        app: event-exporter
        version: v1
    spec:
      volumes:
        - name: cfg
          configMap:
            name: event-exporter-cfg
            defaultMode: 420
        - name: localtime
          hostPath:
            path: /etc/localtime
            type: ''
        - name: zoneinfo
          hostPath:
            path: /usr/share/zoneinfo
            type: ''
      containers:
        - name: event-exporter
          image: ghcr.io/opsgenie/kubernetes-event-exporter:v0.11
          args:
            - '-conf=/data/config.yaml'
          env:
            - name: TZ
              value: Asia/Shanghai
          volumeMounts:
            - name: cfg
              mountPath: /data
            - name: localtime
              readOnly: true
              mountPath: /etc/localtime
            - name: zoneinfo
              readOnly: true
              mountPath: /usr/share/zoneinfo
          imagePullPolicy: IfNotPresent
      serviceAccountName: event-exporter
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/controlplane
                    operator: In
                    values:
                      - 'true'
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/control-plane
                    operator: In
                    values:
                      - 'true'
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/master
                    operator: In
                    values:
                      - 'true'
      tolerations:
        - key: node-role.kubernetes.io/controlplane
          value: 'true'
          effect: NoSchedule
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
_EOF_
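Notes:
- The event-exporter-cfg settings load the configuration file stored as a ConfigMap;
- The localtime, zoneinfo, and TZ settings switch the pod's time zone to Asia/Shanghai, so that the notifications show times in CST;
- The affinity and tolerations settings make the scheduler prefer the master node whenever possible. Adjust as needed; here the reason is that in an edge cluster the master usually doubles as the gateway, has better hardware, and stays online longer.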
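After the three manual steps, the whole path can be smoke-tested by waiting for the Deployment and then provoking a Warning event on purpose. The sketch below starts a pod whose image cannot be pulled, which should make the kubelet emit Warning events (typically with reason Failed). The pod name and image are hypothetical; registry.invalid is deliberately unresolvable.

```shell
#!/bin/sh
# Hypothetical smoke test; the image below is meant NOT to exist.
SMOKE_POD=event-exporter-smoke-test
SMOKE_IMAGE=registry.invalid/does-not-exist:0

# Waits for the exporter to become ready, then shows its latest output.
# The "dump" receiver writes every event to stdout, so events show up here.
verify_event_exporter() {
  kubectl -n monitoring rollout status deployment/event-exporter --timeout=120s &&
    kubectl -n monitoring logs deployment/event-exporter --tail=20
}

# Starts a pod whose image pull is expected to fail.
trigger_test_warning() {
  kubectl -n default run "$SMOKE_POD" --image="$SMOKE_IMAGE" --restart=Never
}

# Removes the test pod again.
cleanup_test_warning() {
  kubectl -n default delete pod "$SMOKE_POD" --ignore-not-found
}
```

A typical run is verify_event_exporter, then trigger_test_warning, a check of the Feishu chat, and finally cleanup_test_warning so the pod does not keep producing alerts.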
Automated deployment
Result: the exporter is deployed automatically when K3S is installed.
On the node running the K3S server, create event-exporter.yaml in the /var/lib/rancher/k3s/server/manifests/ directory (create the directory first if it does not exist):
---
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: event-exporter-extra
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch
---
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: monitoring
  name: event-exporter
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: event-exporter
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
  - kind: ServiceAccount
    namespace: monitoring
    name: event-exporter
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: event-exporter-extra
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: event-exporter-extra
subjects:
  - kind: ServiceAccount
    # Must be the namespace the ServiceAccount above lives in.
    namespace: monitoring
    name: event-exporter
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: event-exporter-cfg
  namespace: monitoring
data:
  config.yaml: |
    logLevel: error
    logFormat: json
    route:
      routes:
        - match:
            - receiver: "dump"
        - drop:
            - type: "Normal"
          match:
            - receiver: "feishu"
    receivers:
      - name: "dump"
        stdout: {}
      - name: "feishu"
        webhook:
          # Replace with your own webhook endpoint and keep it private.
          endpoint: "https://open.feishu.cn/open-apis/bot/v2/hook/..."
          headers:
            Content-Type: application/json
          layout:
            msg_type: interactive
            card:
              config:
                wide_screen_mode: true
                enable_forward: true
              header:
                title:
                  tag: plain_text
                  content: xxxK3S集群告警
                template: red
              elements:
                - tag: div
                  text:
                    tag: lark_md
                    content: "**EventType:** {{ .Type }}\n**EventKind:** {{ .InvolvedObject.Kind }}\n**EventReason:** {{ .Reason }}\n**EventTime:** {{ .LastTimestamp }}\n**EventMessage:** {{ .Message }}"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: event-exporter
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: event-exporter
      version: v1
  template:
    metadata:
      labels:
        app: event-exporter
        version: v1
    spec:
      volumes:
        - name: cfg
          configMap:
            name: event-exporter-cfg
            defaultMode: 420
        - name: localtime
          hostPath:
            path: /etc/localtime
            type: ''
        - name: zoneinfo
          hostPath:
            path: /usr/share/zoneinfo
            type: ''
      containers:
        - name: event-exporter
          image: ghcr.io/opsgenie/kubernetes-event-exporter:v0.11
          args:
            - '-conf=/data/config.yaml'
          env:
            - name: TZ
              value: Asia/Shanghai
          volumeMounts:
            - name: cfg
              mountPath: /data
            - name: localtime
              readOnly: true
              mountPath: /etc/localtime
            - name: zoneinfo
              readOnly: true
              mountPath: /usr/share/zoneinfo
          imagePullPolicy: IfNotPresent
      serviceAccountName: event-exporter
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/controlplane
                    operator: In
                    values:
                      - 'true'
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/control-plane
                    operator: In
                    values:
                      - 'true'
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/master
                    operator: In
                    values:
                      - 'true'
      tolerations:
        - key: node-role.kubernetes.io/controlplane
          value: 'true'
          effect: NoSchedule
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
After that, K3S deploys the manifest automatically when it starts.
Reference: Auto-deploying manifests and Helm charts | Rancher documentation
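Placing the file can be scripted as part of node provisioning. This is a minimal sketch, assuming the default K3S manifests directory mentioned above and sufficient privileges to write to it. K3S_MANIFEST_DIR and the function names are hypothetical and only exist to make the sketch easy to try against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helpers for the auto-deploy step.

# Default K3S auto-deploy directory; override K3S_MANIFEST_DIR to test elsewhere.
K3S_MANIFEST_DIR=${K3S_MANIFEST_DIR:-/var/lib/rancher/k3s/server/manifests}

# Copies a manifest into the auto-deploy directory, creating it if needed.
# $1: path to the local manifest (writing to the default dir usually needs root).
install_manifest() {
  mkdir -p "$K3S_MANIFEST_DIR" &&
    cp "$1" "$K3S_MANIFEST_DIR/event-exporter.yaml"
}

# Shows whether the Deployment defined in the manifest exists in the cluster.
check_auto_deploy() {
  kubectl -n monitoring get deployment event-exporter
}
```

For example, install_manifest ./event-exporter.yaml before the K3S service starts, then check_auto_deploy once the node is up.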
Final result
As shown in the figure below:
References
- opsgenie/kubernetes-event-exporter: Export Kubernetes events to multiple destinations with routing and filtering (github.com)
- AliyunContainerService/kube-eventer: kube-eventer emit kubernetes events to sinks (github.com)
- kubesphere/kube-events: K8s Event Exporting, Filtering and Alerting in Multi-Tenant Environment (github.com)
- kubesphere/notification-manager: K8s native notification management with multi-tenancy support (github.com)