Environment

Kubernetes 1.24
Vault 1.14.0

kv

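The Key/Value (kv) secrets engine is a generic key-value store for keeping arbitrary secrets in the physical storage that Vault is configured to use. The engine can run in one of two modes: ^[1]

kv v1 - stores a single value per key; only the most recently written value is kept.
kv v2 - enables versioning and stores a configurable number of versions per key. By default, 10 versions are retained.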

kv Version 1

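Compared with v2, the Version 1 KV secrets engine has the following limitations:

The metadata and patch subcommands of vault kv cannot be used.
A value written with vault kv put overwrites the previous content, so only the last write is kept.

Enable a version 1 kv store. When the -version option is omitted, version 1 is enabled by default: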

$ vault secrets enable -version=1 kv

Unlike other secrets engines, the kv secrets engine does not enforce TTL expiration. Even if a ttl is set, the kv secrets engine does not delete data on its own. ^[1]

Write data

$ vault kv put kv/mycorp/mydepartment/myproject/myapp/myapp-api/config db_type=mysql
Success! Data written to: kv/mycorp/mydepartment/myproject/myapp/myapp-api/config

$ vault kv put kv/mycorp/mydepartment/myproject/myapp/myapp-api/config db_host=127.0.0.1
Success! Data written to: kv/mycorp/mydepartment/myproject/myapp/myapp-api/config

$ vault kv put kv/mycorp/mydepartment/myproject/myapp/myapp-api/config db_port=3306
Success! Data written to: kv/mycorp/mydepartment/myproject/myapp/myapp-api/config

List keys

$ vault secrets list
Path          Type         Accessor              Description
----          ----         --------              -----------
cubbyhole/    cubbyhole    cubbyhole_e5c17df6    per-token private secret storage
identity/     identity     identity_f0404cf8     identity store
kv/           kv           kv_618be90b           n/a
sys/          system       system_053aea79       system endpoints used for control, policy and debugging
transit/      transit      transit_aaaaf63d      n/a


$ vault kv list kv
Keys
----
mycorp/

$ vault kv list kv/mycorp
Keys
----
mydepartment/

$ vault kv list kv/mycorp/mydepartment
Keys
----
myproject/

$ vault kv list kv/mycorp/mydepartment/myproject/myapp/myapp-api
Keys
----
config

$ vault kv list kv/mycorp/mydepartment/myproject/myapp/myapp-api/config
No value found at kv/mycorp/mydepartment/myproject/myapp/myapp-api/config

Read a key

$ vault kv get kv/mycorp/mydepartment/myproject/myapp/myapp-api/config
===== Data =====
Key        Value
---        -----
db_port    3306

In the output above, the key kv/mycorp/mydepartment/myproject/myapp/myapp-api/config contains only db_port=3306. Each vault kv put replaced the data written before it, so only the last write remains.
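The overwrite behaviour can be illustrated with a small Python sketch. This is a toy model, not the Vault API: kv_v1_put and kv_v2_put are hypothetical helpers that only mimic the write semantics described here (v1 replaces the stored value, v2 keeps up to 10 versions by default).

```python
# Toy model of KV v1 vs. KV v2 write semantics (not the Vault API).

def kv_v1_put(store, path, **data):
    """KV v1: a put replaces whatever was stored at the path."""
    store[path] = dict(data)

def kv_v2_put(store, path, max_versions=10, **data):
    """KV v2: a put appends a new version and keeps the newest max_versions."""
    versions = store.setdefault(path, [])
    versions.append(dict(data))
    del versions[:-max_versions]  # drop the oldest versions beyond the limit

path = "kv/mycorp/mydepartment/myproject/myapp/myapp-api/config"

v1 = {}
kv_v1_put(v1, path, db_type="mysql")
kv_v1_put(v1, path, db_host="127.0.0.1")
kv_v1_put(v1, path, db_port="3306")
print(v1[path])  # only the last write survives

# Writing all fields in a single put keeps them together.
kv_v1_put(v1, path, db_type="mysql", db_host="127.0.0.1", db_port="3306")
print(v1[path])
```

With the real CLI, the equivalent of the last call is passing several key=value pairs to a single vault kv put.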

Delete a key

$ vault kv delete kv/mycorp/mydepartment/myproject/myapp/myapp-api/config
Success! Data deleted (if it existed) at: kv/mycorp/mydepartment/myproject/myapp/myapp-api/config

$ vault kv get kv/mycorp/mydepartment/myproject/myapp/myapp-api/config
No value found at kv/mycorp/mydepartment/myproject/myapp/myapp-api/config


Policy

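Vault models a file system: everything in Vault, including secrets and configuration, is used and authorized by its path. ^[1]

Vault policies use a declarative syntax to grant or forbid specific operations on specific paths. Policies deny all access by default, so an empty policy grants no access to the system. ^[1]

Authentication and authorization flow

Policy syntax

Policies are written in HCL or JSON and describe which paths in Vault a person or application is allowed to access. ^[1]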
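The deny-by-default idea can be sketched in Python as follows. This is a simplified model under stated assumptions, not Vault's policy engine: the policy is a plain dict, only exact paths and a trailing "*" are matched, and the longest matching pattern is taken as the most specific rule. The names matches, is_allowed, and example_policy are hypothetical.

```python
# Simplified model of path-based, deny-by-default authorization.
# Only exact paths and a trailing "*" (prefix match) are handled here.

def matches(pattern, path):
    if pattern.endswith("*"):
        return path.startswith(pattern[:-1])
    return pattern == path

def is_allowed(policy, path, capability):
    """Return True only if the most specific matching rule grants the capability."""
    candidates = [p for p in policy if matches(p, path)]
    if not candidates:
        return False  # no rule matches: denied by default
    best = max(candidates, key=len)  # most specific rule wins in this sketch
    capabilities = policy[best]
    return "deny" not in capabilities and capability in capabilities

example_policy = {
    "kv/mycorp/*": ["read", "list"],
    "kv/mycorp/restricted/*": ["deny"],
}

print(is_allowed(example_policy, "kv/mycorp/app/config", "read"))
print(is_allowed(example_policy, "kv/mycorp/app/config", "delete"))
print(is_allowed(example_policy, "kv/mycorp/restricted/key", "read"))
print(is_allowed({}, "kv/mycorp/app/config", "read"))
```

An empty policy never matches, so every request is rejected, which mirrors the statement that an empty policy grants nothing.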

Reference links

Vault Chinese reference manual: Policies

Footnotes

1.Vault Chinese reference manual: Policies ↩

Installing and using Helm

Published 2022-10-07, updated 2023-07-10. Category: Kubernetes

Environment

centos7 5.4.212-1.el7
kubernetes Server Version: v1.25.0
Helm 3.10.0

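Installation

Download the required version

wget https://get.helm.sh/helm-v3.10.0-linux-amd64.tar.gz

Extract the archive
tar -xf helm-v3.10.0-linux-amd64.tar.gz
Locate the helm binary in the extracted directory and copy it to the target directory
cp linux-amd64/helm /usr/local/bin/

Verify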

$ helm version
 version.BuildInfo{Version:"v3.10.0", GitCommit:"ce66412a723e4d89555dc67217607c6579ffcb21", GitTreeState:"clean", GoVersion:"go1.18.6"}

Common usage

List installed releases

helm ls

helm ls -A

Uninstall

$ helm ls -A
NAME        	NAMESPACE    	REVISION	UPDATED                                	STATUS	CHART                APP VERSION
cert-manager	cert-manager 	1       	2022-11-01 09:57:11.373366484 +0800 CST	failed	cert-manager-v1.7.1  v1.7.1     
rancher     	cattle-system	1       	2022-11-01 10:05:07.370131566 +0800 CST	failed	rancher-2.6.9        v2.6.9     

$ helm uninstall rancher -n cattle-system
W1101 10:21:32.764269   11113 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W1101 10:21:34.043445   11113 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W1101 10:21:39.809766   11113 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
release "rancher" uninstalled

List the configured repos

$ helm repo ls
NAME          	URL                                              
rancher-stable	https://releases.rancher.com/server-charts/stable
jetstack      	https://charts.jetstack.io                       
hashicorp     	https://helm.releases.hashicorp.com

Search for charts in a repo

$ helm search repo hashicorp/vault
NAME                            	CHART VERSION	APP VERSION	DESCRIPTION                          
hashicorp/vault                 	0.25.0       	1.14.0     	Official HashiCorp Vault Chart       
hashicorp/vault-secrets-operator	0.1.0        	0.1.0      	Official Vault Secrets Operator Chart

List all available chart versions

$ helm search repo hashicorp/vault -l
NAME                            	CHART VERSION	APP VERSION	DESCRIPTION                               
hashicorp/vault                 	0.25.0       	1.14.0     	Official HashiCorp Vault Chart            
hashicorp/vault                 	0.24.1       	1.13.1     	Official HashiCorp Vault Chart            
hashicorp/vault                 	0.24.0       	1.13.1     	Official HashiCorp Vault Chart            
hashicorp/vault                 	0.23.0       	1.12.1     	Official HashiCorp Vault Chart            
hashicorp/vault                 	0.22.1       	1.12.0     	Official HashiCorp Vault Chart            
hashicorp/vault                 	0.22.0       	1.11.3     	Official HashiCorp Vault Chart

Install a chart

$ helm install vault hashicorp/vault --version 0.25.0
NAME: vault
LAST DEPLOYED: Mon Jul 10 14:59:13 2023
NAMESPACE: default
STATUS: deployed
REVISION: 1
NOTES:
Thank you for installing HashiCorp Vault!

Now that you have deployed Vault, you should look over the docs on using
Vault with Kubernetes available here:

https://www.vaultproject.io/docs/


Your release is named vault. To learn more about the release, try:

  $ helm status vault
  $ helm get manifest vault

Environment

Kubernetes 1.24
Vault 1.14.0

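Introduction to Vault

Vault architecture and basic concepts

See the Vault architecture diagram ^[1]

In that diagram, almost all Vault components sit inside what is collectively called the Barrier.

The Vault architecture can be roughly divided into three parts: ^[7]

Storage Backend - where data is stored
Barrier - the barrier layer
HTTPS API - the API interface

Common concepts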

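Storage Backend - Vault does not store data itself, so it needs a storage backend. Vault treats the storage backend as untrusted and uses it only to hold encrypted data. ^[8]

Initialization - Vault must be initialized the first time it starts. This step generates the keys used to encrypt data, including the Master Key; only encrypted data is ever written to the Storage Backend.

Unseal - After Vault starts, it can reach the data in the Storage Backend but cannot decrypt it, because it does not yet know the Master Key. This state is called Sealed, and Vault cannot perform any operation until it is unsealed. Unsealing is the process of obtaining the plaintext Master Key; the Master Key decrypts the Encryption Key, which in turn decrypts the stored data. ^[6]

Master Key - The Encryption Key encrypts the stored data and is kept alongside the data it encrypts. It is itself protected (encrypted) by the Master Key, so Vault needs the Master Key to decrypt the Encryption Key and then the data. The Master Key is also stored with the rest of Vault's data, but it is encrypted by a separate mechanism, the unseal key. By default the unseal key is split into Key Shares with Shamir's Secret Sharing algorithm. ^[9]

Key Shares - By default, Vault uses Shamir's Secret Sharing to split the unseal key into five Key Shares. Any three of them are required to reconstruct the unseal key, decrypt the Master Key, and complete the Unseal operation.

Both the total number of Key Shares and the minimum number needed to reconstruct the key are configurable. Shamir's Secret Sharing can also be disabled, in which case the Master Key is given directly to the administrator, who uses it to unseal Vault.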
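The 3-of-5 threshold scheme can be illustrated with a short Python sketch of Shamir's Secret Sharing over a prime field. This is an illustration of the algorithm only, not Vault's implementation; the field choice (PRIME) and the helper names split_secret and combine_shares are assumptions made for this example.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the secret must be smaller than this

def split_secret(secret, shares=5, threshold=3):
    """Split an integer secret into (x, y) points on a random polynomial."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    # The secret is the constant term; the other coefficients are random.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def f(x):
        # Horner evaluation of the polynomial modulo PRIME
        result = 0
        for c in reversed(coeffs):
            result = (result * x + c) % PRIME
        return result

    return [(x, f(x)) for x in range(1, shares + 1)]

def combine_shares(points):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

key = secrets.randbelow(PRIME)
parts = split_secret(key)                 # five shares, threshold three
print(combine_shares(parts[:3]) == key)   # any three shares suffice
print(combine_shares(parts[2:]) == key)
```

The modular inverse via pow(den, -1, PRIME) needs Python 3.8 or later. Fewer shares than the threshold interpolate a different polynomial and do not reveal the secret.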

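Authentication and authorization flow

Once the Encryption Key has been decrypted, Vault can handle client requests. The whole lifecycle of an HTTPS API request is managed by the Vault Core, which enforces ACL checks and makes sure the request is recorded by audit logging.

When a client first connects to Vault, it must authenticate. Vault's Auth Method module offers many authentication methods:

User-friendly methods, suited to administrators: user/password, cloud providers, ldap, and so on. When a user is created, a Policy must be bound to it to grant appropriate permissions.
Application-friendly methods, suited to applications: public/private keys, token, kubernetes, jwt, and so on.

An authentication request is forwarded by the Core to the Auth Method, which decides whether the identity is valid and returns the list of associated ACL Policies.

ACL Policies are managed and stored by the Policy Store, and the Core performs the ACL checks. The default ACL behaviour is Deny: unless an ACL Policy explicitly allows an operation, it is rejected.

After the Auth Method has authenticated the client and returned valid ACL Policies, the Token Store generates and manages a new Token. This credential is returned to the client and identifies it in subsequent requests. Every Token has a lease. The Token is associated with the ACL Policies, which are used to check the permissions of each request.

After a request is validated, it is routed to a Secret Engine. If the Secret Engine returns a secret, the Core registers it with the Expiration Manager and attaches a Lease ID to it. The client uses the Lease ID to renew or revoke the secret it received. If the client lets the lease expire, the Expiration Manager revokes the secret automatically.

Secret Engine

A Secret Engine is a component that stores, generates, or encrypts data, and it is very flexible. Some Secret Engines simply store and read data; kv (key-value storage), for example, can be thought of as an encrypted Redis. Other Secret Engines connect to external services and generate dynamic credentials on demand.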


yum

Published 2022-08-11, updated 2023-07-06. Category: Linux, common commands

Environment

CentOS Linux release 7.9.2009 (Core)
yum-3.4.3

yum command examples

Find which package provides a command

$ yum whatprovides ip
iproute-4.11.0-30.el7.x86_64 : Advanced IP routing and network device configuration tools
Repo        : base
Matched from:
Filename    : /sbin/ip

Use rpm to find which package an installed file belongs to

$ rpm -qf /sbin/ip
iproute-4.11.0-30.el7.x86_64

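Download a package without installing it

To download a package without installing it, use the yumdownloader command. It is provided by the yum-utils package; install yum-utils if the command is missing.

yumdownloader <package-name>

This downloads the specified package into the current directory without installing it.

Common errors

The GPG keys listed for the “MySQL 5.7 Community Server” repository are already installed but they are not correct for this package

Solution

Edit the configuration file of the affected yum repo and change gpgcheck=1 to gpgcheck=0 to skip key verification. This disables package signature checking for that repo, so importing the repository's current GPG key is the safer fix when it is available.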

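Scraping Nginx metrics with Prometheus

Published 2023-06-27, updated 2023-07-05. Category: Tools, Prometheus

There are two main ways for Prometheus to scrape Nginx runtime metrics:

Nginx exposes some runtime metrics through its own stub_status page (Nginx must be built with --with-http_stub_status_module). This approach is simple, but it yields only a few metrics in Prometheus. nginx_exporter mainly collects the built-in stub_status metrics.
nginx-vts-exporter can monitor many more Nginx metrics. It depends on the third-party module nginx-module-vts, which is added when Nginx is compiled, and it provides much richer metrics. This is the recommended approach.

Environment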

Centos 7
Nginx stable 1.24.0
nginx-vts-exporter v0.10.3
nginx-module-vts v0.2.2

Install and configure nginx-vts-exporter and nginx-module-vts to monitor Nginx metrics

Build Nginx with the nginx-module-vts module

Once Nginx includes nginx-module-vts, the following configuration exposes its runtime metrics

status.conf

vhost_traffic_status_zone;
vhost_traffic_status_filter_by_host on;


server{
    listen 8081;
    server_name localhost;
    location /status {
        vhost_traffic_status_display;
        vhost_traffic_status_display_format html;
    }
}

After restarting Nginx, open http://localhost:8081/status to view the Nginx runtime metrics

Install nginx-vts-exporter

nginx-vts-exporter on GitHub

wget https://github.com/hnlq715/nginx-vts-exporter/releases/download/v0.10.3/nginx-vts-exporter-0.10.3.linux-amd64.tar.gz
tar -xf nginx-vts-exporter-0.10.3.linux-amd64.tar.gz
cp nginx-vts-exporter-0.10.3.linux-amd64/nginx-vts-exporter /usr/bin/

Create the systemd unit file /usr/lib/systemd/system/nginx-vts-exporter.service

[Unit]
Description=nginx-vts-exporter
After=network.target
[Service]
Type=simple
User=root
ExecStart=/usr/bin/nginx-vts-exporter \
  -nginx.scrape_timeout 10 \
  -nginx.scrape_uri http://127.0.0.1:8081/status/format/json
Restart=on-failure
[Install]
WantedBy=multi-user.target

Start the service; it listens on port 9913 by default

systemctl enable --now nginx-vts-exporter

Open localhost:9913/metrics in a browser to see the metrics exposed by nginx-vts-exporter

Prometheus can then scrape the monitoring data from port 9913.

Using AlertManager

Published 2023-06-14, updated 2023-07-03. Category: Tools

Environment

AlertManager 0.24.0

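Deploying and configuring AlertManager

AlertManager is a dedicated alerting tool. It receives alerts sent by Prometheus or other applications, groups, inhibits, and silences them, and then routes them to different receivers according to the configured routing rules. ^[1]

The most commonly used AlertManager features are:

Inhibition - a mechanism that, once an alert has fired, suppresses notifications for other alerts caused by it, so the same problem is not reported repeatedly.
Silencing - a mechanism that mutes alerts based on label matchers. If an incoming alert matches a silence, AlertManager sends no notification for it.
Sending alerts - multiple receivers can be configured, and the routing configuration decides which notification method is used for each alert.
Grouping - combines detailed alerts into a single notification. For example, when an outage triggers a large number of alerts at once, grouping merges them into one notification, avoiding a flood of alerts about the same problem that would make it hard to locate the cause quickly.

Deploying AlertManager

The deployment in this article builds on the setup from "Installing Prometheus on K8S and monitoring a K8S cluster"

In the ConfigMap named prometheus-server-conf, create the configuration file alertmanager.yml for AlertManager and mount it into the AlertManager container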

alertmanager.yml

global:
  resolve_timeout: 5m

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 5m
  receiver: 'web.hook'
receivers:
- name: 'web.hook'
  webhook_configs:
  - url: 'http://localhost:8080/alert_manager_webhook'
    send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
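
How the inhibit_rules entry above behaves can be sketched in Python. This is a simplified model for illustration, not AlertManager's code: alerts are plain label dicts, only exact-value matchers are handled, and the helper names label_match and is_inhibited are hypothetical. A warning alert is muted while a critical alert with the same alertname, dev, and instance labels is firing.

```python
def label_match(alert, matcher):
    """True if the alert carries every label/value pair in the matcher."""
    return all(alert.get(k) == v for k, v in matcher.items())

def is_inhibited(target, firing, rule):
    """True if some firing source alert mutes the target under the rule."""
    if not label_match(target, rule["target_match"]):
        return False
    for source in firing:
        if source is target or not label_match(source, rule["source_match"]):
            continue
        # A missing label and an empty label are treated as equal.
        if all(source.get(l, "") == target.get(l, "") for l in rule["equal"]):
            return True
    return False

rule = {
    "source_match": {"severity": "critical"},
    "target_match": {"severity": "warning"},
    "equal": ["alertname", "dev", "instance"],
}

critical = {"alertname": "NodeDown", "instance": "node-a", "severity": "critical"}
warning_same = {"alertname": "NodeDown", "instance": "node-a", "severity": "warning"}
warning_other = {"alertname": "NodeDown", "instance": "node-b", "severity": "warning"}
firing = [critical, warning_same, warning_other]

print(is_inhibited(warning_same, firing, rule))   # same instance as the critical alert
print(is_inhibited(warning_other, firing, rule))  # different instance
```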

The PVC created for Prometheus is reused as persistent storage for AlertManager. Deploy AlertManager with a configuration like the following

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-pod
  namespace: prometheus
  labels:
    app: prometheus-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-server
  template:
    metadata:
      labels:
        app: prometheus-server
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus
          args:
            - "--storage.tsdb.retention.time=12h"
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.tsdb.path=/prometheus/"
          ports:
            - containerPort: 9090
          resources:
            requests:
              cpu: 500m
              memory: 500M
            limits:
              cpu: 1
              memory: 1Gi
          volumeMounts:
            - name: prometheus-config-volume
              mountPath: /etc/prometheus/
            - name: prometheus-storage-volume
              mountPath: /prometheus/
              subPath: prometheus
        - name: grafana
          image: grafana/grafana
          ports:
            - containerPort: 3000
          volumeMounts:
            - name: prometheus-storage-volume
              mountPath: /var/lib/grafana
              subPath: grafana
        - image: prom/alertmanager:v0.24.0
          name: alert-manager
          ports:
            - containerPort: 9093
          args:
            - "--config.file=/etc/alertmanager/alertmanager.yml"
            - "--web.external-url=http://alert-manager.example.com/"
            - '--cluster.advertise-address=0.0.0.0:9093'
            - "--storage.path=/alertmanager"
          resources:
            limits:
              cpu: 1000m
              memory: 512Mi
            requests:
              cpu: 1000m
              memory: 512Mi
          readinessProbe:
            httpGet:
              path: /-/ready
              port: 9093
            initialDelaySeconds: 5
            timeoutSeconds: 10
          livenessProbe:
            httpGet:
              path: /-/healthy
              port: 9093
            initialDelaySeconds: 30
            timeoutSeconds: 30
          volumeMounts:
          - name: prometheus-storage-volume
            mountPath: /alertmanager 
            subPath: alertmanager
          - name: prometheus-config-volume
            mountPath: /etc/alertmanager
      volumes:
        - name: prometheus-config-volume
          configMap:
            defaultMode: 420
            name: prometheus-server-conf
  
        - name: prometheus-storage-volume
          persistentVolumeClaim:
            claimName: prometheus-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-service
  namespace: prometheus
spec:
  ports:
    - name: prometheus-port
      port: 8090
      protocol: TCP
      targetPort: 9090
    - name: grafana-port
      port: 3000
      targetPort: 3000
    - name: alert-manager-port
      port: 9093
      targetPort: 9093
  selector:
    app: prometheus-server

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus-ui
  namespace: prometheus

spec:
  ingressClassName: nginx
  rules:
  - host: prometheus.example.com
    http:
      paths:
      - backend:
          service:
            name: prometheus-service
            port: 
              number: 8090
        path: /
        pathType: Prefix
  - host: grafana.example.com
    http:
      paths:
      - backend:
          service:
            name: prometheus-service
            port:
              number: 3000
        path: /
        pathType: Prefix
  - host: alert-manager.example.com
    http:
      paths:
      - backend:
          service:
            name: prometheus-service
            port:
              number: 9093
        path: /
        pathType: Prefix

After a successful deployment, the AlertManager web UI is reachable through the AlertManager domain name

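Installing and configuring Prometheus Federation

Published 2023-06-27. Category: Tools, Prometheus

In a federated layout, a separate Prometheus Server is deployed in each data center to collect that data center's monitoring data, and a central Prometheus Server aggregates the data from all data centers. Prometheus calls this feature Federation.

The core of Prometheus Federation is that every Prometheus Server exposes an endpoint, /federate, for retrieving the samples held by that instance. For the central Prometheus Server, scraping another Prometheus instance is no different from scraping an Exporter.

The following example configures the central Prometheus Server to scrape metrics from other Prometheus Servers. At least one match[] parameter is required; each one is a series selector that chooses which series to pull from the target Prometheus Server, for example by job name, and regular expressions can be used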

scrape_configs:
  - job_name: 'federate'
    scrape_interval: 15s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'
        - '{__name__=~"node.*"}'
    static_configs:
      - targets:
        - '192.168.77.11:9090'
        - '192.168.77.12:9090'

__name__ is a special predefined Prometheus label that holds the metric name
Use the following configuration to collect all metrics from the target Prometheus Server
params:
  'match[]':
    - '{job=~".*"}'
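
To see what the central server actually requests, the match[] selectors can be turned into a /federate URL with a few lines of Python. This is a sketch for inspection or manual testing; federate_url is a hypothetical helper, and the target address is taken from the example configuration above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def federate_url(target, selectors):
    """Build the /federate URL a central server would request from target."""
    # Each selector becomes its own match[] query parameter.
    query = urlencode([("match[]", s) for s in selectors])
    return f"http://{target}/federate?{query}"

selectors = ['{job="prometheus"}', '{__name__=~"job:.*"}', '{__name__=~"node.*"}']
url = federate_url("192.168.77.11:9090", selectors)
print(url)

# Decoding the query string gives back the original selectors.
print(parse_qs(urlsplit(url).query)["match[]"])
```

Fetching the printed URL with curl or a browser returns the selected series in the text exposition format, which is a quick way to check a selector before adding it to the scrape configuration.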

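Deploying and using cAdvisor

Published 2023-06-23. Category: Docker

cAdvisor is a tool developed by Google for monitoring container runtime metrics, written in Go. Kubelet integrates cAdvisor to collect the runtime metrics of the containers in Pods. ^[1]

cAdvisor can also be used directly with Prometheus to monitor Docker/Containerd container metrics, with Prometheus and Grafana providing dashboards and alerting

Environment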

cAdvisor version v0.47.0 (c7714a77)
Docker Engine - Community 20.10.9

Deploying the cAdvisor binary on a host

Download the binary; it can be run directly

wget https://github.com/google/cadvisor/releases/download/v0.47.0/cadvisor-v0.47.0-linux-amd64

chmod +x cadvisor-v0.47.0-linux-amd64

./cadvisor-v0.47.0-linux-amd64

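Once started, cAdvisor listens on port 8080 by default. The UI is available at http://localhost:8080, and Prometheus reads the metrics exposed at http://localhost:8080/metrics.

cAdvisor metrics

Official cAdvisor metrics documentation

On the cAdvisor host, the collected metrics can be listed with the following command

curl localhost:8080/metrics

Checking whether a container is running

The cAdvisor metric container_last_seen records the last time the container was observed running (Gauge). If the container stops, the value stays at the last time it was seen running. The following expression uses this metric to detect containers that are no longer running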

container_last_seen - container_last_seen offset 1m == 0
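
The logic of the expression can be sketched in Python. This is an illustration only: the series dict (scrape time in seconds mapped to the container_last_seen value) and the helper name not_running are assumptions, and real evaluation happens inside Prometheus.

```python
def not_running(series, now, window=60):
    """series maps scrape time (s) -> container_last_seen value (unix s)."""
    current = series.get(now)
    earlier = series.get(now - window)
    if current is None or earlier is None:
        return False  # no sample on one side: the comparison yields nothing
    # An unchanged value over the window means the container was not seen again.
    return current - earlier == 0

running = {0: 1_700_000_000, 60: 1_700_000_060}
stopped = {0: 1_700_000_000, 60: 1_700_000_000}

print(not_running(running, now=60))
print(not_running(stopped, now=60))
```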

Footnotes

1.cAdvisor official GitHub repository ↩

Deploying and using Go

Published 2023-06-23. Category: Linux

Installation

Run the following commands to install the Go environment

Go download page

wget https://go.dev/dl/go1.20.5.linux-amd64.tar.gz

tar -xf go1.20.5.linux-amd64.tar.gz -C /usr/local/

echo 'export PATH=$PATH:/usr/local/go/bin' >> ~/.bash_profile

source ~/.bash_profile

Run the go command

$ go version
go version go1.20.5 linux/amd64

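Common errors

bad ELF interpreter

Running go after installation fails with

-bash: /usr/local/go/bin/go: /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory

Cause: the downloaded Go build does not match the target system, for example the 32-bit x86 (386) package was installed on a 64-bit OS.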

Linux Shell scripts

Published 2023-06-21. Category: Linux, Shell scripts

Script examples

Request a URL 200 times within one minute

Here ${COUNT} * ${SLEEP_INTERVAL} = 60. To change the request rate, adjust these two values

#!/bin/bash

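URL="url"  # URL to request
COUNT=200  # number of requests
SLEEP_INTERVAL=0.3  # pause between requests, in seconds

for ((i=1; i<=COUNT; i++))
do
    echo "Request $i: $(date)"
    # Print only the HTTP status code; quoting protects the format string and the URL
    curl -s -o /dev/null -w '%{http_code}\n' "$URL"
    sleep "$SLEEP_INTERVAL"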
done
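
The relationship between the two values can be worked out with a small Python sketch. The helper names sleep_interval and requests_per_minute are hypothetical, and the latency figure is an assumed example; the point is that the time each curl call takes is added to the pause, so the real rate is somewhat lower than COUNT per minute.

```python
def sleep_interval(count, window_seconds=60):
    """Pause between requests so that count pauses fill the window."""
    return window_seconds / count

def requests_per_minute(interval, latency=0.0):
    """Effective rate when each request itself takes latency seconds."""
    return 60 / (interval + latency)

print(sleep_interval(200))                 # pause for 200 requests per minute
print(sleep_interval(100))                 # pause for 100 requests per minute
print(requests_per_minute(0.3, latency=0.1))
```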

Installing and using Ceph

Published 2023-06-16, updated 2023-06-20. Category: Storage, Ceph

Environment

Centos7 6.3.8-1.el7.elrepo.x86_64
Python-3.10.12
Docker-ce 20.10.9
Ceph version 17.2.6

Ceph release list

Server environment

Server	IP	Spec	Role
`ceph-node-1`	10.111.30.100	centos 7 6.3.8-1 2c 3G 50G	`cephadm` node, `monitor daemon`
`ceph-node-2`	10.111.30.110	centos 7 6.3.8-1 2c 5G 50G
`ceph-node-3`	10.111.30.120	centos 7 6.3.8-1 2c 5G 50G

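Installation

This document installs the Ceph Cluster with cephadm. cephadm first installs the first monitor daemon on the first node of the Ceph Cluster, and the IP address the monitor daemon uses to communicate with the cluster must be specified at install time. ^[3]

Requirements

Python 3
Systemd
Docker
Time synchronization (such as chrony or NTP)
LVM2 for provisioning storage devices

Configure the hostnames of the cluster nodes in advance and install Python 3 and Docker. chrony is installed automatically during cluster installation for time synchronization

Configure the node firewalls so that the nodes can reach each other over the network

Install cephadm

Download the cephadm script for the target release with curl ^[1]

CEPH_RELEASE=17.2.6
curl --silent --remote-name --location https://download.ceph.com/rpm-${CEPH_RELEASE}/el9/noarch/cephadm
chmod +x cephadm

Install cephadm onto the host system. No repo for the latest release is available for Centos 7, so the octopus repo is added here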

./cephadm add-repo --release octopus

rpm --import 'https://download.ceph.com/keys/release.asc'

./cephadm install

Check the path of the installed cephadm command

$ which cephadm
/usr/sbin/cephadm


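Introduction to Ceph

Published 2023-06-19. Category: Storage, Ceph

Ceph can provide the following kinds of storage:

Block storage - works like an ordinary disk and provides disks to clients
File system storage - a distributed shared file system
Object storage - cloud storage space with no size limit

Ceph is a distributed storage system and is very flexible: to expand capacity, simply add nodes (servers) to the cluster. Data is stored as multiple replicas, and production environments should keep at least 3 replicas.

Ceph components

Monitor Daemon - Ceph Mon maintains the master copy of the Ceph storage cluster map and the current state of the cluster. Monitors require strong consistency to agree on the cluster state. They maintain the maps that describe the cluster state, including the monitor map, OSD map, placement group (PG) map, and CRUSH map. 5 are needed by default ^[1]
Mgr - the cluster manager component. 2 are needed by default. It tracks the cluster's runtime metrics and current state, including storage utilization, performance metrics, and system load. It also exposes the Python-based Ceph Web Dashboard and REST API.
OSD Daemon - OSDs store the data. In addition, Ceph OSDs use the CPU, memory, and network of Ceph nodes to perform data replication, erasure coding, rebalancing, recovery, monitoring, and reporting. A storage node runs one osd process for each disk used for storage.
MDSs - Metadata Server, stores metadata for the Ceph file system
RGW - the object storage gateway. It provides the API for software that accesses ceph.

Reference links

Footnotes

1.INTRO TO CEPH ↩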

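Download links for common tools

Published 2021-02-04, updated 2023-06-16. Category: Tools

Common tools download directory

Includes the following tools:

Xshell 7.0.0
Usage notes
fiddler-linux.zip
Usage notes
Fiddler Everywhere 4.2.1
cwrsync_6.2.4_x64_free
Usage notes
FileZilla_3.62.2_win64
openvpn-connect-3.3.7.2979_signed
VMware-workstation-full-17.0.0-20800274
CentOS-7-x86_64-Minimal-2207-02
Python 3.10.12 compiled installation files (Python-3.10.12.installed.tar); after installing the dependencies, extract and use directly
HttpCanary

Links to common tools

Online m3u8 video player

Online architecture diagram editor

https://app.diagrams.net/

Online code and text diff tool

https://tool.oschina.net/diff/

Online random password and string generator

https://suijimimashengcheng.bmcx.com/

A shell command can be used instead: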

$ openssl rand -base64
Usage: rand [options] num
where options are
-out file             - write to file
-engine e             - use engine e, possibly a hardware device.
-rand file:file:... - seed PRNG from files
-base64               - base64 encode output
-hex                  - hex encode output

Example: generate 8 random bytes, printed as a 12-character base64 string

$ openssl rand -base64 8
HEoK6ZRtD7o=
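
The same result can be produced in Python, which also makes the length relationship explicit: the argument is a number of random bytes, and base64 turns every 3 bytes into 4 characters, so 8 bytes give 12 characters including one padding character. random_base64 is a hypothetical helper name.

```python
import base64
import secrets

def random_base64(num_bytes):
    """Return num_bytes of cryptographically strong random data as base64 text."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")

token = random_base64(8)
print(token, len(token))
```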

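Online domain whois lookup

WHOIS Lookup by ICANN (ICANN is the Internet Corporation for Assigned Names and Numbers)
https://whois.chinaz.com/

Look up the egress IP address

https://whoer.net/zh
https://whatismyipaddress.com/
ip-api.com, an API mainly used to query information about an IP address, such as geolocation, ASN, and ISP

Online JSON validation and formatting

https://jsoneditoronline.org/

Certificate tools

View certificate contents

https://myssl.com/cert_decode.html

Check whether a certificate and a private key match

https://myssl.com/match_key.html

Download site for centos-family rpm packages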

http://rpmfind.net/linux/RPM/

Installing Python 3

Published 2023-06-16. Category: Python

Environment

Centos 7 6.3.8-1.el7.elrepo.x86_64
Python 3.10.12

Build and install steps

Install the dependencies

yum -y groupinstall "Development tools"

yum -y install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel

yum install libffi-devel -y

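The openssl version that Centos 7 installs by default (1.0.2k-fips) is too old to provide SSL support for Python 3.10.12, so OpenSSL must be upgraded first. Otherwise, SSL-related features fail after the build with: ImportError: No module named _ssl

Download the source package, then build and install it. Use --with-openssl=/usr/local/openssl/ to point to the new openssl installation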

wget https://www.python.org/ftp/python/3.10.12/Python-3.10.12.tgz

tar -xf Python-3.10.12.tgz

cd Python-3.10.12

./configure --prefix=/usr/local/python3 --with-openssl=/usr/local/openssl/

make && make install

ln -s /usr/local/python3/bin/python3 /usr/bin/
ln -s /usr/local/python3/bin/pip3 /usr/bin/

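After installation, the install directory is /usr/local/python3. To use it on another machine, install the dependencies there and copy the install directory over. Download link for the compiled files of this version

Prometheus and Grafana usage examples

Published 2023-06-06, updated 2023-06-13. Category: Tools

Environment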

Prometheus 2.44.0
Grafana 9.5.2
Kubernetes 1.24

Kubernetes

CPU

Get the CPU usage of all nodes in Grafana

In the following examples, $node is a variable configured in the Grafana Dashboard whose value is the hostname of a Kubernetes node. $interval is also a Dashboard variable; it holds the time range used in the Prometheus query.

100 - avg(rate(node_cpu_seconds_total{mode="idle",kubernetes_io_hostname=~"$node"}[$interval])) * 100

Memory

Get the memory usage of all nodes in Grafana

100 - (sum(node_memory_MemFree_bytes{kubernetes_io_hostname=~"$node"}) + sum(node_memory_Cached_bytes{kubernetes_io_hostname=~"$node"}) + sum(node_memory_Buffers_bytes{kubernetes_io_hostname=~"$node"})) / sum(node_memory_MemTotal_bytes{kubernetes_io_hostname=~"$node"}) * 100
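
The arithmetic behind the expression can be checked with a short Python example. The function name memory_usage_percent and the sample numbers are assumptions for illustration; the formula treats free, cached, and buffer memory as available and reports the remainder as used.

```python
def memory_usage_percent(total, free, cached, buffers):
    """Used memory as a percentage, counting free + cached + buffers as available."""
    return 100 - (free + cached + buffers) / total * 100

GIB = 2**30
# Example node: 8 GiB total, 2 GiB free, 1 GiB cached, 1 GiB buffers
print(memory_usage_percent(8 * GIB, 2 * GIB, 1 * GIB, 1 * GIB))
```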

Disk

Get the disk usage of all nodes in Grafana

100 - (sum(node_filesystem_avail_bytes{kubernetes_io_hostname=~"$node"}) / sum(node_filesystem_size_bytes{kubernetes_io_hostname=~"$node"})) * 100

Network

Inbound and outbound traffic on the nodes' physical network interfaces

irate(node_network_receive_bytes_total{device!~"cni0|docker.*|flannel.*|veth.*|virbr.*|lo",kubernetes_io_hostname=~"$node"}[$prometheusTimeInterval])

irate(node_network_transmit_bytes_total{device!~"cni0|docker.*|flannel.*|veth.*|virbr.*|lo",kubernetes_io_hostname=~"$node"}[$prometheusTimeInterval])

Pod

Number of Pods available in the cluster

Pod CPU usage by namespace and Pod

sum(rate(container_cpu_usage_seconds_total[1m])) by (namespace, pod)

Memory used by Pods

container_memory_usage_bytes

Pod network traffic

irate(container_network_receive_bytes_total{namespace=~"$k8sNamespace",interface="eth0",kubernetes_io_hostname=~"$node"}[$prometheusTimeInterval])

Pod restart counts

sum(kube_pod_container_status_restarts_total{namespace=~"$k8sNamespace"}) by (namespace,container)

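Installing and using inotify

Published 2019-07-18, updated 2023-06-09. Category: Linux, common services

Environment

Centos 7

Installation

Install with the system package manager
yum install -y inotify-tools
The package provides two commands, inotifywait and inotifywatch; inotifywait is the more commonly used one

Build from source
Version 3.22.6.0 is installed here ^[1]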

wget https://github.com/inotify-tools/inotify-tools/archive/refs/tags/3.22.6.0.tar.gz
tar -xf 3.22.6.0.tar.gz
cd inotify-tools-3.22.6.0/
yum install -y dh-autoreconf
./autogen.sh && ./configure --prefix=/usr/local/inotify-tools-3.22.6.0  && make && su -c 'make install'


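Installing and using fswatch

Published 2023-06-08, updated 2023-06-09. Category: Linux, common services

Environment

Centos 7
fswatch-1.17.1

Installation

The gcc version that Centos 7 installs by default is too old to meet the build requirements of fswatch-1.17.1, so gcc must be upgraded first. In this example gcc-8.3.0 is installed under /usr/local/gcc-8.3.0/. If the system gcc already meets the requirements, there is no need to pass CXX=/usr/local/gcc-8.3.0/bin/g++ to ./configure.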

wget https://github.com/emcrisostomo/fswatch/archive/refs/tags/1.17.1.tar.gz
tar -xf 1.17.1.tar.gz
cd fswatch-1.17.1/
sh autogen.sh

./configure CXX=/usr/local/gcc-8.3.0/bin/g++ --prefix=/usr/local/fswatch-1.17.1

make
make install

Usage

Common options. After installing the man page as described in the referenced document, the detailed help can be viewed there

fswatch records the following information for every event it observes

timestamp - the time at which the event occurred
path - the path of the file or directory that triggered the event
event types - a space-separated list of event types

Option	Description	Example
`-0, --print0`	Use the ASCII NUL character (`\0`) as line separator. Since file names can potentially contain any character but `NUL`, this option assures that the output of fswatch can be safely parsed using `NUL` as delimiter, such as using `xargs -0` and the shell builtin `read -d ''`.	See the usage examples
`-1, --one-event`	Exit fswatch after the first set of events is received
`--event name`	Filter events by the specified type; can be used multiple times
`-e, --exclude regexp`	Exclude paths matching regexp. Multiple exclude filters can be specified using this option multiple times
`-i, --include regexp`	Include paths matching regexp
`-f, --format-time format`	Print the event time using the specified format
`-I, --insensitive`	Use case insensitive regular expressions
`-m, --monitor name`	Uses the monitor specified by name. Available monitors: - `inotify_monitor` - `poll_monitor`
`-r, --recursive`	Watch subdirectories recursively
`-t, --timestamp`	Print the event timestamp.
`-u, --utc-time`	Print the event time in UTC format. When this option is not specified, the time is printed using the system local time, as defined by localtime
`-l, --latency latency`	Listening interval in seconds, 1s by default

Common events

Event	Description	Example
`NoOp`	Idle event, optionally issued when no changes were detected
`Created`	The object has been created.
`Updated`	The object has been updated. The kind of update is monitor-dependent.
`Removed`	The object has been removed.
`Renamed`	The object has been renamed.
`OwnerModified`	The object’s owner has changed.
`AttributeModified`	An object’s attribute has changed.
`MovedFrom`	The object has moved from this location to a new location of the same file system.
`MovedTo`	The object has moved from another location in the same file system into this location.
`IsFile`	The object is a regular file.
`IsDir`	The object is a directory.
`IsSymLink`	The object is a symbolic link.
`Link`	The object link count has changed.
`Overflow`	The monitor has overflowed.

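Usage examples

Line separator for event output

The following example watches the file nohup.out. Each event record is terminated with \0, and read also splits on \0 (""), so file names containing spaces are not truncated when read processes them.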

$ fswatch -0 nohup.out | while read -d "" file; do echo ${file}; done
/root/nohup.out
/root/nohup.out
/root/nohup.out

Handling events with xargs

$ fswatch -0 [opts] [paths] | xargs -0 -n 1 -I {} [command]

fswatch -0 will split records using the NUL character.
xargs -0 will split records using the NUL character. This is required to correctly match impedance with fswatch.
xargs -n 1 will invoke command every record. If you want to do it every x records, then use xargs -n x.
xargs -I {} will substitute occurrences of {} in command with the parsed argument. If the command you are running does not need the event path name, just delete this option. If you prefer using another replacement string, substitute {} with yours.

The following example backs up the file whenever it changes

fswatch -0 nohup.out | xargs -0 -I {} cp {} {}.`date +%Y%m%d%H%M%S`

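In the command above, date +%Y%m%d%H%M%S is evaluated only once: the shell expands the backticks before xargs starts. If the value is 20230609132143 when the pipeline is launched, every later copy triggered through xargs is written to nohup.out.20230609132143; the timestamp is not recomputed per event.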
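
One way to get a fresh timestamp for every event is to compute it in the consumer rather than in the shell that builds the pipeline. The Python sketch below shows the two pieces involved; split_records and backup_name are hypothetical helper names, and wiring them to the output of fswatch -0 on standard input is left out here.

```python
from datetime import datetime

def split_records(chunk):
    """Split NUL-delimited fswatch -0 output into path strings."""
    return [r.decode() for r in chunk.split(b"\0") if r]

def backup_name(path, now=None):
    """Append a timestamp taken at call time, i.e. once per event."""
    now = now or datetime.now()
    return f"{path}.{now:%Y%m%d%H%M%S}"

events = split_records(b"/root/nohup.out\0/root/nohup.out\0")
for event_path in events:
    # Each call evaluates the clock again, unlike the backtick expansion.
    print(backup_name(event_path))
```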

Common problems

Event queue overflow

After running the following command for a while, it prints Event queue overflow

$ fswatch -0 nohup.out | xargs -0 -I {} cp {} {}.`date +%Y%m%d%H%M%S`
Event queue overflow.
Status code: 1

Solution: choose one of the following.

Use the poll_monitor monitor instead of the default inotify_monitor

fswatch -0 --monitor=poll_monitor nohup.out | xargs -0 -I {} cp {} {}.`date +%Y%m%d%H%M%S`

The limit comes from a kernel parameter, mainly fs.inotify.max_queued_events ^[1]

Check the value of the kernel parameter fs.inotify.max_queued_events; the default is 16384
$ sysctl fs.inotify.max_queued_events
fs.inotify.max_queued_events = 16384
After raising the value and testing again, the command runs normally
$ sysctl fs.inotify.max_queued_events=1000000
fs.inotify.max_queued_events = 1000000

$ sysctl fs.inotify.max_queued_events
fs.inotify.max_queued_events = 1000000
To make the change permanent, write the parameter to the kernel configuration file /etc/sysctl.conf

Footnotes

1.OVERFLOW in event queue - Solution is to tune fs.inotify.max_queued_events ↩

Using lftp

Published 2023-06-08. Category: Linux, common commands

Environment

Centos 7

Installing lftp

yum install -y lftp

Common usage

Show the help text

$ lftp -h
Usage: lftp [OPTS] <site>
`lftp' is the first command executed by lftp after rc files
 -f <file>           execute commands from the file and exit
 -c <cmd>            execute the commands and exit
 --help              print this help and exit
 --version           print lftp version and exit
Other options are the same as in `open' command
 -e <cmd>            execute the command just after selecting
 -u <user>[,<pass>]  use the user/password for authentication
 -p <port>           use the port for connection
 <site>              host name, URL or bookmark name

Log in to an FTP server

Command format

lftp ${USER}:${PASSWORD}@${FTPIP}:${FTPPORT}

Upload a file

Log in, upload, and exit in a single shell command

lftp -u ${USER},${PASSWORD} -p ${FTPPORT} ${FTPIP} -e "put /1.mp4 && exit"

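Tencent Cloud configuration

Published 2023-05-26. Category: Cloud platforms, Tencent Cloud

Python SDK

Official Python SDK usage guide

Installation

pip install tencentcloud-sdk-python-intl-en

Cache prefetch (URL push)

Reference documentation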

import json
from tencentcloud.common import credential
from tencentcloud.common.profile.client_profile import ClientProfile
from tencentcloud.common.profile.http_profile import HttpProfile
from tencentcloud.common.exception.tencent_cloud_sdk_exception import TencentCloudSDKException
from tencentcloud.vod.v20180717 import vod_client, models
try:
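    # Create a credential object from the Tencent Cloud account's SecretId and SecretKey; keep the key pair confidential.
    # Leaked code can expose the SecretId and SecretKey and endanger every resource in the account. Keys are available in the console at https://console.tencentcloud.com/capi
    cred = credential.Credential("SecretId", "SecretKey")
    # Create an HTTP profile; optional, skip it if there are no special requirements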
    httpProfile = HttpProfile()
    httpProfile.endpoint = "vod.tencentcloudapi.com"

    # Create a client profile; optional, skip it if there are no special requirements
    clientProfile = ClientProfile()
    clientProfile.httpProfile = httpProfile
    # Create the client object for the target product; clientProfile is optional
    client = vod_client.VodClient(cred, "", clientProfile)

    # Create a request object; every API has a corresponding request object
    req = models.PushUrlCacheRequest()
    params = {
        "Urls": [ "https://test.domain.com/z44R8F4D.ts", "https://test.domain.com/z70TBUet.ts", "https://test.domain.com/zB2OEC1t.ts", 
                  "https://test.domain.com/zZw91TCL.ts", "https://test.domain.com/zbJ9U6Su.ts", "https://test.domain.com/zbvqkOMN.ts"]
    }
    req.from_json_string(json.dumps(params))

    # resp is a PushUrlCacheResponse instance that corresponds to the request object
    resp = client.PushUrlCache(req)
    # Print the response as a JSON string
    print(resp.to_json_string())

except TencentCloudSDKException as err:
    print(err)

Footnotes

1.Official Python SDK usage guide ↩