<h2>Deploying Redis with helm v3.x (2020-07-01)</h2>
<blockquote>
<p>Helm v3.0 was heavily refactored; unlike v2.x, it no longer requires installing Tiller.</p>
</blockquote>
<h2 id="installing-helm"><a href="https://helm.sh/docs/intro/install/">Installing Helm</a></h2>
<p><a href="https://github.com/helm/helm/releases">Download release</a></p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[root@kms ~]# helm version
version.BuildInfo{Version:"v3.2.4", GitCommit:"0ad800ef43d3b826f31a5ad8dfbb4fe05d143688", GitTreeState:"clean", GoVersion:"go1.13.12"}
</code></pre></div></div>
<h2 id="添加国内源">Add domestic (China) mirror repositories</h2>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>helm repo remove stable
helm repo add gitlab https://charts.gitlab.io/
helm repo add stable https://kubernetes.oss-cn-hangzhou.aliyuncs.com/charts
helm repo add incubator https://aliacs-app-catalog.oss-cn-hangzhou.aliyuncs.com/charts-incubator/
helm repo add aliyun https://kubernetes.oss-cn-hangzhou.aliyuncs.com/charts-incubator/
helm repo update
helm repo list
NAME URL
stable https://kubernetes.oss-cn-hangzhou.aliyuncs.com/charts
gitlab https://charts.gitlab.io/
incubator https://aliacs-app-catalog.oss-cn-hangzhou.aliyuncs.com/charts-incubator/
aliyun https://kubernetes.oss-cn-hangzhou.aliyuncs.com/charts-incubator/
</code></pre></div></div>
<h2 id="安装redis">Install Redis</h2>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[root@kms ~]# helm list
Error: Kubernetes cluster unreachable
</code></pre></div></div>
<p>Fix: configure ~/.kube/config so helm can reach the cluster</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># helm search repo redis
NAME CHART VERSION APP VERSION DESCRIPTION
stable/redis 1.1.15 4.0.8 Open source, advanced key-value store. It is of...
stable/redis-ha 2.0.1 Highly available Redis cluster with multiple se...
stable/sensu 0.2.0 Sensu monitoring framework backed by the Redis ...
## The install then fails with an error
[root@kms ~]# helm install redis stable/redis
Error: unable to build kubernetes objects from release manifest: unable to recognize "": no matches for kind "Deployment" in version "extensions/v1beta1"
</code></pre></div></div>
<h2 id="解决方式">Fix</h2>
<blockquote>
<p>Change the "Deployment" apiVersion in the chart templates from "extensions/v1beta1" to "apps/v1"</p>
</blockquote>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[root@kms helm]# helm pull stable/redis
[root@kms helm]# pwd
/root/helm
[root@kms helm]# ls
redis-1.1.15.tgz
[root@kms helm]# tar zxf redis-1.1.15.tgz
[root@kms helm]# ls
redis redis-1.1.15.tgz
[root@kms templates]# pwd
/root/helm/redis/templates
[root@kms templates]# ls
deployment.yaml _helpers.tpl networkpolicy.yaml NOTES.txt pvc.yaml secrets.yaml svc.yaml
[root@kms templates]# vim deployment.yaml
[root@kms templates]# head -n2 deployment.yaml
apiVersion: apps/v1
kind: Deployment
</code></pre></div></div>
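<p>When many chart templates need the same fix, the one-line edit above can also be scripted. A minimal sketch (the helper name is illustrative, and it assumes the manifest pairs extensions/v1beta1 with kind: Deployment, as in this chart):</p>

```python
def fix_api_version(manifest: str) -> str:
    """Rewrite the deprecated Deployment apiVersion to apps/v1."""
    return manifest.replace("apiVersion: extensions/v1beta1",
                            "apiVersion: apps/v1")

# The first two lines of deployment.yaml before the fix:
template = "apiVersion: extensions/v1beta1\nkind: Deployment\n"
print(fix_api_version(template))
```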
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># helm install redis redis
NAME: redis
LAST DEPLOYED: Wed Jul 8 14:00:29 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Redis can be accessed via port 6379 on the following DNS name from within your cluster:
redis-redis.default.svc.cluster.local
To get your password run:
REDIS_PASSWORD=$(kubectl get secret --namespace default redis-redis -o jsonpath="{.data.redis-password}" | base64 --decode)
To connect to your Redis server:
1. Run a Redis pod that you can use as a client:
kubectl run --namespace default redis-redis-client --rm --tty -i \
--env REDIS_PASSWORD=$REDIS_PASSWORD \
--image bitnami/redis:4.0.8-r2 -- bash
2. Connect using the Redis CLI:
redis-cli -h redis-redis -a $REDIS_PASSWORD
</code></pre></div></div>
<hr />
<h2>Apache SkyWalking Alarm (2020-06-09)</h2>
<h2 id="alarm-settingsyml">alarm-settings.yml</h2>
<blockquote>
<p><code class="highlighter-rouge">alarm-settings.yml</code> is provided as a ConfigMap so it can be modified and reloaded dynamically.</p>
</blockquote>
<p>For the full configuration, see <a href="https://github.com/jevic/skywalking/blob/master/deployment/configmap.yml">deployment/configmap.yml</a></p>
<h2 id="alarm-body">alarm body</h2>
<p>SkyWalking delivers alarm webhook payloads in the following format:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[{
"scopeId": 1,
"scope": "SERVICE",
"name": "java-app-demo",
"id0": 23,
"id1": 0,
"ruleName": "service_sla_rule",
"alarmMessage":
"Successful rate of service java-app-demo is lower than 80% in 2 minutes of last 10 minutes",
"startTime": 1591951818298
}, {
"scopeId": 1,
"scope": "SERVICE",
"name": "java-app-demo-test",
"id0": 2,
"id1": 0,
"ruleName": "service_sla_rule",
"alarmMessage":
"Successful rate of service java-app-demo-test is lower than 80% in 2 minutes of last 10 minutes",
"startTime": 1591951818298
}]
</code></pre></div></div>
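<p>Note that <code class="highlighter-rouge">startTime</code> is a Unix epoch in milliseconds. A small sketch of turning it into a readable local time (the function name is illustrative):</p>

```python
import time

def format_start_time(start_time_ms: int) -> str:
    """Convert SkyWalking's millisecond epoch into a local time string."""
    return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(start_time_ms // 1000))

print(format_start_time(1591951818298))
```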
<p>The webhook <code class="highlighter-rouge">body</code> therefore needs to be reprocessed before it is forwarded.</p>
<h2 id="webhook企业微信">webhook (WeCom, 企业微信)</h2>
<h3 id="weixinpy">weixin.py</h3>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># -*- coding:utf-8 -*-
import requests
import json

url = "https://qyapi.weixin.qq.com/cgi-bin/webhook/send"


def Weixin(content):
    querystring = {"key": "6c48b0fd-38c0-xxxx-a314-b76ed88f4fba"}
    payload = {"msgtype": "text", "text": {"content": "%s" % content}}
    headers = {
        'Content-Type': "application/json",
    }
    response = requests.request("POST",
                                url,
                                data=json.dumps(payload),
                                headers=headers,
                                params=querystring)
    print(response.text)
</code></pre></div></div>
<h3 id="alarmpy">alarm.py</h3>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># -*- coding:utf-8 -*-
from flask import Flask, request, jsonify
from weixin import Weixin
import json
import time

app = Flask(__name__)


@app.route('/alarm', methods=['POST', 'GET'])
def wechat():
    if request.method == 'POST':
        try:
            data = request.get_data()
            codes = json.loads(data.decode('utf-8'))
            for i in range(len(codes)):
                Message = codes[i]
                name = Message['name']
                scope = Message['scope']
                ruleName = Message['ruleName']
                message = Message['alarmMessage']
                startTime = str(Message['startTime'])
                timeStamp = int(startTime[0:10])
                timeArray = time.localtime(timeStamp)
                StartTime = time.strftime("%Y-%m-%d %H:%M:%S", timeArray)
                AlarmMsg = "服务名称: %s\n范围: %s\n规则: %s\n开始时间: %s\n告警内容: %s" % (name, scope, ruleName, StartTime, message)
                Weixin(AlarmMsg)
            return jsonify({"msg": "ok"})
        except Exception as error:
            print(error)
            return jsonify({"msg": str(error)})
    else:
        return jsonify({"msg": "POST only!!"})


if __name__ == "__main__":
    app.run(host='0.0.0.0', port=5100)
</code></pre></div></div>
<p><img src="https://raw.githubusercontent.com/jevic/images/master/kubernetes/skywalking-alarm.png" alt="" /></p>
<h3 id="alarm-images">alarm images</h3>
<blockquote>
<p>Default port: 5100</p>
</blockquote>
<blockquote>
<p>Environment variable: token=xxxx</p>
</blockquote>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>registry.cn-shenzhen.aliyuncs.com/jevic/skywalking:alarm-dingding
registry.cn-shenzhen.aliyuncs.com/jevic/skywalking:alarm-weixin
</code></pre></div></div>
<h4 id="docker-运行示例">Docker run example</h4>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker run -d --name weixin -p 5100:5100 -e token=xxxxx IMAGE
</code></pre></div></div>
<h4 id="k8s-yaml">k8s yaml</h4>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apiVersion: apps/v1
kind: Deployment
metadata:
  name: apm-alarm
  labels:
    app: apm-alarm
spec:
  selector:
    matchLabels:
      app: apm-alarm
  replicas: 1
  template:
    metadata:
      labels:
        app: apm-alarm
    spec:
      containers:
      - name: apm-alarm
        image: registry.cn-shenzhen.aliyuncs.com/jevic/skywalking:alarm-weixin
        env:
        - name: token
          value: "xx-xx-4ac3xxa314-xxx"
        ports:
        - containerPort: 5100
          name: apm-alarm
      restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  name: apm-alarm
spec:
  selector:
    app: apm-alarm
  type: NodePort
  ports:
  - name: apm-alarm
    port: 5100
    targetPort: 5100
    protocol: TCP
    # nodePort:
</code></pre></div></div>
<hr />
<p><a href="https://github.com/apache/skywalking/blob/master/docs/en/setup/backend/backend-alarm.md">backend-alarm</a>
<a href="https://skyapm.github.io/document-cn-translation-of-skywalking/zh/6.3.0/setup/backend/backend-alarm.html">SkyWalking alarm docs (zh)</a></p>
<hr />
<h2>Apache SkyWalking configuration notes (2020-06-07)</h2>
<h2 id="skywalking-oap-server">skywalking-oap-server</h2>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>## Number of primary shards
- name: SW_STORAGE_ES_INDEX_SHARDS_NUMBER
  value: "2"
## Number of replicas
- name: SW_STORAGE_ES_INDEX_REPLICAS_NUMBER
  value: "0"
## Execute a bulk request every 2000 buffered requests
- name: SW_STORAGE_ES_BULK_ACTIONS
  value: "2000"
## Flush the bulk buffer every 20 MB
- name: SW_STORAGE_ES_BULK_SIZE
  value: "20"
## Sample rate; defaults to 10000 (sample everything)
- name: SW_TRACE_SAMPLE_RATE
  value: "1000"
</code></pre></div></div>
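<p>SW_TRACE_SAMPLE_RATE is expressed out of 10000, so the value 1000 above keeps roughly 10% of traces. A quick sketch of the arithmetic (the helper name is illustrative):</p>

```python
def sample_percentage(sample_rate: int) -> float:
    """SW_TRACE_SAMPLE_RATE is a fraction out of 10000; return it as a percentage."""
    return sample_rate / 10000 * 100

print(sample_percentage(1000))   # 10% of traces are kept
print(sample_percentage(10000))  # 100%: sample everything (the default)
```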
<p><a href="https://github.com/apache/skywalking/blob/v6.6.0/docs/en/setup/backend/backend-storage.md">See the official docs for more storage options</a></p>
<h2 id="agent">agent</h2>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>## Path to agent.jar
-javaagent:/skywalking/agent/skywalking-agent.jar
## oap-server address
-Dskywalking.collector.backend_service=192.168.13.100:11800
## Unique identifier of a service (project); it becomes the service name shown in the SkyWalking UI, so prefer English names
-Dskywalking.agent.service_name=$APP
## Number of traces sampled every 3 seconds; defaults to -1, i.e. sample every trace as long as the memory buffer is not exceeded; tune to your workload
-Dskywalking.agent.sample_n_per_3_secs=1000
## Log level
-Dskywalking.logging.level=WARN
## Ignore traces for specific request path suffixes; requires copying apm-trace-ignore-plugin-7.0.0.jar into the plugins directory
-Dskywalking.trace.ignore_path=/actuator/**,/actuator
## Enable profiling (version 7.0+)
-Dskywalking.profile.active=true
</code></pre></div></div>
<hr />
<h2>Apache SkyWalking (2020-06-06)</h2>
<p>SkyWalking is an observability analysis platform and application performance management system.
It provides distributed tracing, service-mesh telemetry analysis, metrics aggregation, and visualization in a single solution.</p>
<h2 id="apache-skywalking"><a href="https://github.com/apache/skywalking">Apache SkyWalking</a></h2>
<p>At its core, SkyWalking is a platform for analyzing data and storing metrics. Probes submit analysis and metric data to the SkyWalking Collector over HTTP or gRPC; the Collector analyzes and aggregates the data and stores it in one backend such as Elasticsearch, MySQL, or TiDB; finally, the results can be viewed through the SkyWalking UI.</p>
<p>SkyWalking supports collecting data from many sources and formats: SkyWalking agents for multiple languages, Zipkin v1/v2, Istio telemetry, Envoy metrics, and more.</p>
<p><img src="http://skywalking.apache.org/assets/frame.jpeg" alt="" />
<img src="http://skywalking.apache.org/assets/frame-v8.jpg" alt="" /></p>
<h2 id="目前最主流的两个apm对比"><a href="https://www.jianshu.com/p/626cae6c0522">Comparison of the two most popular APMs</a></h2>
<p><img src="https://raw.githubusercontent.com/jevic/images/master/kubernetes/skywalking-pk-pinpoint.png" alt="" /></p>
<h2 id="dockerfile">Dockerfile</h2>
<h3 id="oap-server--ui">OAP server & UI</h3>
<ul>
<li><a href="https://github.com/apache/skywalking-docker">Official image Dockerfiles</a></li>
</ul>
<h4 id="oap-server">OAP server</h4>
<p>To modify <code class="highlighter-rouge">alarm-settings.yml</code> dynamically, a ConfigMap is needed so the alarm configuration can be edited and reloaded;
however, the official image starts the server via <code class="highlighter-rouge">ENTRYPOINT</code>, so the start command cannot be overridden.</p>
<blockquote>
<p>The following adjustments are therefore required:</p>
</blockquote>
<ul>
<li>step 1: clone the Dockerfile sources</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone https://github.com/apache/skywalking-docker.git
</code></pre></div></div>
<ul>
<li>step 2: edit the Dockerfile for the matching version
<ul>
<li>e.g. version 7.0</li>
</ul>
</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># vim $PATH/skywalking-docker/7/7.0/oap-es7/Dockerfile
### Delete the final ENTRYPOINT line, or change ENTRYPOINT to CMD
ENTRYPOINT ["bash", "docker-entrypoint.sh"]
</code></pre></div></div>
<ul>
<li>step 3: rebuild the image</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker build -t apache/skywalking-oap-server:7.0.0-es7 .
</code></pre></div></div>
<ul>
<li>step 4: set the timezone and push to a private registry</li>
</ul>
<blockquote>
<p>Dockerfile</p>
</blockquote>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>FROM apache/skywalking-oap-server:7.0.0-es7
ENV TZ=Asia/Shanghai
RUN apk add --no-cache tzdata \
&& ln -snf /usr/share/zoneinfo/$TZ /etc/localtime \
&& echo $TZ > /etc/timezone
</code></pre></div></div>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker build -t reg.jevic.cn/basic/skywalking-oap-server:7.0.0-es7 .
docker push reg.jevic.cn/basic/skywalking-oap-server:7.0.0-es7
</code></pre></div></div>
<h4 id="skywalking-ui">skywalking-ui</h4>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>FROM apache/skywalking-ui:7.0.0
ENV TZ=Asia/Shanghai
RUN apk add --no-cache tzdata \
&& ln -snf /usr/share/zoneinfo/$TZ /etc/localtime \
&& echo $TZ > /etc/timezone
</code></pre></div></div>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker build -t reg.jevic.cn/basic/skywalking-ui:7.0.0 .
docker push reg.jevic.cn/basic/skywalking-ui:7.0.0
</code></pre></div></div>
<h3 id="skywalking-agent">skywalking-agent</h3>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>FROM apache/skywalking-base:7.0.0-es7 AS build
FROM busybox
ENV AGENT_PATH=/opt/skywalking/agent
COPY --from=build /skywalking/agent $AGENT_PATH
RUN set -ex \
## URL trace-ignore filtering
&& mv $AGENT_PATH/optional-plugins/apm-trace-ignore-plugin-7.0.0.jar $AGENT_PATH/plugins/ \
## Spring Cloud Gateway support
&& mv $AGENT_PATH/optional-plugins/apm-spring-cloud-gateway-2.x-plugin-7.0.0.jar $AGENT_PATH/plugins/
</code></pre></div></div>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker build -t reg.jevic.cn/basic/skywalking-agent:v7 .
docker push reg.jevic.cn/basic/skywalking-agent:v7
</code></pre></div></div>
<h2 id="deployment">Deployment</h2>
<blockquote>
<p>Clone the repo and apply the YAML files to deploy</p>
</blockquote>
<p><a href="https://github.com/jevic/skywalking/tree/master/deployment">SkyWalking K8s deployment scripts</a></p>
<hr />
<h2>kubernetes-v1.18.0 Addons quickstart (2020-06-03)</h2>
<blockquote>
<p>Installing Kubernetes itself is omitted here; see the earlier posts and the scripts for the matching version:</p>
</blockquote>
<blockquote>
<p><a href="https://www.jevic.cn/2019/08/19/kubernetes-1.13.8/">kubernetes 1.13.8 manual binary deployment</a></p>
</blockquote>
<blockquote>
<p><a href="https://github.com/jevic/kubernetes">kubernetes manual binary install scripts</a></p>
</blockquote>
<blockquote>
<p>Branches and versions:</p>
</blockquote>
<ul>
<li>v1.18.0 (master)</li>
<li>v1.13.8</li>
<li>v1.14.0 (startup config files only)</li>
</ul>
<h2 id="role添加">Adding node roles</h2>
<ul>
<li>After the initial install completes, the nodes carry no role labels</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># kubectl get node
NAME STATUS ROLES AGE VERSION
k1 Ready &lt;none&gt; 1h v1.18.0
k2 Ready &lt;none&gt; 1h v1.18.0
k3 Ready &lt;none&gt; 1h v1.18.0
</code></pre></div></div>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl label nodes k1 node-role.kubernetes.io/master=
kubectl get node --show-labels
kubectl label nodes k2 node-role.kubernetes.io/node=
# By default, keep ordinary workloads off the master
kubectl taint nodes k1 node-role.kubernetes.io/master=true:NoSchedule
# Allow the master to run pods again (remove the taint)
kubectl taint nodes k1 node-role.kubernetes.io/master-
# Prevent the master from running pods
kubectl taint nodes k1 node-role.kubernetes.io/master=:NoSchedule
</code></pre></div></div>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># kubectl get node
NAME STATUS ROLES AGE VERSION
k1 Ready master 1h v1.18.0
k2 Ready node 1h v1.18.0
k3 Ready node 1h v1.18.0
</code></pre></div></div>
<h2 id="calico">calico</h2>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>https://docs.projectcalico.org/getting-started/kubernetes/quickstart
https://docs.projectcalico.org/getting-started/kubernetes/self-managed-onprem/onpremises
</code></pre></div></div>
<h2 id="coredns">coredns</h2>
<ul>
<li><a href="https://github.com/coredns/deployment/tree/master/kubernetes">github</a></li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl -O https://raw.githubusercontent.com/coredns/deployment/master/kubernetes/coredns.yaml.sed
curl -O https://raw.githubusercontent.com/coredns/deployment/master/kubernetes/deploy.sh
./deploy.sh -i 10.254.0.2 -d cluster.local. > coredns.yaml
sed -i 's#coredns/coredns:1.6.7#registry.aliyuncs.com/google_containers/coredns:1.6.7#g' coredns.yaml
</code></pre></div></div>
<h3 id="dns-hpa">dns hpa</h3>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/dns-horizontal-autoscaler
</code></pre></div></div>
<h2 id="metrics-server">metrics-server</h2>
<p>Where to get the deployment YAML:</p>
<ul>
<li><a href="https://github.com/kubernetes-sigs/metrics-server">official GitHub repo</a></li>
<li>the k8s source tree deploy directory: cluster/addons/metrics-server</li>
<li>change the image to the Aliyun mirror, then run kubectl apply to deploy;</li>
</ul>
<h3 id="报错处理">Troubleshooting</h3>
<ul>
<li>metrics-server 401 Unauthorized</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>### Temporary workaround: bind the anonymous user to cluster-admin (not recommended);
kubectl create clusterrolebinding the-boss --user system:anonymous --clusterrole cluster-admin
</code></pre></div></div>
<h2 id="nginx-ingress">nginx-ingress</h2>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>https://kubernetes.github.io/ingress-nginx/deploy/#bare-metal
wget https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/static/provider/baremetal/deploy.yaml
</code></pre></div></div>
<p>Switch to the <a href="https://kubernetes.github.io/ingress-nginx/deploy/baremetal/#via-the-host-network">host network mode</a></p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>template:
  spec:
    hostNetwork: true
</code></pre></div></div>
<h2 id="dashboard-v20">dashboard v2.0</h2>
<p><a href="https://github.com/kubernetes/dashboard">dashboard</a></p>
<h3 id="apiserver-配置">apiserver configuration</h3>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># grep basic /etc/kubernetes/apiserver
--basic-auth-file=/etc/kubernetes/basic-auth.csv \
</code></pre></div></div>
<h3 id="用户密码文件">Password file</h3>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># cat /etc/kubernetes/basic-auth.csv
admin,admin,1
password123,test,2
</code></pre></div></div>
<ul>
<li>Format: password,user,userID</li>
</ul>
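<p>The field order password,user,userID is easy to get backwards. A tiny sketch (the helper name is illustrative) that parses one entry of the file:</p>

```python
def parse_basic_auth(line: str) -> dict:
    """Parse one basic-auth.csv entry, whose field order is password,user,userID."""
    password, user, uid = line.strip().split(",")
    return {"user": user, "password": password, "uid": uid}

print(parse_basic_auth("admin,admin,1"))
```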
<h3 id="dashboard-yaml">dashboard yaml</h3>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.....
template:
  metadata:
    labels:
      k8s-app: kubernetes-dashboard
  spec:
    containers:
    - name: kubernetes-dashboard
      image: reg.yl.com/jk8s/kubernetesui/dashboard:v2.0.1
      imagePullPolicy: Always
      ports:
      - containerPort: 8443
        protocol: TCP
      args:
      - --auto-generate-certificates
      - --namespace=kubernetes-dashboard
      - --authentication-mode=basic
....
</code></pre></div></div>
<h3 id="授权用户">Authorize the user</h3>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl create clusterrolebinding login-on-dashboard-admin --clusterrole=cluster-admin --user=admin
kubectl get clusterrolebinding login-on-dashboard-admin
</code></pre></div></div>
<h3 id="ingress-域名配置">Ingress domain configuration</h3>
<h5 id="添加证书">Add the certificate</h5>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># kubectl create secret tls dashboard-secret-jk8s --namespace=kubernetes-dashboard --cert jevic.com.pem --key jevic.com.key
</code></pre></div></div>
<h5 id="nginx-配置">nginx configuration</h5>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># cat ui-ing.yml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: kubernetes-dashboard
  namespace: kubernetes-dashboard
  annotations:
    ingress.kubernetes.io/ssl-passthrough: "true"
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
spec:
  tls:
  - hosts:
    - jk8s.jevic.com
    secretName: dashboard-secret-jk8s
  rules:
  - host: jk8s.jevic.com
    http:
      paths:
      - path: /
        backend:
          serviceName: kubernetes-dashboard
          servicePort: 443
</code></pre></div></div>
<h2 id="lxcfs-不建议生产使用">lxcfs (not recommended for production)</h2>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>https://yq.aliyun.com/articles/566208
https://github.com/lxc/lxcfs
yum install -y fuse fuse-lib fuse-devel libtool
./bootstrap.sh && ./configure && make && make install
lxcfs /var/lib/lxcfs &>/dev/null &
git clone https://github.com/denverdino/lxcfs-admission-webhook
</code></pre></div></div>
<h3 id="说明">Notes</h3>
<ul>
<li>
<ol>
<li>It is better to build lxcfs with make and install it directly on the host rather than running it as a container via a DaemonSet;</li>
</ol>
</li>
<li>
<ol>
<li>If the lxcfs service crashes and restarts, existing Pods can no longer run system commands such as free/top after exec-ing in, and monitoring tools such as Prometheus can no longer collect Pod metrics; the Pods themselves, however, keep serving traffic;</li>
</ol>
</li>
<li>
<ol>
<li>When lxcfs runs directly on the host and the host restarts unexpectedly, all data under /var/lib/lxcfs must be removed first, otherwise the service cannot start and containers cannot run either; but deleting that data loses the mount information of all previously running Pods, which reproduces the problem described in point 2;</li>
</ol>
</li>
</ul>
<h2 id="kube-debug可选">kube-debug (optional)</h2>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>https://github.com/aylei/kubectl-debug
Add configuration:
https://github.com/aylei/kubectl-debug#configuration
</code></pre></div></div>
<hr />
<h2>A Namespace Is Stuck in the Terminating State (2020-05-23)</h2>
<ul>
<li>
<ol>
<li>Run the following command to view the namespaces that are stuck in the Terminating state:</li>
</ol>
</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># kubectl get ns|grep Terminating
cattle-prometheus Terminating 406d
</code></pre></div></div>
<ul>
<li>
<ol>
<li>Select a terminating namespace and view the contents of the namespace to find out the finalizer. Run the following command:</li>
</ol>
</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># kubectl get ns cattle-prometheus -o yaml
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    cattle.io/appIds: cluster-monitoring,monitoring-operator
    cattle.io/status: '{"Conditions":[{"Type":"ResourceQuotaInit","Status":"True","Message":"","LastUpdateTime":"2019-05-09T06:34:14Z"},{"Type":"InitialRolesPopulated","Status":"True","Message":"","LastUpdateTime":"2019-05-09T06:34:19Z"}]}'
    field.cattle.io/projectId: c-s2zlj:p-kw9lj
    lifecycle.cattle.io/create.namespace-auth: "true"
  creationTimestamp: "2019-05-09T06:34:13Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2019-11-13T02:50:20Z"
  finalizers:
  - controller.cattle.io/namespace-auth
  labels:
    cattle.io/creator: norman
    field.cattle.io/projectId: p-kw9lj
  name: cattle-prometheus
  resourceVersion: "39020658"
  selfLink: /api/v1/namespaces/cattle-prometheus
  uid: 75c9cd36-7224-11e9-9de0-e23029de4489
spec: {}
status:
  phase: Terminating
</code></pre></div></div>
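<p>The fix in the next step amounts to removing the finalizers list from this manifest. As a plain-data sketch (it operates on a dict rather than calling the Kubernetes API; names are illustrative):</p>

```python
def strip_finalizers(manifest: dict) -> dict:
    """Drop metadata.finalizers from a namespace manifest, mirroring the kubectl edit."""
    manifest.get("metadata", {}).pop("finalizers", None)
    return manifest

ns = {"metadata": {"name": "cattle-prometheus",
                   "finalizers": ["controller.cattle.io/namespace-auth"]}}
print(strip_finalizers(ns))
```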
<ul>
<li>
<ol>
<li>Run the following command to remove the <code class="highlighter-rouge">finalizers</code> config:
delete the two lines shown below, then save and exit</li>
</ol>
</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># kubectl edit ns cattle-prometheus
....
  finalizers:
  - controller.cattle.io/namespace-auth
....
</code></pre></div></div>
<ul>
<li>
<ol>
<li>Finally, execute the delete command again</li>
</ol>
</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl delete ns cattle-prometheus
</code></pre></div></div>
<h2 id="cause-analysis">Cause analysis</h2>
<p>A finalizer is an asynchronous pre-delete hook commonly used by Kubernetes operators. When such a resource is deleted, the program that created it must first perform its cleanup and then remove its identifier from the resource's finalizers list; only then is the resource completely deleted.</p>
<hr />
<h2>Can't kill YARN apps using ResourceManager UI (2020-04-17)</h2>
<ul>
<li>调整添加以下参数:</li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Adjust core-site.xml:
hadoop.http.filter.initializers=org.apache.hadoop.security.HttpCrossOriginFilterInitializer,org.apache.hadoop.http.lib.StaticUserWebFilter
Add to yarn-site.xml:
yarn.resourcemanager.webapp.ui-actions.enabled=true
</code></pre></div></div>
<ul>
<li><a href="https://community.cloudera.com/t5/Support-Questions/Can-t-kill-YARN-apps-using-ResourceManager-UI-after-HDP-3-1/td-p/243835">Reference</a></li>
</ul>
<hr />
<h2>Apache Tez-ui (2020-04-17)</h2>
<ul>
<li>Versions:
<ul>
<li>Ambari 2.7.5</li>
<li>HDP-3.1.4.0</li>
</ul>
</li>
</ul>
<h2 id="tez-ui-war包">tez-ui war包</h2>
<ul>
<li><a href="https://tez.apache.org/releases/">Releases Downloads</a></li>
</ul>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tar -zxf /opt/apache-tez-0.9.2-bin.tar.gz
</code></pre></div></div>
<h2 id="tomcat">tomcat</h2>
<ul>
<li><a href="https://tomcat.apache.org/whichversion.html">Download</a> the official tar.gz package and extract it</li>
</ul>
<h3 id="tez-ui-配置">tez-ui configuration</h3>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># mkdir -p /opt/apache-tomcat-9.0.19/webapps/tez-ui
# cp /opt/apache-tez-0.9.2-bin/tez-ui-0.9.2.war /opt/apache-tomcat-9.0.19/webapps/tez-ui
# cd /opt/apache-tomcat-9.0.19/webapps/tez-ui
# unzip tez-ui-0.9.2.war
# cat config/configs.env
...... (omitted) ......
ENV = {
hosts: {
/*
* Timeline Server Address:
* By default TEZ UI looks for timeline server at http://localhost:8188, uncomment and change
* the following value for pointing to a different address.
*/
timeline: "http://192.168.1.12:8188",
/*
* Resource Manager Address:
* By default RM REST APIs are expected to be at http://localhost:8088, uncomment and change
* the following value to point to a different address.
*/
rm: "http://192.168.1.12:8088",
/*
* Resource Manager Web Proxy Address:
* Optional - By default, value configured as RM host will be taken as proxy address
* Use this configuration when RM web proxy is configured at a different address than RM.
*/
//rmProxy: "http://localhost:8088",
},
...... (omitted) ......
</code></pre></div></div>
<p>Change the connector port in Tomcat's conf/server.xml</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;Connector port="9999" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" /&gt;
</code></pre></div></div>
<ul>
<li>Start Tomcat:
<ul>
<li>./bin/startup.sh</li>
<li>open http://192.168.1.12:9999/tez-ui/ in a browser</li>
</ul>
</li>
</ul>
<h2 id="tez-sitexml">tez-site.xml</h2>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;property&gt;
  &lt;name&gt;tez.history.logging.service.class&lt;/name&gt;
  &lt;value&gt;org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
  &lt;name&gt;tez.tez-ui.history-url.base&lt;/name&gt;
  &lt;value&gt;http://192.168.1.12:9999/tez-ui/&lt;/value&gt;
&lt;/property&gt;
</code></pre></div></div>
<ul>
<li><a href="https://community.cloudera.com/t5/Community-Articles/How-to-install-Tez-UI-Standalone-and-use-it-to-debug-Hive/ta-p/247345">Reference</a></li>
</ul>
<hr />
<h2>Fixing the Oozie/YARN exception "YarnException: Failed while publishing entity" (2020-03-30)</h2>
<p>Version: HDP 3.1.4.
When a MapReduce job finishes, its container still cannot shut down and keeps waiting for five minutes;</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2020-03-30 21:09:30,393 INFO [Thread-75] org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. Thread state is :WAITING
2020-03-30 21:09:30,494 INFO [Thread-75] org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. Thread state is :WAITING
2020-03-30 21:09:30,594 INFO [Thread-75] org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. Thread state is :WAITING
2020-03-30 21:09:30,694 INFO [Thread-75] org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. Thread state is :WAITING
2020-03-30 21:09:30,794 INFO [Thread-75] org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. Thread state is :WAITING
2020-03-30 21:09:30,819 ERROR [Job ATS Event Dispatcher] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Failed to process Event JOB_FINISHED for the job : job_1585569116633_0009
org.apache.hadoop.yarn.exceptions.YarnException: Failed while publishing entity
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher.dispatchEntities(TimelineV2ClientImpl.java:548)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putEntities(TimelineV2ClientImpl.java:149)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForNewTimelineService(JobHistoryEventHandler.java:1405)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleTimelineEvent(JobHistoryEventHandler.java:742)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.access$1200(JobHistoryEventHandler.java:93)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$ForwardingEventHandler.handle(JobHistoryEventHandler.java:1795)
at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$ForwardingEventHandler.handle(JobHistoryEventHandler.java:1791)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
at com.sun.jersey.api.client.Client.handle(Client.java:652)
at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
at com.sun.jersey.api.client.WebResource$Builder.put(WebResource.java:539)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.doPutObjects(TimelineV2ClientImpl.java:291)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.access$000(TimelineV2ClientImpl.java:66)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$1.run(TimelineV2ClientImpl.java:302)
at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$1.run(TimelineV2ClientImpl.java:299)
</code></pre></div></div>
<p><img src="https://raw.githubusercontent.com/jevic/images/master/hdp/hue-oozie01.png" alt="" />
<img src="https://raw.githubusercontent.com/jevic/images/master/hdp/hue-oozie02.png" alt="" />
<img src="https://raw.githubusercontent.com/jevic/images/master/hdp/hue-oozie3.png" alt="" />
<img src="https://raw.githubusercontent.com/jevic/images/master/hdp/hue-oozie03.png" alt="" /></p>
<blockquote>
<p>This happens because the embedded HBase used by ATSv2 has crashed.
The embedded ATSv2 HBase database needs to be reset.</p>
</blockquote>
<h3 id="停止yarn服务">Stop the YARN service</h3>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Ambari -> YARN -> Actions -> Stop
</code></pre></div></div>
<h3 id="删除zookeeper-上的atsv2-znode">Delete the ATSv2 znode in ZooKeeper</h3>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[root@ ~]# cd /usr/hdp/3.1.4.0-315/zookeeper/bin
[root@ bin]# ./zkCli.sh
......
[zk: localhost:2181(CONNECTED) 1] rmr /atsv2-hbase-unsecure
</code></pre></div></div>
<h3 id="删除hdfs时间线服务目录内的hbase数据">Delete the HBase data under the HDFS timeline service directory</h3>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[hdfs@s05 ~]$ hdfs dfs -rm -r /atsv2/hbase
</code></pre></div></div>
<h3 id="启动yarn服务">Start the YARN service</h3>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Ambari -> YARN -> Actions -> Start
</code></pre></div></div>
<h3 id="再次提交服务">Resubmit the job</h3>
<p><img src="https://raw.githubusercontent.com/jevic/images/master/hdp/hue-oozie04.png" alt="" /></p>
<hr />
<h2>Ambari-installed Oozie UI fails to display (2020-03-17)</h2>
<blockquote>
<p>As shown in the screenshot below, the UI page does not render properly</p>
</blockquote>
<p><img src="https://raw.githubusercontent.com/jevic/images/master/hdp/oozie-web.png" alt="" /></p>
<h3 id="1-停止oozie-服务">1. Stop the Oozie service</h3>
<p><img src="https://raw.githubusercontent.com/jevic/images/master/hdp/oozie02.png" alt="" /></p>
<h3 id="2-下载扩展包">2. Download the ext package</h3>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[root@dt-hdp01 libext]# pwd
/usr/hdp/3.1.4.0-315/oozie/libext
[root@dt-hdp01 libext]# rm -rf ext-2.2 # remove it first if it already exists
[root@dt-hdp01 libext]# wget http://archive.cloudera.com/gplextras/misc/ext-2.2.zip
[root@dt-hdp01 libext]# unzip -q ext-2.2.zip
[root@dt-hdp01 libext]# chown oozie.hadoop ext-2.2 -R
</code></pre></div></div>
<h3 id="3-删除旧页面文件">3. Delete the old web files</h3>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[root@dt-hdp01 webapps]# pwd
/usr/hdp/current/oozie-server/oozie-server/webapps
[root@dt-hdp01 webapps]# rm -rf oozie oozie.war
</code></pre></div></div>
<h3 id="4-重新编译生成war包">4. Rebuild the war</h3>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[root@dt-hdp01 bin]# pwd
/usr/hdp/current/oozie-server/bin
[root@dt-hdp01 bin]# ./oozie-setup.sh prepare-war
</code></pre></div></div>
<p><img src="https://raw.githubusercontent.com/jevic/images/master/hdp/oozie01.png" alt="" /></p>
<h3 id="5-启动oozie访问web页面即可">5. Start Oozie and open the web UI</h3>
<h3 id="其他说明">Additional notes</h3>
<p>About the location of the ext-2.2 directory:
place the downloaded package at the path indicated by the symlink shown in step 3;
Oozie installed by other cluster-management tools works much the same way.</p>