The connection to the server lb.kubesphere.local:6443 was refused - did you specify the right host or port?
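After a power outage shut the machines down, the Kubernetes cluster went down. Any kubectl command fails with: The connection to the server ip:6443 was refused - did you specify the right host or port. Restarting kubelet does not bring it back; etcd reports errors when reading its data because the data files are corrupted.
etcd hit an error while reading its data and failed to start, which in turn kept the api-server from starting.
The etcd data files are corrupted, so the data has to be restored. This is a lab environment; without an etcd backup, the only option is to reset the cluster.
Note: in production, etcd must be run in a highly available setup and backed up regularly, otherwise you are in serious trouble.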
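A regular backup can be as small as a script around etcdctl snapshot save, run from cron. The following is only a minimal sketch, not the setup used on this cluster: the function name backup_etcd, the endpoint, the certificate paths, and the retention count are all assumptions that need adjusting to your environment.

```shell
#!/usr/bin/env bash
# Minimal etcd backup sketch. Every value below is an assumption; adjust it.
BACKUP_DIR="${BACKUP_DIR:-/var/backups/kube_etcd}"
ENDPOINT="${ENDPOINT:-https://127.0.0.1:2379}"       # assumed client endpoint
CACERT="${CACERT:-/etc/ssl/etcd/ssl/ca.pem}"          # assumed certificate paths
CERT="${CERT:-/etc/ssl/etcd/ssl/admin.pem}"
KEY="${KEY:-/etc/ssl/etcd/ssl/admin-key.pem}"
KEEP="${KEEP:-7}"                                     # number of snapshots to keep

# Save one snapshot, prune old ones, print the path of the new snapshot.
backup_etcd() {
    local target
    target="$BACKUP_DIR/snapshot-$(date +%Y-%m-%d_%H%M%S).db"
    mkdir -p "$BACKUP_DIR" || return 1

    # etcdctl's own messages go to stderr so stdout carries only the path.
    ETCDCTL_API=3 etcdctl --endpoints="$ENDPOINT" \
        --cacert="$CACERT" --cert="$CERT" --key="$KEY" \
        snapshot save "$target" >&2 || return 1

    # Keep the newest $KEEP snapshots and remove the rest.
    ls -1t "$BACKUP_DIR"/snapshot-*.db 2>/dev/null |
        tail -n +"$((KEEP + 1))" |
        while IFS= read -r old; do rm -f -- "$old"; done

    echo "$target"
}
```

Calling backup_etcd once a day from a cron job gives a rolling set of snapshots; copying them off the node as well protects against losing the disk itself.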
Step 1:
Check whether the core services are alive (run on every master node)
systemctl status etcd kube-apiserver kube-controller-manager kube-scheduler kubelet containerd
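The full status output for six units is long. As a quicker overview, the same check can be sketched as a loop over systemctl is-active that prints one line per unit. The function name check_core_services is hypothetical. On clusters where the control-plane components run as static pods rather than systemd units (kubeadm-based installs, for example), those units will not exist and will show up as not active, which is expected there.

```shell
#!/usr/bin/env bash
# Print one "unit state" line per core service.
# Returns non-zero if any unit is not active.
check_core_services() {
    local unit state failed=0
    for unit in etcd kube-apiserver kube-controller-manager \
                kube-scheduler kubelet containerd; do
        # is-active prints a single word such as active, inactive or failed.
        state=$(systemctl is-active "$unit" 2>/dev/null)
        [ -n "$state" ] || state=unknown
        printf '%-26s %s\n' "$unit" "$state"
        [ "$state" = "active" ] || failed=1
    done
    return "$failed"
}
```

Run check_core_services on each master node; any line that does not end in active points at the service to look into first.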
This shows that the etcd service has a problem. Try starting it:
systemctl start etcd
Then look at the logs in more detail:
journalctl -u etcd -n 1000
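A thousand journal lines are a lot to read by eye. As a small convenience, the output can be filtered down to the lines that look like failures. This is a sketch: the function name etcd_errors is hypothetical, and the keyword list is only a guess at what is worth seeing.

```shell
#!/usr/bin/env bash
# Show only the lines among the last 1000 etcd journal entries
# that mention a panic, a failure, or an error.
etcd_errors() {
    journalctl -u etcd -n 1000 --no-pager | grep -Ei 'panic|failed|error'
}
```

The filter is deliberately broad; once the key message is found, the unfiltered journal around it still gives the full context.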
The logs contain this error:
"etcdserver/server.go:518","msg":"failed to recover v3 backend from snapshot","error":"failed to find database snapshot file (snap: snapshot file doesn't exist)","stacktr
panic: failed to recover v3 backend from snapshot
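This error means that the power outage corrupted the data files. etcd tries to recover from a snapshot, but the snapshot file it needs does not exist, so it cannot repair itself.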
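Before restoring anything, it can help to see what is actually left in the data directory. The sketch below counts the snapshot and WAL files and lists any zero-length files, which are one possible sign of an interrupted write. The function name inspect_etcd_member is hypothetical, and /var/lib/etcd is an assumed default; check the data directory your etcd is configured with.

```shell
#!/usr/bin/env bash
# Summarize the contents of an etcd member directory.
# $1: etcd data directory (assumed default: /var/lib/etcd)
inspect_etcd_member() {
    local member="${1:-/var/lib/etcd}/member" f snaps=0 wals=0

    # Count raft snapshot files and write-ahead-log segments.
    for f in "$member"/snap/*.snap; do
        [ -e "$f" ] && snaps=$((snaps + 1))
    done
    for f in "$member"/wal/*.wal; do
        [ -e "$f" ] && wals=$((wals + 1))
    done
    echo "snap files: $snaps"
    echo "wal files: $wals"

    # List empty files, each prefixed with "empty: ".
    find "$member" -type f -size 0 2>/dev/null | sed 's/^/empty: /'
}
```

A count of zero snapshot files would be consistent with the "snapshot file doesn't exist" message above, but the output is only a hint, not a diagnosis.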
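In my case, etcd is backed up every day.
The backups are in the /var/backups/kube_etcd directory.
Delete the backup taken on the day of the power outage, since it may itself be incomplete or corrupted.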
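To see which backups predate the outage and are therefore candidates for a restore, the entries in the backup directory can be compared against midnight of the outage day by modification time. This is a sketch under assumptions: the function name backups_before is hypothetical, the date in the usage line is a placeholder, and it relies on the backups' modification times still reflecting when they were taken.

```shell
#!/usr/bin/env bash
# Print the entries in a backup directory that were last modified
# before the start of a given day.
# $1: backup directory   $2: outage day as YYYY-MM-DD
backups_before() {
    local dir="$1" day="$2" ref entry
    ref=$(mktemp) || return 1

    # Stamp a reference file with 00:00 on the outage day (touch -t wants YYYYMMDDhhmm).
    if ! touch -t "$(printf '%s' "$day" | tr -d '-')0000" "$ref"; then
        rm -f "$ref"
        return 1
    fi

    for entry in "$dir"/*; do
        [ -e "$entry" ] || continue
        # Skip anything newer than the reference, i.e. written on or after that day.
        [ "$entry" -nt "$ref" ] || printf '%s\n' "$entry"
    done
    rm -f "$ref"
}
```

For example, backups_before /var/backups/kube_etcd 2024-01-15 lists everything written before 15 January 2024; the last line of the output is usually the most recent backup worth trying first, provided the names sort by date.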
For etcd backup and restore, see: (link)