Kubernetes CKA Course Notes 68

ETCD Backup & Restore — Restoring etcd

ZONGRU Li
14 min read · Nov 26, 2021

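Suppose a disaster strikes and we lose the components we defined ourselves in the K8s cluster:

All K8s components lost!

To simulate this, I will restore from the etcd store backup file made in the previous note.

First, check the current environment (that is, which components are in the backup from note 66) by running: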

kubectl get all

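Not many, because I removed most components after finishing the lessons (remember that what is on screen now is exactly the set of objects backed up in the previous note).

I could also delete some deployments here for the simulation (in theory that works too!), but never mind.

First, try deleting the service components by running: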

kubectl delete service {service-name-1} {service-name-2} {service-name-3}

Check again:

Even fewer now!

We can also delete the secrets and configMaps we created ourselves.

Delete the configMaps in the same way:

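With that I have removed many custom components, which looks like the situation below:

K8s components lost!

But now we have:

Restoring K8s components: recovery from the etcd store backup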

See the official docs:

link

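Step 1: Restore the backup file from the previous note (note 66) into data files the etcd server can run on

Use the previous note's backup file to create the restored etcd store directly by running: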

ETCDCTL_API=3 etcdctl snapshot restore {absolute path of the previous note's backup file} --data-dir {new etcd store location, i.e. where the backup is restored into files the etcd server can use}

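Reference:

From the backup file (/tmp/etcdbackup.db) we want to produce the restored data files for the etcd server to use, replacing the old ones (assume the files under /var/lib/etcd have been corrupted).

So the restore command is: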

ETCDCTL_API=3 etcdctl snapshot restore /tmp/etcdbackup.db --data-dir /var/lib/etcdbackup
Error! It needs root!

Adjust the command to:

sudo ETCDCTL_API=3 etcdctl snapshot restore /tmp/etcdbackup.db --data-dir /var/lib/etcdbackup

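This failed for me (unlike the instructor, for whom the same command succeeded), even though the /var/lib/etcdbackup directory was actually created.

After some searching I found this post:

Check the help for etcdctl snapshot restore by running: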

ETCDCTL_API=3 etcdctl snapshot restore --help

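Let's just try skipping the hash check; the help text also says this option is required when the snapshot was copied from a data directory.

So first remove the /var/lib/etcdbackup directory that was created by the failed attempt.

Then re-run the restore command: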

sudo ETCDCTL_API=3 etcdctl snapshot restore /tmp/etcdbackup.db --data-dir /var/lib/etcdbackup --skip-hash-check true
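Still an error!

The final restore command became the following (yes, the only difference is an equals sign...). The boolean flag apparently has to be written as --skip-hash-check=true, since a space-separated true is read as an extra argument rather than as the flag's value: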

sudo ETCDCTL_API=3 etcdctl snapshot restore /tmp/etcdbackup.db --data-dir /var/lib/etcdbackup --skip-hash-check=true
Finally, the backup file appears to have been restored into a directory the etcd server can use!
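The trial and error above can be folded into a small wrapper. This is only a sketch under assumptions: the function name `restore_etcd_snapshot` and the `DRY_RUN` variable are hypothetical helpers, not part of the course or of etcdctl. It refuses to run when the target directory already exists (the reason the failed directory had to be removed first) and always uses the `--skip-hash-check=true` form.

```shell
# Hypothetical helper: wraps the restore command used in this note.
# DRY_RUN defaults to 1, in which case the command is only printed.
restore_etcd_snapshot() {
  snapshot="$1"
  data_dir="$2"

  # A leftover directory from a failed attempt must be removed first,
  # so stop early instead of letting the restore fail.
  if [ -e "$data_dir" ]; then
    echo "refusing to restore: $data_dir already exists" >&2
    return 1
  fi

  # Boolean flag written as --flag=value; a separate "true" would be
  # parsed as an extra positional argument.
  set -- sudo ETCDCTL_API=3 etcdctl snapshot restore "$snapshot" \
    --data-dir "$data_dir" --skip-hash-check=true

  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

Calling `restore_etcd_snapshot /tmp/etcdbackup.db /var/lib/etcdbackup` prints the command for review; setting `DRY_RUN=0` before the call would actually execute it.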

Check the restored directory:
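One way to do that check from a script is sketched below. It assumes the usual etcd v3 layout, where a restored data directory contains `member/snap/db` and `member/wal`; the function name `looks_like_etcd_data_dir` is a hypothetical helper introduced here.

```shell
# Hypothetical helper: succeeds if the directory has the layout that
# a restored etcd v3 data directory is expected to have.
looks_like_etcd_data_dir() {
  dir="$1"
  # member/snap/db is the backend database file, member/wal holds the
  # write-ahead log.
  [ -f "$dir/member/snap/db" ] && [ -d "$dir/member/wal" ]
}
```

Because the restore ran under sudo, the directory is probably readable only by root, so the check may need a root shell, or simply `sudo ls -R /var/lib/etcdbackup` to eyeball the same structure.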

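Step 2: Point the etcd server's data storage at the restored directory:

The simple way is to edit /etc/kubernetes/manifests/etcd.yaml directly.

As covered earlier, kubelet continuously watches the /etc/kubernetes/manifests directory for changes to the component YAML files under it and applies those changes so they take effect, so:

Save after editing!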
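The exact edit was shown in a screenshot that is not reproduced here. A common approach, and the assumption behind this sketch, is to change only the `hostPath` of the etcd data volume so that the container still sees /var/lib/etcd while the host directory behind it is the restored one. The function name `point_etcd_data_at` is hypothetical.

```shell
# Hypothetical helper: rewrite the etcd data hostPath in a manifest.
# Assumes the manifest contains a line ending in "path: /var/lib/etcd".
point_etcd_data_at() {
  manifest="$1"
  new_dir="$2"   # must not contain '#', which is the sed delimiter below

  # Work in a temp file outside the manifests directory so kubelet
  # never sees a half-written extra file there.
  tmp="$(mktemp)"

  # The trailing $ leaves /etc/kubernetes/pki/etcd, --data-dir and
  # mountPath lines untouched.
  sed "s#path: /var/lib/etcd\$#path: ${new_dir}#" "$manifest" > "$tmp" &&
    cat "$tmp" > "$manifest"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

It is probably wise to try this on a copy first, for example `cp /etc/kubernetes/manifests/etcd.yaml /tmp/etcd.yaml` followed by `point_etcd_data_at /tmp/etcd.yaml /var/lib/etcdbackup`, and to diff the result before touching the live manifest with root privileges.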

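Right after the change, kubectl commands hang (because the etcd Pod is restarting):

You have to wait a while (about the time it took me to finish drawing this diagram; when I looked back it had become the following!). The hang happens because the API server has to talk to etcd.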
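Instead of re-running kubectl by hand while waiting, a small polling loop can do it. This is a generic sketch; `retry_until` is a hypothetical helper name, and the retry count and delay are arbitrary.

```shell
# Hypothetical helper: run a command until it succeeds or the number
# of tries is used up. Usage: retry_until MAX_TRIES DELAY_SECONDS CMD...
retry_until() {
  max_tries="$1"
  delay="$2"
  shift 2
  try=1
  while [ "$try" -le "$max_tries" ]; do
    # Output is discarded; only the exit status matters here.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    try=$((try + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, `retry_until 30 5 kubectl get nodes --request-timeout=5s` would poll for roughly a few minutes; the `--request-timeout` flag keeps each individual attempt from hanging while the API server cannot reach etcd.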

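Step 3: Confirm that the k8s components created earlier (the ones I deleted at the top) have been restored

Even the jenkins-token, which was deleted and then regenerated automatically, went back to the old one.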
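Checking by eye works for a handful of objects. For a larger set, one option is to save a listing before the deletions and compare it with a listing taken after the restore. The sketch below assumes two such listing files exist; `missing_after_restore` and the file names are hypothetical.

```shell
# Hypothetical helper: print entries that are in the "before" listing
# but not in the "after" listing (one resource name per line).
missing_after_restore() {
  before_sorted="$(mktemp)"
  after_sorted="$(mktemp)"
  sort "$1" > "$before_sorted"
  sort "$2" > "$after_sorted"
  # comm -23 keeps only the lines unique to the first file.
  comm -23 "$before_sorted" "$after_sorted"
  rm -f "$before_sorted" "$after_sorted"
}
```

The listings could be produced with something like `kubectl get all,secret,configmap -o name > before.txt` at backup time and the same command into `after.txt` once the restore is done; empty output from `missing_after_restore before.txt after.txt` would then suggest nothing is missing.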

That completes the recovery of the lost K8s components!

Before wrapping up, though, I noticed:

etcd had become Pending

Check the logs by running:

kubectl logs etcd-master -n kube-system

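Which shows:

When I tried restoring to the path /var/lib/etcd instead, the log looked like this:

With a bad restore, that is, while the etcd Pod is still Pending, creating a Pod hangs completely:

So in the end I temporarily switched back to the original /var/lib/etcd directory:

Part of the log captured from the stuck etcd Pod follows: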

2021-11-26 14:52:48.176036 N | etcdmain: the server is already initialized as member before, starting as etcd member...
[WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
2021-11-26 14:52:48.176102 I | embed: peerTLS: cert = /etc/kubernetes/pki/etcd/peer.crt, key = /etc/kubernetes/pki/etcd/peer.key, trusted-ca = /etc/kubernetes/pki/etcd/ca.crt, client-cert-auth = true, crl-file =
2021-11-26 14:52:48.176896 I | embed: name = master
2021-11-26 14:52:48.176911 I | embed: data dir = /var/lib/etcd
2021-11-26 14:52:48.176916 I | embed: member dir = /var/lib/etcd/member
2021-11-26 14:52:48.176920 I | embed: heartbeat = 100ms
2021-11-26 14:52:48.176924 I | embed: election = 1000ms
2021-11-26 14:52:48.176933 I | embed: snapshot count = 10000
2021-11-26 14:52:48.176942 I | embed: advertise client URLs = https://172.31.32.35:2379
2021-11-26 14:52:48.176947 I | embed: initial advertise peer URLs = https://172.31.32.35:2380
2021-11-26 14:52:48.176953 I | embed: initial cluster =
2021-11-26 14:52:48.186945 I | etcdserver: recovered store from snapshot at index 1
2021-11-26 14:52:48.187788 I | mvcc: restore compact to 4680451
2021-11-26 14:52:48.200563 I | etcdserver: restarting member 8e9e05c52164694d in cluster cdf818194e3a8c32 at commit index 2274
raft2021/11/26 14:52:48 INFO: 8e9e05c52164694d switched to configuration voters=(10276657743932975437)
raft2021/11/26 14:52:48 INFO: 8e9e05c52164694d became follower at term 2
raft2021/11/26 14:52:48 INFO: newRaft 8e9e05c52164694d [peers: [8e9e05c52164694d], term: 2, commit: 2274, applied: 1, lastindex: 2274, lastterm: 2]
2021-11-26 14:52:48.202507 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32 from store
2021-11-26 14:52:48.203946 W | auth: simple token is not cryptographically signed
2021-11-26 14:52:48.205743 I | mvcc: restore compact to 4680451
2021-11-26 14:52:48.211095 I | etcdserver: starting server... [version: 3.4.13, cluster version: to_be_decided]
2021-11-26 14:52:48.211495 I | etcdserver: 8e9e05c52164694d as single-node; fast-forwarding 9 ticks (election ticks 10)
2021-11-26 14:52:48.212001 N | etcdserver/membership: set the initial cluster version to 3.4
2021-11-26 14:52:48.212121 I | etcdserver/api: enabled capabilities for version 3.4
2021-11-26 14:52:48.213965 I | embed: ClientTLS: cert = /etc/kubernetes/pki/etcd/server.crt, key = /etc/kubernetes/pki/etcd/server.key, trusted-ca = /etc/kubernetes/pki/etcd/ca.crt, client-cert-auth = true, crl-file =
2021-11-26 14:52:48.214306 I | embed: listening for metrics on http://127.0.0.1:2381
2021-11-26 14:52:48.214640 I | embed: listening for peers on 172.31.32.35:2380
raft2021/11/26 14:52:48 INFO: 8e9e05c52164694d is starting a new election at term 2
raft2021/11/26 14:52:48 INFO: 8e9e05c52164694d became candidate at term 3
raft2021/11/26 14:52:48 INFO: 8e9e05c52164694d received MsgVoteResp from 8e9e05c52164694d at term 3
raft2021/11/26 14:52:48 INFO: 8e9e05c52164694d became leader at term 3
raft2021/11/26 14:52:48 INFO: raft.node: 8e9e05c52164694d elected leader 8e9e05c52164694d at term 3
2021-11-26 14:52:48.304019 I | etcdserver: published {Name:master ClientURLs:[https://172.31.32.35:2379]} to cluster cdf818194e3a8c32
2021-11-26 14:52:48.304142 I | embed: ready to serve client requests
2021-11-26 14:52:48.304686 I | embed: ready to serve client requests
2021-11-26 14:52:48.307152 I | embed: serving client requests on 127.0.0.1:2379
2021-11-26 14:52:48.308149 I | embed: serving client requests on 172.31.32.35:2379
2021-11-26 14:55:12.351618 I | mvcc: store.index: compact 4680915
2021-11-26 14:55:12.352863 I | mvcc: finished scheduled compaction at 4680915 (took 830.881µs)

The problematic parts are the lines shown in bold above (to be investigated!)
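As a starting point for that investigation, the log can be narrowed to entries above the info level. This sketch assumes the capnslog line format seen above (date, time, a one-letter level, then `|`) and does not claim to reproduce the bolded lines; `etcd_log_non_info` is a hypothetical helper name.

```shell
# Hypothetical helper: keep only capnslog lines whose level is
# C (critical), E (error), W (warning) or N (notice).
etcd_log_non_info() {
  # Field 3 is the level letter and field 4 is the "|" separator;
  # raft lines and the "[WARNING]" banner do not match this shape.
  awk '$4 == "|" && $3 ~ /^[CEWN]$/'
}
```

It reads from standard input, so it can be chained as `kubectl logs etcd-master -n kube-system | etcd_log_non_info`.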

Reference course
