Errors encountered in using k8s

This is a record of problems encountered while deploying k8s at work.

Issue 1: pod failed to schedule: error while running the “VolumeBinding” filter plugin

describe pod:

Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 55s (x19 over 26m) default-scheduler error while running "VolumeBinding" filter plugin for pod "eri-cec-9dcd4d6c8-whvrm": pod has unbound immediate PersistentVolumeClaims

check pod specs by kubectl edit pod <pod-name>:

volumes:
- name: cec-database
  persistentVolumeClaim:
    claimName: eri-cec-database-pvc
- name: cec-misc
  persistentVolumeClaim:
    claimName: eri-cec-misc-pvc
- name: default-token-pxxg8
  secret:
    defaultMode: 420
    secretName: default-token-pxxg8
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-04-08T01:58:02Z"
    message: 'error while running "VolumeBinding" filter plugin for pod "eri-cec-9dcd4d6c8-whvrm":
      pod has unbound immediate PersistentVolumeClaims'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: BestEffort

The two persistentVolumeClaims were set during helm install:

helm install -n eri ./cec-release-1.0.0.tgz \
> --set persistence.enabled=true \
> --set persistence.storageClass=nfs \
> --set persistence.database.size=5Gi \
> --set persistence.miscellaneous.size=5Gi \
> --set ingress.cecManager.hostName=dual-test \
> --set ingress.cecApi.hostName=dual-api
NAME:   eri
LAST DEPLOYED: Wed Apr 8 09:58:00 2020
NAMESPACE: default
STATUS: DEPLOYED
======================================
RESOURCES:
==> v1/Deployment
NAME AGE
eri-cec 1s
======================================
==> v1/PersistentVolumeClaim
NAME AGE
eri-cec-database-pvc 2s
eri-cec-misc-pvc 2s
======================================
==> v1/Pod(related)
NAME AGE
eri-cec-9dcd4d6c8-whvrm 1s
======================================
==> v1/Secret
NAME AGE
eri-cec-database-secret 2s
======================================
==> v1/Service
NAME AGE
eri-cec 1s
======================================
==> v1beta1/Ingress
NAME AGE
eri-cec-ingress 1s

========================= Solution =========================
References:

  • https://stackoverflow.com/questions/60774220/kubernetes-pod-has-unbound-immediate-persistentvolumeclaims
  • https://blog.csdn.net/oguro/article/details/96964440
  • https://blog.csdn.net/liumiaocn/article/details/103388607
  • https://kubernetes.io/docs/tasks/configure-pod-container/configure-persistent-volume-storage/

From the information above, something is wrong with the PVCs (PersistentVolumeClaims), which are left in the "unbound" state. A PVC should be bound to a PV (PersistentVolume) that has enough capacity to hold the claim.

Check the PVCs and PVs:
    [root@host63 cec-installer]# kubectl get pvc -A
    NAMESPACE NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
    cec eri-sh-cec-database-pvc Pending nfs 125m
    cec eri-sh-cec-misc-pvc Pending nfs 125m
    default eri-cec-database-pvc Pending nfs 3h22m
    default eri-cec-misc-pvc Pending nfs 3h22m
    [root@host63 cec-installer]# kubectl get pv -A
    No resources found
There is no PV on the current node (host63), so I create two PVs, one for the database PVC and one for the misc PVC.
Make a directory for the PVs first:
    [root@host63 mnt]# sudo mkdir /mnt/data
Check the access mode and the capacity that the PVC needs:
    [root@host63 cec-installer]# kubectl edit pvc eri-sh-cec-database-pvc -n cec
    ...
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
    vi pv-init.yaml:
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: eri-sh-cec-database-pv
      labels:
        name: eri-sh-cec-database-pv
    spec:
      nfs:
        path: /mnt/data
        server: nfs
      accessModes: ["ReadWriteMany", "ReadWriteOnce"]
      capacity:
        storage: 5Gi
    ---
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: eri-sh-cec-misc-pv
      labels:
        name: eri-sh-cec-misc-pv
    spec:
      nfs:
        path: /mnt/data
        server: nfs
      accessModes: ["ReadWriteMany", "ReadWriteOnce"]
      capacity:
        storage: 5Gi
    [root@host63 cec-installer]# kubectl apply -f pv-init.yaml
    persistentvolume/eri-sh-cec-database-pv created
    persistentvolume/eri-sh-cec-misc-pv created
    [root@host63 cec-installer]# kubectl get pv -A
    NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
    eri-sh-cec-database-pv 5Gi RWO,RWX Retain Available 15s
    eri-sh-cec-misc-pv 5Gi RWO,RWX Retain Available 15s
The PVCs still do not bind; check their status:
    [root@host63 cec-installer]# kubectl describe pvc eri-sh-cec-database-pvc -n cec
    Events:
    Type Reason Age From Message
    ---- ------ ---- ---- -------
    Warning ProvisioningFailed 54m (x3 over 104m) cluster.local/nfs-provisioner-nfs-server-provisioner_nfs-provisioner-nfs-server-provisioner-0_79370fad-78b7-11ea-8b82-66556e93189d failed to provision volume with StorageClass "nfs": error getting NFS server IP for volume: service SERVICE_NAME=nfs-provisioner-nfs-server-provisioner is not valid; check that it has for ports map[{111 UDP}:true {111 TCP}:true {2049 TCP}:true {20048 TCP}:true] exactly one endpoint, this pod's IP POD_IP=192.168.220.144
    Warning ProvisioningFailed 38m (x8 over 3h6m) cluster.local/nfs-provisioner-nfs-server-provisioner_nfs-provisioner-nfs-server-provisioner-0_79370fad-78b7-11ea-8b82-66556e93189d failed to provision volume with StorageClass "nfs": error getting NFS server IP for volume: service SERVICE_NAME=nfs-provisioner-nfs-server-provisioner is not valid; check that it has for ports map[{2049 TCP}:true {20048 TCP}:true {111 UDP}:true {111 TCP}:true] exactly one endpoint, this pod's IP POD_IP=192.168.220.144
    Normal Provisioning 21m (x16 over 3h6m) cluster.local/nfs-provisioner-nfs-server-provisioner_nfs-provisioner-nfs-server-provisioner-0_79370fad-78b7-11ea-8b82-66556e93189d External provisioner is provisioning volume for claim "cec/eri-sh-cec-database-pvc"
    Warning ProvisioningFailed 21m (x2 over 3h4m) cluster.local/nfs-provisioner-nfs-server-provisioner_nfs-provisioner-nfs-server-provisioner-0_79370fad-78b7-11ea-8b82-66556e93189d failed to provision volume with StorageClass "nfs": error getting NFS server IP for volume: service SERVICE_NAME=nfs-provisioner-nfs-server-provisioner is not valid; check that it has for ports map[{111 TCP}:true {2049 TCP}:true {20048 TCP}:true {111 UDP}:true] exactly one endpoint, this pod's IP POD_IP=192.168.220.144
    Normal ExternalProvisioning 87s (x742 over 3h6m) persistentvolume-controller waiting for a volume to be created, either by external provisioner "cluster.local/nfs-provisioner-nfs-server-provisioner" or manually created by system administrator
I did not figure this one out.
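In hindsight, one likely reason the manually created PVs never bound is that the PVCs request `storageClassName: nfs` (set through `persistence.storageClass` at install time), while the PVs created above carry no `storageClassName` at all, and Kubernetes only binds a claim to a volume whose storage class matches. A minimal sketch of a static PV that could bind under that assumption (the NFS server address is a placeholder, not a value from this cluster):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: eri-sh-cec-database-pv
spec:
  storageClassName: nfs        # must match the PVC's storageClassName to be considered for binding
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  nfs:
    path: /mnt/data
    server: <nfs-server-ip>    # placeholder: an NFS server reachable from every node
```

The repeated ProvisioningFailed events also suggest that the `nfs-provisioner-nfs-server-provisioner` Service has no valid endpoint, so something like `kubectl get endpoints -A | grep nfs` would show whether dynamic provisioning through the "nfs" StorageClass could ever succeed here.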


Issue 2. The deployed pod (without NFS) uses IPv4 by default, and the nginx-ingress service takes an IPv4 address as its default route; this could not be changed directly.

nginx-ingress controller logs:

[root@host63 cec-installer]# kubectl logs nginx-ingress-controller-64d58897bd-b99gw
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: 0.29.0
Build: git-eedcdcdbf
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.17.8
-------------------------------------------------------------------------------
I0407 10:17:14.810155 8 flags.go:215] Watching for Ingress class: nginx
W0407 10:17:14.811042 8 flags.go:260] SSL certificate chain completion is disabled (--enable-ssl-chain-completion=false)
W0407 10:17:14.811123 8 client_config.go:543] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0407 10:17:14.811367 8 main.go:193] Creating API client for https://192.167.0.1:443
I0407 10:17:14.820212 8 main.go:237] Running in Kubernetes cluster version v1.17 (v1.17.4) - git (clean) commit 8d8aa39598534325ad77120c120a22b3a990b5ea - platform linux/amd64
I0407 10:17:14.823302 8 main.go:91] Validated default/nginx-ingress-default-backend as the default backend.
I0407 10:17:15.126113 8 main.go:102] SSL fake certificate created /etc/ingress-controller/ssl/default-fake-certificate.pem
W0407 10:17:15.147723 8 store.go:657] Unexpected error reading configuration configmap: configmaps "nginx-ingress-controller" not found
I0407 10:17:15.156374 8 nginx.go:263] Starting NGINX Ingress controller
I0407 10:17:16.357204 8 nginx.go:307] Starting NGINX process
I0407 10:17:16.357338 8 leaderelection.go:242] attempting to acquire leader lease default/ingress-controller-leader-nginx...
W0407 10:17:16.358186 8 controller.go:394] Service "default/nginx-ingress-default-backend" does not have any active Endpoint
I0407 10:17:16.358304 8 controller.go:137] Configuration changes detected, backend reload required.
I0407 10:17:16.360127 8 status.go:86] new leader elected: nginx-ingress-controller-64d58897bd-cthrs
I0407 10:17:16.450895 8 controller.go:153] Backend successfully reloaded.
I0407 10:17:16.450966 8 controller.go:162] Initial sync, sleeping for 1 second.
W0407 10:17:20.280746 8 controller.go:394] Service "default/nginx-ingress-default-backend" does not have any active Endpoint
W0407 10:17:23.614240 8 controller.go:394] Service "default/nginx-ingress-default-backend" does not have any active Endpoint
W0407 10:17:33.458971 8 controller.go:394] Service "default/nginx-ingress-default-backend" does not have any active Endpoint
I0407 10:17:53.811527 8 leaderelection.go:252] successfully acquired lease default/ingress-controller-leader-nginx
I0407 10:17:53.811566 8 status.go:86] new leader elected: nginx-ingress-controller-64d58897bd-b99gw
W0407 10:18:00.868971 8 controller.go:394] Service "default/nginx-ingress-default-backend" does not have any active Endpoint
I0408 03:14:30.743173 8 event.go:281] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"cec", Name:"eri-sh-cec-ingress", UID:"dd298fd3-3c16-42e8-a544-c7f942ec4e3e", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"211359", FieldPath:""}): type: 'Normal' reason: 'CREATE' Ingress cec/eri-sh-cec-ingress
W0408 03:14:34.068588 8 controller.go:921] Service "default/eri-cec" does not have any active Endpoint.
W0408 03:14:34.068631 8 controller.go:921] Service "default/eri-cec" does not have any active Endpoint.
W0408 03:14:34.068648 8 controller.go:921] Service "cec/eri-sh-cec" does not have any active Endpoint.
W0408 03:14:34.068661 8 controller.go:921] Service "cec/eri-sh-cec" does not have any active Endpoint.
I0408 03:14:53.817883 8 status.go:274] updating Ingress cec/eri-sh-cec-ingress status from [] to [{10.136.40.63 }]
I0408 03:14:53.820045 8 event.go:281] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"cec", Name:"eri-sh-cec-ingress", UID:"dd298fd3-3c16-42e8-a544-c7f942ec4e3e", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"211445", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress cec/eri-sh-cec-ingress
W0408 03:14:53.820285 8 controller.go:921] Service "default/eri-cec" does not have any active Endpoint.
W0408 03:14:53.820310 8 controller.go:921] Service "default/eri-cec" does not have any active Endpoint.
W0408 03:14:53.820326 8 controller.go:921] Service "cec/eri-sh-cec" does not have any active Endpoint.
W0408 03:14:53.820341 8 controller.go:921] Service "cec/eri-sh-cec" does not have any active Endpoint.

Solved the problem of the product service using an IPv4 address as its default cluster IP by adding new parameters to the helm deployment chart (a sketch of both files follows this list):

  • add `ipFamily:` below `service:` in `values.yaml`
  • add `ipFamily: {{.Values.service.ipFamily}}` below `spec:` in `service.yaml`
  • pass `--set service.ipFamily=IPv6` when running helm install for the product
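A sketch of what those two changes could look like; apart from the `ipFamily` lines taken from the list above, the surrounding keys and names are assumptions (`spec.ipFamily` is the alpha dual-stack field available in this cluster's Kubernetes v1.17):

```yaml
# values.yaml (excerpt; "type" is an assumed neighbouring key)
service:
  type: ClusterIP
  ipFamily: IPv4   # default; overridden at install time with --set service.ipFamily=IPv6
```

```yaml
# templates/service.yaml (skeleton; metadata, ports and selector are assumptions)
apiVersion: v1
kind: Service
metadata:
  name: eri-cec
spec:
  ipFamily: {{ .Values.service.ipFamily }}
  type: {{ .Values.service.type }}
  ports:
    - port: 80
  selector:
    app: cec
```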

**Solution:**
Modify the helm chart before installing the ingress controller. In `values.yaml`, configure:
```yaml
hostNetwork: true
reportNodeInternalIp: true
daemonset:
  useHostPort: true
kind: DaemonSet
```
You can also set the service type and external IP here. After helm install, check whether the service can be reached by hostname via the host IP.
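To confirm the controller really ended up on the host network after reinstalling, a quick check could be the following (the pod and DaemonSet names are assumptions based on a default nginx-ingress install):

```bash
# with hostNetwork: true the pod IP should equal the node IP
kubectl get pods -o wide | grep nginx-ingress-controller
# kind: DaemonSet means one controller pod per node
kubectl get daemonset nginx-ingress-controller
```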
    [root@host63 cec-installer]# curl http://dual-ipv6:80
    <!DOCTYPE html>
    <html>
    ...
    </html>
    [root@host63 cec-installer]# curl http://dual-ipv4:80
    <!DOCTYPE html>
    <html>
    ...
    </html>
Also verify with the client UI.


Issue 3. Failed to init tiller; pod creation error: rpc error: code = DeadlineExceeded desc = context deadline exceeded

The pod hangs in the ContainerCreating state after helm init. Describe the pod:

[root@host59 ~]# kubectl describe -n kube-system pod tiller-deploy-969865475-sn2k2
Name: tiller-deploy-969865475-sn2k2
Namespace: kube-system
Node: host59/2001:1b74:88:9400::59:59
Controlled By: ReplicaSet/tiller-deploy-969865475
Containers:
tiller:
Container ID:
Image: gcr.io/kubernetes-helm/tiller:v2.16.1
Image ID:
Ports: 44134/TCP, 44135/TCP
Events:
Type Reason Age From Message
Normal Scheduled 54m default-scheduler Successfully assigned kube-system/tiller-deploy-969865475-sn2k2 to host59
Warning FailedCreatePodSandBox 2m47s (x13 over 50m) kubelet, host59 Failed to create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Normal SandboxChanged 2m47s (x13 over 50m) kubelet, host59 Pod sandbox changed, it will be killed and re-created.

Check the docker containers: they are already running, so docker itself seems fine and something else must be going wrong when the pod sandbox is created.
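That check was roughly the following (a sketch; nothing here is specific to this cluster):

```bash
# the docker daemon is active and existing containers are running normally,
# so the DeadlineExceeded error is not caused by docker being down
systemctl status docker
docker ps
```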

[root@host59 ~]# systemctl status kubelet -l
Apr 17 17:25:27 host59 kubelet[19205]: E0417 17:25:27.660517 19205 dns.go:135] Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.221.16.11 10.221.16.10 150.236.34.180
[root@host59 ~]# journalctl -u  kubelet -f
Apr 17 17:27:18 host59 kubelet[19205]: E0417 17:27:18.262418 19205 cni.go:385] Error deleting kube-system_tiller-deploy-969865475-sn2k2/f35df2a630d07b0ec7149fb06d7216c60a3c77a7118924c7b7eb9556b02f5cab from network multus/multus-cni-network: netplugin failed with no error message
Apr 17 17:27:18 host59 kubelet[19205]: W0417 17:27:18.263092 19205 cni.go:331] CNI failed to retrieve network namespace path: Error: No such container: beb6e83c61bc47ba808dcc51e6c76e89817efb1f518fe28bc1083c99ad4721e1
Apr 17 17:27:19 host59 kubelet[19205]: E0417 17:27:19.660435 19205 dns.go:135] Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.221.16.11 10.221.16.10 150.236.34.180

So the multus pod/container is not running correctly. Check the pod:

[root@host59 ~]# kubectl describe pod  -n kube-system pod kube-multus-ds-amd64-wz5xj
Name: kube-multus-ds-amd64-wz5xj
Namespace: kube-system
Node: host59/2001:1b74:88:9400::59:59
Events:
Type Reason Age From Message
Warning DNSConfigForming 117s (x291 over 6h7m) kubelet, host59 Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.221.16.11 10.221.16.10 150.236.34.180
Error from server (NotFound): pods "pod" not found

Delete the kube-multus daemonset:

kubectl delete -f multus-daemonset.yml

Still not working; edit the tiller deployment and find:

[root@host59 opt]# kubectl edit deploy tiller-deploy -n kube-system
status:
  conditions:
  - lastTransitionTime: "2020-04-20T01:53:55Z"
    lastUpdateTime: "2020-04-20T01:53:55Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2020-04-20T02:03:56Z"
    lastUpdateTime: "2020-04-20T02:03:56Z"
    message: ReplicaSet "tiller-deploy-b747845f" has timed out progressing.
    reason: ProgressDeadlineExceeded
    status: "False"
    type: Progressing
  observedGeneration: 1
  replicas: 1
  unavailableReplicas: 1
  updatedReplicas: 1

Finally found the reason: multus is not compatible with calico here, which is why the error above appeared: `Error deleting kube-system_tiller-deploy... from network multus/multus-cni-network: netplugin failed with no error message`. Even though I had deleted multus earlier, it had already been configured in the etcd config file under /etc/kubernetes. After modifying the related config, the tiller pod returns to normal.
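For reference, the kind of per-node cleanup this usually involves, assuming multus left its CNI configuration in the default location (the paths below are the multus defaults and are assumptions here; in this cluster the leftover config was reportedly under /etc/kubernetes instead):

```bash
# multus installs itself as the first CNI config that kubelet reads;
# removing it lets kubelet fall back to the remaining calico config
ls /etc/cni/net.d/
#   00-multus.conf  10-calico.conflist  calico-kubeconfig  multus.d/
sudo rm /etc/cni/net.d/00-multus.conf
sudo systemctl restart kubelet
```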