Issue 1: Failed to schedule pod: error while running "VolumeBinding" filter plugin
Describe the pod:
```
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  55s (x19 over 26m)  default-scheduler  error while running "VolumeBinding" filter plugin for pod "eri-cec-9dcd4d6c8-whvrm": pod has unbound immediate PersistentVolumeClaims
```
The helm release created the PVCs, but there were no PersistentVolumes for them to bind:

```
NAME: eri
LAST DEPLOYED: Wed Apr  8 09:58:00 2020
NAMESPACE: default
STATUS: DEPLOYED

RESOURCES:
==> v1/Deployment
NAME     AGE
eri-cec  1s

==> v1/PersistentVolumeClaim
NAME                  AGE
eri-cec-database-pvc  2s
eri-cec-misc-pvc      2s

==> v1/Pod(related)
NAME                     AGE
eri-cec-9dcd4d6c8-whvrm  1s

==> v1/Secret
NAME                     AGE
eri-cec-database-secret  2s

==> v1/Service
NAME     AGE
eri-cec  1s

==> v1beta1/Ingress
NAME             AGE
eri-cec-ingress  1s
```
Create the PVs manually:

```
[root@host63 cec-installer]# kubectl apply -f pv-init.yaml
persistentvolume/eri-sh-cec-database-pv created
persistentvolume/eri-sh-cec-misc-pv created
[root@host63 cec-installer]# kubectl get pv -A
NAME                     CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
eri-sh-cec-database-pv   5Gi        RWO,RWX        Retain           Available                                   15s
eri-sh-cec-misc-pv       5Gi        RWO,RWX        Retain           Available                                   15s
```
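The post does not show `pv-init.yaml`; a minimal sketch of what one of these PVs could look like, matching the capacity, access modes, and reclaim policy above, and assuming an NFS backend (the server address and export path are placeholders, not from the post):

```yaml
# Hypothetical reconstruction of one pv-init.yaml entry; server and path are placeholders
apiVersion: v1
kind: PersistentVolume
metadata:
  name: eri-sh-cec-database-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce   # RWO
    - ReadWriteMany   # RWX
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 192.0.2.10            # placeholder NFS server
    path: /exports/cec-database   # placeholder export path
```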
The PVC still does not bind; check its status:
```
[root@host63 cec-installer]# kubectl describe pvc eri-sh-cec-database-pvc -n cec
Events:
  Type     Reason                Age                   From                         Message
  ----     ------                ----                  ----                         -------
  Warning  ProvisioningFailed    54m (x3 over 104m)    cluster.local/nfs-provisioner-nfs-server-provisioner_nfs-provisioner-nfs-server-provisioner-0_79370fad-78b7-11ea-8b82-66556e93189d  failed to provision volume with StorageClass "nfs": error getting NFS server IP for volume: service SERVICE_NAME=nfs-provisioner-nfs-server-provisioner is not valid; check that it has for ports map[{111 UDP}:true {111 TCP}:true {2049 TCP}:true {20048 TCP}:true] exactly one endpoint, this pod's IP POD_IP=192.168.220.144
  Warning  ProvisioningFailed    38m (x8 over 3h6m)    (same provisioner)           (same error as above; only the ports map ordering differs)
  Normal   Provisioning          21m (x16 over 3h6m)   (same provisioner)           External provisioner is provisioning volume for claim "cec/eri-sh-cec-database-pvc"
  Warning  ProvisioningFailed    21m (x2 over 3h4m)    (same provisioner)           (same error as above; only the ports map ordering differs)
  Normal   ExternalProvisioning  87s (x742 over 3h6m)  persistentvolume-controller  waiting for a volume to be created, either by external provisioner "cluster.local/nfs-provisioner-nfs-server-provisioner" or manually created by system administrator
```
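The provisioner is complaining that its own Service does not resolve to exactly one endpoint. A reasonable first check would be to compare the Service's endpoints against the provisioner pod (names are taken from the error message; the namespace is an assumption, add `-n <namespace>` if the chart was installed outside `default`):

```sh
# Does the provisioner Service have exactly one ready endpoint?
kubectl get endpoints nfs-provisioner-nfs-server-provisioner -o wide
# Is the provisioner pod Ready, and does its IP match that endpoint?
kubectl get pod nfs-provisioner-nfs-server-provisioner-0 -o wide
```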
I did not figure this one out. A likely factor: the PVC requests StorageClass `nfs`, so it waits for the external provisioner instead of binding to the manually created PVs (whose STORAGECLASS column is empty), and the provisioner itself keeps failing its endpoint check.
Issue 2: The deployed pod (without NFS) defaults to IPv4, and the nginx-ingress service likewise uses an IPv4 address as its default route; this could not be changed directly.
```
[root@host63 cec-installer]# kubectl logs nginx-ingress-controller-64d58897bd-b99gw
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       0.29.0
  Build:         git-eedcdcdbf
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.17.8
-------------------------------------------------------------------------------
I0407 10:17:14.810155       8 flags.go:215] Watching for Ingress class: nginx
W0407 10:17:14.811042       8 flags.go:260] SSL certificate chain completion is disabled (--enable-ssl-chain-completion=false)
W0407 10:17:14.811123       8 client_config.go:543] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0407 10:17:14.811367       8 main.go:193] Creating API client for https://192.167.0.1:443
I0407 10:17:14.820212       8 main.go:237] Running in Kubernetes cluster version v1.17 (v1.17.4) - git (clean) commit 8d8aa39598534325ad77120c120a22b3a990b5ea - platform linux/amd64
I0407 10:17:14.823302       8 main.go:91] Validated default/nginx-ingress-default-backend as the default backend.
I0407 10:17:15.126113       8 main.go:102] SSL fake certificate created /etc/ingress-controller/ssl/default-fake-certificate.pem
W0407 10:17:15.147723       8 store.go:657] Unexpected error reading configuration configmap: configmaps "nginx-ingress-controller" not found
I0407 10:17:15.156374       8 nginx.go:263] Starting NGINX Ingress controller
I0407 10:17:16.357204       8 nginx.go:307] Starting NGINX process
I0407 10:17:16.357338       8 leaderelection.go:242] attempting to acquire leader lease  default/ingress-controller-leader-nginx...
W0407 10:17:16.358186       8 controller.go:394] Service "default/nginx-ingress-default-backend" does not have any active Endpoint
I0407 10:17:16.358304       8 controller.go:137] Configuration changes detected, backend reload required.
I0407 10:17:16.360127       8 status.go:86] new leader elected: nginx-ingress-controller-64d58897bd-cthrs
I0407 10:17:16.450895       8 controller.go:153] Backend successfully reloaded.
I0407 10:17:16.450966       8 controller.go:162] Initial sync, sleeping for 1 second.
W0407 10:17:20.280746       8 controller.go:394] Service "default/nginx-ingress-default-backend" does not have any active Endpoint
W0407 10:17:23.614240       8 controller.go:394] Service "default/nginx-ingress-default-backend" does not have any active Endpoint
W0407 10:17:33.458971       8 controller.go:394] Service "default/nginx-ingress-default-backend" does not have any active Endpoint
I0407 10:17:53.811527       8 leaderelection.go:252] successfully acquired lease default/ingress-controller-leader-nginx
I0407 10:17:53.811566       8 status.go:86] new leader elected: nginx-ingress-controller-64d58897bd-b99gw
W0407 10:18:00.868971       8 controller.go:394] Service "default/nginx-ingress-default-backend" does not have any active Endpoint
I0408 03:14:30.743173       8 event.go:281] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"cec", Name:"eri-sh-cec-ingress", UID:"dd298fd3-3c16-42e8-a544-c7f942ec4e3e", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"211359", FieldPath:""}): type: 'Normal' reason: 'CREATE' Ingress cec/eri-sh-cec-ingress
W0408 03:14:34.068588       8 controller.go:921] Service "default/eri-cec" does not have any active Endpoint.
W0408 03:14:34.068631       8 controller.go:921] Service "default/eri-cec" does not have any active Endpoint.
W0408 03:14:34.068648       8 controller.go:921] Service "cec/eri-sh-cec" does not have any active Endpoint.
W0408 03:14:34.068661       8 controller.go:921] Service "cec/eri-sh-cec" does not have any active Endpoint.
I0408 03:14:53.817883       8 status.go:274] updating Ingress cec/eri-sh-cec-ingress status from [] to [{10.136.40.63 }]
I0408 03:14:53.820045       8 event.go:281] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"cec", Name:"eri-sh-cec-ingress", UID:"dd298fd3-3c16-42e8-a544-c7f942ec4e3e", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"211445", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress cec/eri-sh-cec-ingress
W0408 03:14:53.820285       8 controller.go:921] Service "default/eri-cec" does not have any active Endpoint.
W0408 03:14:53.820310       8 controller.go:921] Service "default/eri-cec" does not have any active Endpoint.
W0408 03:14:53.820326       8 controller.go:921] Service "cec/eri-sh-cec" does not have any active Endpoint.
W0408 03:14:53.820341       8 controller.go:921] Service "cec/eri-sh-cec" does not have any active Endpoint.
```
Solved the problem of the product Service getting an IPv4 cluster IP by default by adding new parameters to the helm deployment charts:
- In `values.yaml`, add `ipFamily:` below `service:`.
- In `service.yaml`, add `ipFamily: {{ .Values.service.ipFamily }}` below `spec:`.
- Pass `--set service.ipFamily=IPv6` when running `helm install` for the product (see the sketch below).
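Put together, a minimal sketch of the chart change (file layout and names are illustrative; `spec.ipFamily` is the alpha single-stack field present in the Kubernetes v1.17 shown in the logs, later superseded by `ipFamilies`/`ipFamilyPolicy`):

```yaml
# values.yaml (illustrative)
service:
  ipFamily: IPv4   # overridden at install time with --set service.ipFamily=IPv6

---
# templates/service.yaml (only the relevant lines; name and selector are illustrative)
apiVersion: v1
kind: Service
metadata:
  name: {{ .Release.Name }}-cec
spec:
  ipFamily: {{ .Values.service.ipFamily }}
  selector:
    app: cec
  ports:
    - port: 80
```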
**Solution:** Modify the helm chart before installing the ingress controller; in `values.yaml`, configure:

```yaml
hostNetwork: true
reportNodeInternalIp: true
daemonset:
  useHostPort: true
kind: DaemonSet
```
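With those values in place, the install might look like the following (Helm v2 syntax to match the tiller usage below; the chart is assumed to be the then-current `stable/nginx-ingress`, where these keys usually sit under `controller:` depending on the chart version):

```sh
# Assumed chart and release name; values.yaml holds the DaemonSet/hostNetwork settings above
helm install stable/nginx-ingress --name nginx-ingress -f values.yaml
```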
Issue 3: Failed to init tiller; pod creation fails with `rpc error: code = DeadlineExceeded desc = context deadline exceeded`
The pod hangs in ContainerCreating after running `helm init`. Describe the pod:
```
[root@host59 ~]# kubectl describe -n kube-system pod tiller-deploy-969865475-sn2k2
Name:           tiller-deploy-969865475-sn2k2
Namespace:      kube-system
Node:           host59/2001:1b74:88:9400::59:59
Controlled By:  ReplicaSet/tiller-deploy-969865475
Containers:
  tiller:
    Container ID:
    Image:        gcr.io/kubernetes-helm/tiller:v2.16.1
    Image ID:
    Ports:        44134/TCP, 44135/TCP
Events:
  Type     Reason                  Age                   From               Message
  Normal   Scheduled               54m                   default-scheduler  Successfully assigned kube-system/tiller-deploy-969865475-sn2k2 to host59
  Warning  FailedCreatePodSandBox  2m47s (x13 over 50m)  kubelet, host59    Failed to create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Normal   SandboxChanged          2m47s (x13 over 50m)  kubelet, host59    Pod sandbox changed, it will be killed and re-created.
```
Check the docker containers on the node: they are already running, so the container runtime itself seems fine and something else must be wrong.
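The post does not show the exact docker check; something like this on the node would confirm the containers (including the `pause` sandbox containers) are up:

```sh
# Hypothetical spot check on the node: list tiller and sandbox (pause) containers
docker ps | grep -E 'tiller|pause'
```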
Check kubelet:

```
[root@host59 ~]# systemctl status kubelet -l
Apr 17 17:25:27 host59 kubelet[19205]: E0417 17:25:27.660517   19205 dns.go:135] Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.221.16.11 10.221.16.10 150.236.34.180
```
Follow the kubelet journal:

```
[root@host59 ~]# journalctl -u kubelet -f
Apr 17 17:27:18 host59 kubelet[19205]: E0417 17:27:18.262418   19205 cni.go:385] Error deleting kube-system_tiller-deploy-969865475-sn2k2/f35df2a630d07b0ec7149fb06d7216c60a3c77a7118924c7b7eb9556b02f5cab from network multus/multus-cni-network: netplugin failed with no error message
Apr 17 17:27:18 host59 kubelet[19205]: W0417 17:27:18.263092   19205 cni.go:331] CNI failed to retrieve network namespace path: Error: No such container: beb6e83c61bc47ba808dcc51e6c76e89817efb1f518fe28bc1083c99ad4721e1
Apr 17 17:27:19 host59 kubelet[19205]: E0417 17:27:19.660435   19205 dns.go:135] Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.221.16.11 10.221.16.10 150.236.34.180
```
So the multus pod/container is not working correctly (the repeated `Nameserver limits were exceeded` warning is unrelated noise: the node's resolv.conf simply lists more nameservers than the three kubelet applies). Check the pod:
```
[root@host59 ~]# kubectl describe pod -n kube-system pod kube-multus-ds-amd64-wz5xj
Name:       kube-multus-ds-amd64-wz5xj
Namespace:  kube-system
Node:       host59/2001:1b74:88:9400::59:59
Events:
  Type     Reason            Age                    From             Message
  Warning  DNSConfigForming  117s (x291 over 6h7m)  kubelet, host59  Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 10.221.16.11 10.221.16.10 150.236.34.180
Error from server (NotFound): pods "pod" not found
```

(The trailing `NotFound` error appears only because the command accidentally repeats the word `pod`, which kubectl then treats as a second pod name.)
Delete the kube-multus daemonset:
```
kubectl delete -f multus-daemonset.yml
```
Still not working. Editing the deployment shows:
```
[root@host59 opt]# kubectl edit deploy tiller-deploy -n kube-system
status:
  conditions:
  - lastTransitionTime: "2020-04-20T01:53:55Z"
    lastUpdateTime: "2020-04-20T01:53:55Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2020-04-20T02:03:56Z"
    lastUpdateTime: "2020-04-20T02:03:56Z"
    message: ReplicaSet "tiller-deploy-b747845f" has timed out progressing.
    reason: ProgressDeadlineExceeded
    status: "False"
    type: Progressing
  observedGeneration: 1
  replicas: 1
  unavailableReplicas: 1
  updatedReplicas: 1
```
Finally found the reason: multus is not compatible with calico here, which is why the `Error deleting kube-system_tiller-deploy... from network multus/multus-cni-network: netplugin failed with no error message` seen above kept happening. Even though multus was deleted earlier, it had already been written into the cluster configuration files under `/etc/kubernetes`. After modifying the related config, the tiller pod returned to normal.
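A sketch of that cleanup, assuming the standard multus file locations under `/etc/cni/net.d` (on this cluster the leftover reference was reportedly under `/etc/kubernetes`, so verify the actual paths before deleting anything):

```sh
# Remove leftover multus CNI config so kubelet falls back to calico (paths assumed)
ls /etc/cni/net.d/
rm -f /etc/cni/net.d/00-multus.conf
rm -rf /etc/cni/net.d/multus.d
systemctl restart kubelet
# Recreate the stuck pod so it gets a fresh sandbox (labels are tiller's defaults)
kubectl -n kube-system delete pod -l app=helm,name=tiller
```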