Kubernetes Scenario Based Questions 2026, 25 Real Troubleshooting Cases

By Aditya Sharma. Published 8 Jun 2026, last revised 8 Jun 2026. 2 sources listed.

Experienced Kubernetes rounds are troubleshooting marathons. Candidates report that interviewers describe a broken pod, service, or rollout and ask you to diagnose and fix it live. This guide collects 25 real scenarios from candidate-reported rounds and public preparation resources, each with the diagnosis, the fix, and the relevant kubectl commands.

The diagnostic toolkit: kubectl describe, kubectl logs, kubectl get events, kubectl exec. Always run describe and read the events before guessing.

Pod Failure Scenarios (S1 to S9)

S1. A pod is in CrashLoopBackOff. How do you debug it?

Diagnosis: The container starts, crashes, and Kubernetes restarts it with an increasing backoff delay. The cause is usually in the app or its configuration, not in the cluster.

Fix steps:

kubectl describe pod <pod>     # see events and last state
kubectl logs <pod> --previous  # logs from the crashed instance

Common causes: bad config/env, missing dependency, failing startup connection, or a wrong command. Candidate-reported as the single most common Kubernetes scenario.
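
To make the "increasing backoff" concrete, here is a minimal Python sketch of the restart delays. It assumes the commonly documented kubelet defaults (a 10-second base that doubles per restart, capped at five minutes); the function name and parameters are illustrative, and the real values can differ by Kubernetes version and configuration.

```python
def crashloop_delays(restarts, base_seconds=10, cap_seconds=300):
    """Return the wait (in seconds) before each of the first `restarts` restarts.

    Assumption: exponential backoff that doubles from `base_seconds`
    and never exceeds `cap_seconds`.
    """
    return [min(base_seconds * 2 ** i, cap_seconds) for i in range(restarts)]


print(crashloop_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```

In an interview this explains why a pod that has crashed several times seems to sit idle for minutes between attempts: it is waiting out the backoff, not hung.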

S2. A pod is stuck in ImagePullBackOff. Diagnose.

Diagnosis: Kubernetes cannot pull the image: wrong image name/tag, private registry without credentials, or a registry outage.

Fix: Verify the image and tag, add an imagePullSecret for private registries, and check kubectl describe pod events for the exact pull error.

S3. A pod stays Pending. Why?

Diagnosis: The scheduler cannot place it: insufficient CPU/memory on nodes, a node selector/affinity with no match, taints without tolerations, or an unbound PersistentVolumeClaim.

Fix: kubectl describe pod shows the scheduling reason. Add capacity, fix selectors/tolerations, or provision a volume so the PVC can bind. Candidate-reported as a frequent scheduling scenario.

S4. A pod is OOMKilled. Fix it.

Diagnosis: The container exceeded its memory limit; kubectl describe pod shows OOMKilled in last state.

Fix: Raise the memory limit if justified, fix the leak, and cap the app's runtime memory (JVM heap via -Xmx, Node.js --max-old-space-size) below the limit. Set requests and limits sensibly.
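
A rough Python sketch of sizing the runtime heap under the container limit follows. The 75% fraction is an illustrative assumption, not a recommendation from the source; the right headroom depends on how much non-heap memory (threads, native buffers, metaspace) the app uses.

```python
def heap_budget_mib(limit_mib, heap_fraction=0.75):
    """Heap size in MiB that leaves headroom under the container memory limit.

    `heap_fraction` is a hypothetical tuning knob chosen for illustration.
    """
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")
    return int(limit_mib * heap_fraction)


limit_mib = 1024  # container memory limit of 1Gi
heap = heap_budget_mib(limit_mib)
print(f"-Xmx{heap}m")                  # JVM maximum heap flag
print(f"--max-old-space-size={heap}")  # Node.js old-space size, in MiB
```

If the heap is set equal to the limit, the process can be OOMKilled even without a leak, because the runtime's non-heap memory also counts against the container limit.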

S5. A liveness probe keeps restarting a healthy app. Why?

Diagnosis: The probe is too aggressive, the app boots slower than the probe allows, or the probe targets the wrong path/port.

Fix: Add initialDelaySeconds or a startupProbe, increase timeoutSeconds, and verify the probe endpoint. Candidate-reported as a common probe trap.
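
The probe arithmetic can be sketched in Python as below. This is an approximation that ignores timeoutSeconds and scheduling jitter; the function names are hypothetical, and the example values (periodSeconds of 10, failureThreshold of 3) are the usual Kubernetes defaults as an assumption.

```python
def startup_budget_seconds(failure_threshold, period_seconds):
    """Approximate time a startupProbe gives the app to finish booting."""
    return failure_threshold * period_seconds


def liveness_kills_slow_boot(boot_seconds, initial_delay_seconds,
                             failure_threshold, period_seconds):
    """True if the app needs longer to boot than the liveness probe tolerates."""
    budget = initial_delay_seconds + failure_threshold * period_seconds
    return boot_seconds > budget


# An app that boots in 45s with no initial delay and the assumed defaults:
print(liveness_kills_slow_boot(45, 0, 3, 10))   # True: restarted mid-boot
# The same app with initialDelaySeconds raised to 30:
print(liveness_kills_slow_boot(45, 30, 3, 10))  # False
# A startupProbe with failureThreshold 30 and periodSeconds 10:
print(startup_budget_seconds(30, 10))           # 300
```

A startupProbe is often the cleaner fix because it covers a slow boot without making the liveness probe lenient for the rest of the container's life.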

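S6. What is the difference between liveness, readiness, and startup probes?

Probe	Question it answers	On failure
Liveness	Is the container still healthy?	The kubelet restarts the container
Readiness	Can the pod take traffic right now?	The pod is removed from Service endpoints; no restart
Startup	Has the app finished booting?	Liveness and readiness checks wait until it succeeds; the container is restarted if it never does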

S7. A pod runs but gets no traffic. Diagnose.

Diagnosis: The readiness probe is failing (pod not in endpoints), or the Service selector does not match the pod labels.

Fix: Check kubectl get endpoints <svc>; if empty, fix the label selector or the readiness probe.
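
The selector check can be sketched in Python: a Service's equality-based selector matches a pod only if every key/value pair in the selector appears in the pod's labels. The label values below are made up for illustration, and the sketch does not cover Services defined without a selector.

```python
def selector_matches(selector, pod_labels):
    """True if every selector key/value pair is present in the pod's labels."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


def selector_mismatches(selector, pod_labels):
    """Map each mismatched key to (expected value, actual value or None)."""
    return {
        key: (value, pod_labels.get(key))
        for key, value in selector.items()
        if pod_labels.get(key) != value
    }


service_selector = {"app": "web", "tier": "frontend"}
pod_labels = {"app": "web", "tier": "front-end", "version": "v2"}

print(selector_matches(service_selector, pod_labels))     # False
print(selector_mismatches(service_selector, pod_labels))  # {'tier': ('frontend', 'front-end')}
```

Extra labels on the pod (version here) are harmless; a single differing value, such as a stray hyphen, is enough to leave the endpoints list empty.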

S8. A pod cannot reach a ConfigMap or Secret value. Why?

Diagnosis: The ConfigMap/Secret was updated but the pod loaded it at start (env vars do not auto-update), or the key name is wrong.

Fix: Restart the pod to pick up env changes, or mount the ConfigMap/Secret as a volume so the files refresh after an update. Verify keys with kubectl describe configmap.

S9. Init container blocks pod startup. Explain.

Diagnosis: An init container must complete successfully before app containers start; if it fails or hangs (e.g., waiting for a dependency), the pod stays in Init state.

Fix: Check init container logs, fix the dependency wait, and ensure it exits 0.

Service and Networking Scenarios (S10 to S17)

S10. A Service is not reachable within the cluster. Diagnose.

Diagnosis: Selector mismatch, no ready endpoints, wrong targetPort, or a NetworkPolicy blocking traffic.

Fix: kubectl get endpoints, verify selector and targetPort, and check NetworkPolicies. Candidate-reported as a frequent networking scenario.

S11. What is the difference between ClusterIP, NodePort, and LoadBalancer?

Type	Exposure
ClusterIP	Internal only (default)
NodePort	A static port on each node
LoadBalancer	External via cloud load balancer

Ingress sits in front of these Services and routes HTTP traffic by host and path through a single entry point.

S12. An Ingress returns 404 or 502. Diagnose.

Diagnosis: A 404 often means the host/path rule does not match or the backend Service is wrong; a 502 usually means the backend pods are unhealthy or the targetPort is wrong.

Fix: Check the Ingress rules, the backend Service endpoints, and pod readiness. Verify the ingress controller is running.

S13. DNS resolution fails inside pods. Why?

Diagnosis: CoreDNS is down or misconfigured, or the pod uses the wrong DNS policy.

Fix: Check the CoreDNS pods in kube-system, test with a debug pod (nslookup), and verify the service name format <service>.<namespace>.svc.cluster.local.
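
A small Python sketch of the naming rules, useful for checking which name to test with nslookup. It assumes the default cluster domain cluster.local (clusters can be configured with a different one); the service and namespace names are hypothetical.

```python
def service_fqdn(service, namespace="default", cluster_domain="cluster.local"):
    """Fully qualified DNS name of a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def shortest_name(service, namespace, caller_namespace):
    """Shortest name expected to resolve from a pod in `caller_namespace`.

    The bare service name relies on the pod's DNS search path, which only
    covers the pod's own namespace.
    """
    if namespace == caller_namespace:
        return service
    return f"{service}.{namespace}"


print(service_fqdn("api", "payments"))               # api.payments.svc.cluster.local
print(shortest_name("api", "payments", "payments"))  # api
print(shortest_name("api", "payments", "frontend"))  # api.payments
```

A frequent cause of "DNS is broken" reports is a bare service name used from a different namespace, which this distinction catches.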

S14. A NetworkPolicy broke all traffic. Explain.

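Diagnosis: Pods accept all traffic until a NetworkPolicy selects them. Once a policy selects a pod, any traffic in the direction that policy covers (ingress, egress, or both) is denied unless a rule explicitly allows it.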

Fix: Add explicit allow rules for required ingress/egress, including DNS egress to CoreDNS. Candidate-reported as a security scenario.
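
The "default-deny once selected" behavior can be modeled with a simplified Python sketch. It covers ingress within one namespace only, and the dictionary shape (pod_selector, allow_from) is a hypothetical stand-in for the real NetworkPolicy spec, not its actual schema.

```python
def labels_match(selector, labels):
    """An empty selector matches everything; otherwise every pair must be present."""
    return all(labels.get(key) == value for key, value in selector.items())


def ingress_allowed(pod_labels, source_labels, policies):
    """Decide whether traffic from a source pod may reach a destination pod."""
    selecting = [p for p in policies if labels_match(p["pod_selector"], pod_labels)]
    if not selecting:
        return True  # no policy selects the pod, so all ingress is allowed
    # Policies are additive: one matching allow rule in any policy is enough.
    return any(
        labels_match(rule, source_labels)
        for policy in selecting
        for rule in policy["allow_from"]
    )


deny_all = {"pod_selector": {}, "allow_from": []}
allow_web_to_db = {"pod_selector": {"app": "db"}, "allow_from": [{"app": "web"}]}

db, web = {"app": "db"}, {"app": "web"}
print(ingress_allowed(db, web, []))                           # True
print(ingress_allowed(db, web, [deny_all]))                   # False
print(ingress_allowed(db, web, [deny_all, allow_web_to_db]))  # True
```

The second call is the "broke all traffic" case: a policy with an empty pod selector and no allow rules selects every pod and permits nothing until allow rules are added alongside it.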

S15. How do you debug intermittent connection failures between services?

Diagnosis: Could be a rolling update removing pods, readiness probe flapping, DNS caching, or resource pressure.

Fix: Check rollout status, probe stability, and node resource usage; ensure graceful shutdown with preStop hooks and proper terminationGracePeriodSeconds.

S16. Sticky sessions are needed but requests scatter across pods. Fix.

Diagnosis: Default Service load balancing distributes across endpoints.

Fix: Use sessionAffinity: ClientIP on the Service, or handle session state externally (Redis) so pods are stateless. The stateless approach is preferred.

S17. A new deployment caused downtime. Why?

Diagnosis: Missing readiness probes (traffic is sent before pods are ready), or a rolling-update maxUnavailable set too high.

Fix: Add readiness probes, tune maxUnavailable/maxSurge, and use proper graceful shutdown. Candidate-reported as a deployment-strategy scenario.
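
To see how maxUnavailable and maxSurge bound a rollout, here is a Python sketch of the arithmetic. It assumes the documented rounding (percentages round down for maxUnavailable and up for maxSurge) and defaults of 25% each; it does not model the API's validation, such as rejecting both values being zero.

```python
import math


def resolve(value, replicas, round_up):
    """Resolve an absolute number or a percentage string like '25%'."""
    if isinstance(value, str) and value.endswith("%"):
        raw = replicas * int(value[:-1]) / 100
        return math.ceil(raw) if round_up else math.floor(raw)
    return int(value)


def rollout_bounds(replicas, max_unavailable="25%", max_surge="25%"):
    """Fewest available pods and most total pods allowed during the rollout."""
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    surge = resolve(max_surge, replicas, round_up=True)
    return {"min_available": replicas - unavailable, "max_total": replicas + surge}


print(rollout_bounds(10))                                  # {'min_available': 8, 'max_total': 13}
print(rollout_bounds(1, max_unavailable=1, max_surge=0))   # {'min_available': 0, 'max_total': 1}
```

The second case is the downtime scenario: with one replica and maxUnavailable of 1, the rollout is allowed to take the only pod away before its replacement is ready. The min_available figure only means something when readiness probes exist, since that is how Kubernetes decides a pod is available.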

Rollouts, Storage, and Cluster Scenarios (S18 to S25)

S18. A rollout is stuck. How do you investigate and roll back?

kubectl rollout status deployment/<name>
kubectl describe deployment <name>
kubectl rollout undo deployment/<name>

Diagnosis: New pods fail readiness, so the rollout stalls. Investigate the new pods' events and logs, then roll back if needed.

S19. A StatefulSet pod will not reschedule after a node failure. Why?

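Diagnosis: StatefulSets preserve pod identity and storage. While the failed node is unreachable, the old pod stays in Terminating or Unknown, and the controller will not create a replacement with the same identity until that pod is confirmed gone. Even then, if the volume cannot reattach (zone mismatch, stuck PVC), the new pod stays Pending.

Fix: Recover or remove the failed node, then ensure the storage class and volume can attach in the new node's zone; check the PVC and PV binding.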

S20. A PersistentVolumeClaim is stuck Pending. Diagnose.

Diagnosis: No matching PV, no dynamic provisioner/StorageClass, or a capacity/access-mode mismatch.

Fix: Provide a StorageClass with a provisioner, or create a matching PV; verify requested size and access mode.

S21. The cluster is at capacity and pods are Pending. Options?

Diagnosis: Nodes lack resources for pending pods.

Fix: Enable the Cluster Autoscaler to add nodes, right-size requests/limits, and use the Horizontal Pod Autoscaler for app scaling. Candidate-reported as a scaling scenario.

S22. How does the Horizontal Pod Autoscaler work and why might it not scale?
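
For S22, a minimal Python sketch of the scaling arithmetic may help. It assumes the documented formula, desired = ceil(current replicas × current metric / target metric), with a 10% tolerance band, and it treats CPU utilization as usage divided by the pod's request. Clamping to minReplicas/maxReplicas and stabilization windows are left out, and the function names are illustrative.

```python
import math


def cpu_utilization_percent(usage_millicores, request_millicores):
    """Utilization is measured against the request; with no request it is undefined."""
    if not request_millicores:
        return None
    return 100 * usage_millicores / request_millicores


def desired_replicas(current_replicas, current_utilization,
                     target_utilization, tolerance=0.1):
    """Replica count the autoscaler would aim for under the assumed formula."""
    if current_utilization is None:
        return current_replicas  # nothing to compare against, so no scaling
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target, leave it alone
    return math.ceil(current_replicas * ratio)


busy = cpu_utilization_percent(usage_millicores=400, request_millicores=200)
print(desired_replicas(4, busy, target_utilization=50))     # 16

no_request = cpu_utilization_percent(usage_millicores=400, request_millicores=None)
print(desired_replicas(4, no_request, target_utilization=50))  # 4
```

The second call shows the classic reason an HPA does not scale: without resource requests there is no utilization figure to compare with the target. A missing metrics source has the same effect.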

S23. A node is NotReady. How do you triage?

Diagnosis: kubelet down, node resource exhaustion (disk/memory pressure), or network issues. kubectl describe node shows conditions.

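Fix: Check the kubelet and container runtime on the node, free resources, and cordon/drain if replacing. If the node recovers quickly, its pods keep running in place; if it stays NotReady or is drained, controller-managed pods are recreated on other nodes.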

S24. Secrets appear base64 but not encrypted. Explain the risk and fix.

Diagnosis: Kubernetes Secrets are base64-encoded, not encrypted at rest by default.

Fix: Enable encryption at rest for etcd, restrict RBAC access to secrets, and consider an external secrets manager. Confirm the managed platform's encryption defaults in the official documentation. Candidate-reported as a security scenario.
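
A short Python sketch shows why base64 offers no protection: decoding needs no key. The secret value here is made up for illustration.

```python
import base64


def decode_secret_value(encoded):
    """Recover the plaintext of a base64-encoded Secret value. No key is needed."""
    return base64.b64decode(encoded).decode("utf-8")


stored = base64.b64encode(b"password").decode("ascii")
print(stored)                       # cGFzc3dvcmQ=
print(decode_secret_value(stored))  # password
```

Anyone who can read the Secret object, or the etcd data behind it, can do the same, which is why RBAC restrictions and encryption at rest carry the actual protection.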

S25. Scenario: design a resilient deployment for a stateless web app. Walk through it.

Kubernetes Scenario Mock Test, 2026 Edition

5 original questions calibrated to the 2026 DevOps batch by Aditya Sharma, from candidate-reported patterns.

Question 1

CrashLoopBackOff means:

a) image cannot be pulled
b) the container starts, crashes, and restarts repeatedly
c) no node available
d) network blocked

Solution: Check logs --previous and describe for the crash cause. Answer: (b)

Question 2

A pod stuck Pending is usually a:

a) crash
b) scheduling problem (resources, selectors, taints, PVC)
c) image issue
d) probe failure

Solution: describe pod shows the scheduling reason. Answer: (b)

Question 3

A pod runs but gets no traffic. First check:

a) CPU
b) Service endpoints and readiness probe
c) disk
d) DNS only

Solution: Empty endpoints mean selector or readiness issue. Answer: (b)

Question 4

Kubernetes Secrets by default are:

a) encrypted
b) base64-encoded, not encrypted at rest
c) plaintext files
d) signed

Solution: Enable etcd encryption and restrict RBAC. Answer: (b)

Question 5

The HPA will not scale if:

a) too many nodes
b) resource requests are unset, so utilization cannot be computed
c) probes pass
d) Ingress exists

Solution: HPA needs requests and a metrics source. Answer: (b)

FAQ, Kubernetes Scenario Questions

Q: How many scenarios appear in a round? Per candidate reports, experienced and SRE rounds are scenario-heavy, often covering four to seven cases.

Q: What kubectl commands should I master? describe, logs (with --previous), get events, get endpoints, exec, and rollout. They are your diagnostic kit.

Q: Do freshers get scenario questions? Yes, but lighter ones (CrashLoopBackOff, ImagePullBackOff). Storage, autoscaling, and networking depth skew toward experienced rounds.

Q: What is the most-missed Kubernetes scenario? Confusing liveness with readiness, and misdiagnosing why a pod is Pending, per candidate-reported feedback.
