
Kubernetes Interview Questions 2026 — Top 50 with Expert Answers

38 min read
Interview Questions
Last Updated: 30 Mar 2026

Kubernetes engineers command ₹25-60 LPA in India. Platform engineers with deep K8s expertise at Flipkart, Swiggy, and Zerodha pull ₹50 LPA-1 Cr. Here are the exact 50 questions standing between you and that offer.

The difference between a ₹12 LPA DevOps role and a ₹45 LPA Platform Engineering role often comes down to how well you answer these specific Kubernetes questions. Cloud-native/K8s roles grew 340% in India since 2023 — get in while demand massively outpaces supply.

This guide covers 50 battle-tested questions compiled from 150+ real interviews at Google, Amazon, Flipkart, Swiggy, Razorpay, PhonePe, and CRED — from fundamentals to production troubleshooting scenarios that only senior engineers survive.

Related: Docker Interview Questions 2026 | Microservices Interview Questions 2026 | Golang Interview Questions 2026


Beginner-Level Kubernetes Questions (Q1–Q15)

Q1. What is Kubernetes and why do we need it?

Kubernetes (K8s) is an open-source container orchestration platform that automates the deployment, scaling, and management of containerized applications.

Why you need it over plain Docker:

| Problem | Kubernetes Solution |
|---|---|
| "Which host runs my container?" | Scheduler places pods on nodes automatically |
| Container crashed, needs restart | Self-healing — restarts failed pods |
| Need more replicas for traffic | HPA scales pods automatically |
| How do pods talk to each other? | Services provide stable DNS + load balancing |
| Deploying a new version without downtime | Rolling updates + rollback |
| Configuration management | ConfigMaps and Secrets |
| Storage for stateful apps | PersistentVolumes |

Kubernetes decouples applications from infrastructure, enabling teams to deploy anywhere (on-prem, AWS EKS, GKE, AKS) using the same YAML manifests.
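A minimal Deployment shows the declarative model in practice: you state the desired replica count and Kubernetes continuously reconciles toward it (the name and image here are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web              # illustrative name
spec:
  replicas: 3            # desired state: K8s restarts/reschedules pods to keep 3 running
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.27   # any stateless image works here
```

Delete a pod manually and the ReplicaSet controller immediately creates a replacement — that is the self-healing row of the table above in action.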


Q2. Explain the Kubernetes architecture — control plane and worker nodes.

Control Plane (master)
├── kube-apiserver — REST API, single entry point for all operations
├── etcd — Distributed key-value store; source of truth for cluster state
├── kube-scheduler — Assigns pods to nodes based on resources, affinity, taints
├── kube-controller-manager — Runs controllers (ReplicaSet, Node, Job, etc.)
└── cloud-controller-manager — Integrates with cloud provider (AWS, GCP)

Worker Node
├── kubelet — Node agent; ensures containers match pod specs
├── kube-proxy — Network rules (iptables/IPVS) for Service routing
├── Container Runtime — containerd, CRI-O (Docker removed in 1.24)
└── Pods — Smallest deployable units

The API server is the only component that talks to etcd. All other components communicate through the API server. In production, the control plane has 3+ nodes for HA with etcd using Raft consensus.
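You can inspect these components on a live cluster — a sketch assuming kubectl access to a kubeadm-style cluster, where control-plane components run as static pods:

```shell
# Control-plane components appear as pods in kube-system on kubeadm clusters
kubectl get pods -n kube-system -o wide

# Node agents (kubelet) and container runtime versions
kubectl get nodes -o wide

# Raw health endpoints served by the API server
kubectl get --raw='/readyz?verbose'
```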

Asked by: Google, Amazon, Flipkart, Swiggy SRE rounds


Q3. What is a Pod? How is it different from a container?

A Pod is the smallest deployable unit in Kubernetes. It wraps one or more containers that:

  • Share the same network namespace (same IP, can communicate via localhost)
  • Share the same Linux namespaces (UTS, IPC)
  • Can share volumes mounted in the pod spec

Single-container pods are most common. Multi-container pods are used for:

  • Sidecar: Logging agent (Fluentd), service mesh proxy (Envoy/Istio)
  • Init containers: Run setup tasks before main container starts
  • Ambassador: Proxy for external services
  • Adapter: Normalize output formats
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  initContainers:
  - name: wait-for-db
    image: busybox
    command: ['sh', '-c', 'until nslookup postgres; do sleep 2; done']
  containers:
  - name: app
    image: myapp:v1.2
    ports:
    - containerPort: 8080
  - name: log-shipper
    image: fluent/fluentd:v1.16

Q4. What are the differences between a Deployment, StatefulSet, and DaemonSet?

| Feature | Deployment | StatefulSet | DaemonSet |
|---|---|---|---|
| Use case | Stateless apps | Stateful apps (databases) | One pod per node |
| Pod identity | Random names (myapp-7b9c-x) | Stable (myapp-0, myapp-1) | One per node (random suffix) |
| Scaling order | Any order | Ordered (0, 1, 2...) | Follows node count |
| Storage | Shared or ephemeral | Stable PVC per pod | Usually hostPath or ephemeral |
| Rolling update | Any order | Ordered, one at a time | All nodes |
| Examples | API servers, web apps | Kafka, Cassandra, Zookeeper | Prometheus node-exporter, fluentd, Calico |

StatefulSets give each pod a stable hostname: myapp-0.myapp-service.namespace.svc.cluster.local

DaemonSet pods are scheduled on every node (or matching nodes via nodeSelector/affinity). When a new node joins the cluster, DaemonSet controller automatically creates a pod on it.
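A minimal StatefulSet sketch (names are illustrative); note the required headless serviceName and the per-pod volumeClaimTemplates:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: myapp
spec:
  serviceName: myapp-service   # headless Service providing stable per-pod DNS
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: app
        image: myapp:v1        # illustrative image
        volumeMounts:
        - name: data
          mountPath: /var/lib/data
  volumeClaimTemplates:        # each replica gets its own PVC (data-myapp-0, data-myapp-1, ...)
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
```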

Asked by: Amazon, Flipkart, Atlassian


Q5. What are the types of Kubernetes Services?

| Type | Description | Use Case |
|---|---|---|
| ClusterIP | Internal-only VIP; not accessible outside the cluster | Microservice-to-microservice communication |
| NodePort | Exposes the service on each node's IP at a static port (30000-32767) | Development, testing |
| LoadBalancer | Provisions a cloud load balancer (ALB, GCP LB) | Production external traffic |
| ExternalName | Maps the service to an external DNS name (returns CNAME) | External databases, third-party services |
| Headless | No cluster IP (clusterIP: None); returns pod IPs directly via DNS | StatefulSets, direct pod addressing |
# NodePort example
apiVersion: v1
kind: Service
metadata:
  name: myapp-service
spec:
  type: NodePort
  selector:
    app: myapp
  ports:
  - port: 80        # ClusterIP port
    targetPort: 8080 # Pod's container port
    nodePort: 31000  # Node's exposed port

Services use label selectors to route to matching pods — this is the basis of dynamic service discovery.


Q6. What is the difference between Ingress and a Service of type LoadBalancer?

| Feature | Service: LoadBalancer | Ingress |
|---|---|---|
| Cost | 1 cloud LB per service | 1 LB for all services |
| Routing | Port-based only | HTTP path/host-based routing |
| SSL termination | No (or at LB level) | Yes (cert-manager integration) |
| L7 features | No | Yes (rate limiting, auth, rewrites) |
| When to use | Non-HTTP protocols, simple single service | Multiple HTTP services behind one IP |

Ingress requires an Ingress Controller (nginx-ingress, AWS ALB Controller, Traefik, Istio Gateway) — Kubernetes doesn't include one by default.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /users
        pathType: Prefix
        backend:
          service:
            name: users-service
            port:
              number: 80
      - path: /orders
        pathType: Prefix
        backend:
          service:
            name: orders-service
            port:
              number: 80

Q7. What is a ConfigMap and Secret? How do you use them?

ConfigMap stores non-sensitive configuration data as key-value pairs. Secret stores sensitive data (encoded in base64 by default — NOT encrypted; use Sealed Secrets or external secret managers for true encryption).

Three ways to consume:

  1. Environment variables: env.valueFrom.configMapKeyRef
  2. Volume mount: Mount as files in container filesystem
  3. Command-line args: Reference in args
# Mount Secret as environment variable
env:
- name: DB_PASSWORD
  valueFrom:
    secretKeyRef:
      name: db-secret
      key: password

Important: Secrets are base64-encoded, not encrypted. In production, use:

  • Kubernetes Secrets with etcd encryption at rest (KMS provider)
  • Sealed Secrets (Bitnami) — encrypted secrets stored in Git
  • External Secrets Operator — sync from AWS Secrets Manager, Vault, GCP Secret Manager
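Base64 offers no secrecy at all — anyone who can read the Secret can decode it in one command:

```shell
# Encode a password the way Kubernetes stores it inside a Secret object
echo -n 'S3cr3t!' | base64
# → UzNjcjN0IQ==

# Decoding it back is trivial; this is encoding, not encryption
echo -n 'UzNjcjN0IQ==' | base64 -d
# → S3cr3t!
```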

Asked at every K8s interview. Know the security caveats.


Q8. What are resource requests and limits in Kubernetes?

  • Request: Minimum guaranteed resources for scheduling. The scheduler uses requests to find a node with sufficient resources. Pod is guaranteed this amount.
  • Limit: Maximum resources a container can use. If CPU exceeds limit, the container is throttled. If memory exceeds limit, the container is OOM-killed and restarted.
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"    # 250 millicores = 0.25 CPU
  limits:
    memory: "512Mi"
    cpu: "500m"

QoS Classes (Quality of Service):

  • Guaranteed: requests == limits for all containers. Evicted last.
  • Burstable: requests < limits or some containers have no limits. Middle priority.
  • BestEffort: No requests or limits set. Evicted first under pressure.

Set requests carefully: too low → scheduled on over-committed node → OOM; too high → resource waste, poor bin-packing.


Q9. What is a Namespace in Kubernetes?

A Namespace partitions a single cluster into virtual sub-clusters, scoping resource names, RBAC, and quotas.

Default namespaces:

  • default: Where resources go if you don't specify
  • kube-system: System components (coreDNS, metrics-server, kube-proxy)
  • kube-public: Publicly readable (cluster-info)
  • kube-node-lease: Node heartbeat objects

What namespaces do NOT isolate: Node-level resources, cluster-wide resources (PersistentVolumes, StorageClasses, ClusterRoles, Nodes). Network traffic — pods in different namespaces can still talk to each other unless NetworkPolicies restrict it.

DNS for cross-namespace service access: service-name.namespace.svc.cluster.local
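Common namespace operations (a sketch; assumes kubectl access to a cluster):

```shell
# Create a namespace and run a workload in it
kubectl create namespace staging
kubectl -n staging run web --image=nginx

# List resources per namespace; -A spans all namespaces
kubectl get pods -n staging
kubectl get pods -A

# Set a default namespace for the current kubectl context
kubectl config set-context --current --namespace=staging
```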


Q10. How does Kubernetes networking work? (CNI, Pod IP, DNS)

Kubernetes imposes a flat network model with three rules:

  1. Every pod gets a unique IP address
  2. Pods can communicate with any other pod without NAT
  3. Nodes can communicate with pods without NAT

CNI (Container Network Interface) plugins implement these rules:

| CNI Plugin | Features | Use Case |
|---|---|---|
| Calico | BGP routing, NetworkPolicy, WireGuard encryption | Production, on-prem |
| Flannel | Simple overlay (VXLAN) | Development, simple setups |
| Cilium | eBPF-based, L7 NetworkPolicy, Hubble observability | High-performance, service mesh replacement |
| AWS VPC CNI | EC2 ENI-native IPs for pods | EKS — native VPC routing |
| Weave Net | Mesh overlay | Multi-cloud |

DNS (CoreDNS): Every pod has /etc/resolv.conf pointing to the CoreDNS cluster IP. Services are discoverable at <service>.<namespace>.svc.cluster.local.


Q11. What is a PersistentVolume and PersistentVolumeClaim?

  • PersistentVolume (PV): A piece of storage in the cluster provisioned by an admin or dynamically via StorageClass. Independent of pod lifecycle.
  • PersistentVolumeClaim (PVC): A request for storage by a user (namespace-scoped). Binds to a matching PV.
  • StorageClass: Defines the "type" of storage and provisioner (AWS EBS, GCP PD, NFS). Enables dynamic provisioning.
# PVC requesting 10GB SSD
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  accessModes:
    - ReadWriteOnce   # RWO: single node, RWX: multi-node
  storageClassName: gp3  # AWS EBS gp3
  resources:
    requests:
      storage: 10Gi

Access modes: ReadWriteOnce (RWO) — most block storage (EBS). ReadWriteMany (RWX) — NFS, EFS, CephFS. ReadOnlyMany (ROX) — multiple readers.


Q12. What is RBAC in Kubernetes?

RBAC (Role-Based Access Control) governs which users and service accounts may perform which actions on which resources. Four objects implement it:

  • Role: Namespace-scoped permissions (e.g., can get/list pods in "production" namespace)
  • ClusterRole: Cluster-scoped permissions (can read nodes, PVs) or re-used across namespaces
  • RoleBinding: Grants a Role to a subject in a namespace
  • ClusterRoleBinding: Grants a ClusterRole to a subject cluster-wide
# Role: allow reading pods in production namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
# Bind to a user
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods-binding
  namespace: production
subjects:
- kind: User
  name: alice
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

Q13. What are Kubernetes Probes? Explain liveness, readiness, and startup probes.

| Probe | Purpose | Failure Action |
|---|---|---|
| Liveness | Is the container still running? | Restart the container |
| Readiness | Is the container ready to serve traffic? | Remove from Service endpoints (stop sending traffic) |
| Startup | Has the container finished starting up? | Don't run liveness/readiness until startup succeeds |
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5

Common mistakes:

  • No initialDelaySeconds on liveness → container killed before JVM/Python starts → infinite restart loop
  • Same endpoint for liveness and readiness → if DB is slow, pod is killed instead of just removed from LB
  • No readiness probe → traffic sent to pods that aren't ready yet after deployment
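For slow-starting apps (large JVMs, for example) a startupProbe is cleaner than a huge initialDelaySeconds; liveness and readiness checks are held off until it succeeds. A sketch with illustrative endpoint and numbers:

```yaml
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30   # allow up to 30 x 10s = 300s for startup
  periodSeconds: 10
```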

Asked at every company. Know all three and their differences.


Q14. What is Helm? What are its main components?

Helm is the package manager for Kubernetes. It bundles related manifests into versioned, configurable packages called charts.

Core components:

  • Chart: Package of K8s manifests + templates + Chart.yaml metadata + default values.yaml
  • Release: A deployed instance of a chart in the cluster
  • Repository: Where charts are stored (Artifact Hub, private repos)
# Install a chart
helm install my-release bitnami/postgresql \
  --set auth.postgresPassword=secret \
  --set primary.persistence.size=50Gi \
  --namespace production

# Upgrade
helm upgrade my-release bitnami/postgresql --set image.tag=15.3.0

# Rollback
helm rollback my-release 1

Helm 3 (current) removed Tiller — helm now communicates directly with the Kubernetes API using your local kubeconfig. Charts are rendered client-side.


Q15. What is the difference between kubectl apply and kubectl create?

| Command | Behavior | When to use |
|---|---|---|
| kubectl create | Creates a new resource; fails if it exists | One-time creation, imperative |
| kubectl apply | Creates or updates; idempotent; uses 3-way merge | GitOps, IaC — always use this |
| kubectl replace | Replaces the entire resource spec | Full replacement |
| kubectl patch | Partially updates a resource (JSON patch/strategic merge) | Quick targeted changes |

kubectl apply stores the last-applied configuration in an annotation (kubectl.kubernetes.io/last-applied-configuration) to perform 3-way merges on subsequent applies.
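The difference shows up on the second run — a sketch assuming a live cluster and an illustrative deployment.yaml:

```shell
kubectl create -f deployment.yaml   # first run: created
kubectl create -f deployment.yaml   # second run: fails with AlreadyExists

kubectl apply -f deployment.yaml    # first run: created
kubectl apply -f deployment.yaml    # second run: unchanged (idempotent)
```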


Checkpoint: Master Q1-Q15 and you've cleared the screening round at most companies. The intermediate section below is where ₹20+ LPA offers are won or lost.

Proven Intermediate Kubernetes Questions (Q16–Q35)

Q16. How does Horizontal Pod Autoscaler (HPA) work?

HPA automatically adjusts the replica count of a Deployment (or StatefulSet) to track observed metrics:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"

HPA evaluates metrics every 15 seconds (default sync period). Scale-up happens quickly once the metric exceeds the target; scale-down waits out a 5-minute stabilization window (default) to prevent thrashing.

KEDA (Kubernetes Event-Driven Autoscaling) extends HPA to scale based on queue length (SQS, Kafka), cron schedules, or custom external metrics. Can scale to zero.


Q17. What is Vertical Pod Autoscaler (VPA)? How does it differ from HPA?

| Feature | HPA | VPA |
|---|---|---|
| What it scales | Number of replicas | CPU/memory requests of pods |
| When to use | Stateless, horizontally scalable apps | Stateful apps, uncertain right-sizing |
| Downtime | No (adds/removes pods) | Yes — must restart pods to change requests |
| Interaction | Don't use HPA + VPA on the same deployment for CPU | Use HPA for replicas + VPA in "Off" mode for recommendations |

VPA has three modes:

  • Off: Recommendations only (view via kubectl describe vpa)
  • Initial: Set requests on new pods only (no existing pod restarts)
  • Auto: Evict and recreate pods with new requests (causes downtime)
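A VPA in recommendation-only mode is a safe first step for right-sizing (a sketch; assumes the VPA components are installed in the cluster):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Off"   # recommend only; view with: kubectl describe vpa myapp-vpa
```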

Q18. Explain Kubernetes rolling updates and rollback.

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1         # Max extra pods above desired during update
    maxUnavailable: 0   # Zero-downtime: never kill old pod before new is ready

Rolling update flow:

  1. Create 1 new pod with v2 (surge pod)
  2. Wait for v2 pod to pass readiness probe
  3. Remove 1 old v1 pod
  4. Repeat until all pods are v2

Rollback:

# Check rollout history
kubectl rollout history deployment/myapp

# Rollback to previous version
kubectl rollout undo deployment/myapp

# Rollback to specific revision
kubectl rollout undo deployment/myapp --to-revision=3

# Monitor rollout
kubectl rollout status deployment/myapp

Kubernetes retains rollout history (default 10 revisions via revisionHistoryLimit).


Q19. What are Taints and Tolerations? How do they differ from Node Affinity?

| Feature | Taints + Tolerations | Node Affinity |
|---|---|---|
| Direction | Node repels pods (unless tolerated) | Pod attracts to nodes |
| Default behavior | Pods without a toleration are rejected | Pods can go anywhere unless required |
| Effects | NoSchedule, PreferNoSchedule, NoExecute | requiredDuringScheduling, preferredDuringScheduling |
| Use case | Dedicated nodes (GPU, spot, tainted for infra) | Topology awareness, hardware requirements |
# Taint a node for GPU workloads only
kubectl taint nodes gpu-node-1 dedicated=gpu:NoSchedule

# Pods that need GPU must tolerate it
tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "gpu"
  effect: "NoSchedule"

NoExecute taint evicts already-running pods that don't tolerate it. Used when draining nodes for maintenance.


Q20. What is a NetworkPolicy? Write one that isolates a namespace.

# Deny all ingress traffic to pods in 'production' namespace
# except from pods in the same namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-external-ingress
  namespace: production
spec:
  podSelector: {}  # Applies to all pods in namespace
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector: {}          # From pods in same namespace
    - namespaceSelector:       # Or from monitoring namespace
        matchLabels:
          name: monitoring

Important: NetworkPolicies require a CNI that enforces them. Flannel does NOT. Calico, Cilium, and Weave do.

Once any NetworkPolicy selects a pod, all ingress traffic not explicitly allowed by some policy is denied; policies are additive allow-lists.


Q21. What is an Operator in Kubernetes? Give a real example.

An Operator extends Kubernetes with application-specific automation: it encodes operational knowledge (install, upgrade, backup, failover) as a controller managing custom resources.

How operators work:

  1. Define a CRD (Custom Resource Definition) — extends the K8s API with new resource types
  2. Implement a Controller — watches for changes to the custom resource and reconciles actual state to desired state

Real example — Prometheus Operator:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-monitor
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
  - port: metrics
    interval: 15s

Instead of manually configuring Prometheus scrape configs, you create a ServiceMonitor CR. The operator watches for these and automatically updates Prometheus configuration.

Other examples: Strimzi (Kafka), Zalando Postgres Operator, Argo CD, cert-manager, Vault Operator.


Q22. What is etcd? What happens if etcd goes down?

etcd is the distributed key-value store holding the entire cluster state; every Kubernetes API object is persisted there, making it the cluster's source of truth.

If etcd goes down:

  • Existing pods continue running (kubelet keeps running pods independently)
  • New pods cannot be scheduled
  • API server becomes read-only or unresponsive
  • No new deployments, scaling, or configuration changes
  • DNS changes (new services) won't propagate

HA etcd setup: 3 or 5 nodes (always odd; Raft needs a quorum of ⌊n/2⌋+1). A 5-node cluster tolerates 2 node failures.

Backup etcd:

ETCDCTL_API=3 etcdctl snapshot save snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key

Critical question at SRE and platform engineering interviews


Q23. How do you perform zero-downtime deployments in Kubernetes?

  1. readinessProbe configured — new pods only receive traffic when ready
  2. maxUnavailable: 0 in rolling update strategy
  3. preStop lifecycle hook — graceful shutdown with sleep before SIGTERM:
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "sleep 5"]
terminationGracePeriodSeconds: 30

Why preStop sleep? There's a race condition between kube-proxy removing the pod from iptables rules and the pod actually stopping. The 5-second sleep ensures no new connections are routed to the pod after it starts shutting down.

  4. Application must handle SIGTERM gracefully — finish in-flight requests, close DB connections
  5. PodDisruptionBudget (PDB) — prevents too many pods being unavailable simultaneously:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 2  # or maxUnavailable: 1
  selector:
    matchLabels:
      app: myapp

Q24. What is the difference between Kubernetes Cluster Autoscaler and Karpenter?

| Feature | Cluster Autoscaler | Karpenter |
|---|---|---|
| Cloud support | Multi-cloud (ASG-based) | AWS-native (EC2 direct, no ASG required) |
| Provisioning speed | 3-5 minutes | Under 60 seconds |
| Node selection | Picks from predefined node groups | Provisions exact right-sized instances |
| Bin-packing | Good | Excellent (pods-first, then provision) |
| Spot handling | Supports spot ASGs | Native spot + on-demand fallback |
| Consolidation | Scale-down only | Active consolidation (replaces wasteful nodes) |

Karpenter (open-sourced by AWS, now CNCF) is the modern replacement for Cluster Autoscaler on AWS. It provisions nodes within seconds and can select any instance type that fits the pending pod's requirements.

Increasingly asked at AWS/EKS-focused interviews (2026)


Q25. What is Pod Anti-Affinity? Give a production use case.

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: myapp
      topologyKey: "kubernetes.io/hostname"  # Different nodes

Production use case: Your payment service has 3 replicas. Without anti-affinity, K8s might schedule all 3 on the same node. If that node fails, your payment service is down. With pod anti-affinity on hostname, each replica lands on a different node — tolerate a node failure without any downtime.

For zone-level HA: use topologyKey: "topology.kubernetes.io/zone" — each replica in a different AZ.

Topology Spread Constraints (newer, recommended over anti-affinity):

topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: myapp

Q26. How do you debug a pod stuck in CrashLoopBackOff?

Debugging steps:

# 1. Check pod events and status
kubectl describe pod <pod-name> -n <namespace>

# 2. Check current logs (if container just started)
kubectl logs <pod-name> -n <namespace>

# 3. Check previous container's logs (the one that crashed)
kubectl logs <pod-name> -n <namespace> --previous

# 4. Check exit code
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'

Common exit codes:

  • 0: Process exited normally (but not as expected — misconfiguration?)
  • 1: Application error
  • 137: OOM killed (memory limit exceeded) → increase memory limit
  • 139: Segfault
  • 143: SIGTERM (graceful termination signal) — if this loops, app isn't shutting down correctly
# 5. Override entrypoint to debug
kubectl run debug --image=myapp:v1 --command -- sleep 3600
kubectl exec -it debug -- sh

Q27. What is Istio? What problems does a service mesh solve?

Istio is a service mesh: an infrastructure layer of Envoy sidecar proxies plus a control plane (istiod) that handles service-to-service traffic, security, and observability without application code changes.

Problems a service mesh solves:

| Problem | Without Service Mesh | With Istio |
|---|---|---|
| mTLS between services | Manual cert management per app | Automatic, transparent mTLS |
| Distributed tracing | Each app must integrate Zipkin/Jaeger SDK | Automatic trace propagation via headers |
| Traffic management | Requires app code changes | VirtualServices, DestinationRules |
| Circuit breaking | Library per language (Hystrix, Resilience4j) | Built into Envoy |
| Observability | Each team instruments separately | Unified metrics, logs, traces |

Istio resources:

  • VirtualService: Traffic routing rules (canary, retries, timeouts, fault injection)
  • DestinationRule: Load balancing, circuit breaking, connection pool settings
  • Gateway: Manage ingress/egress traffic
  • PeerAuthentication: Enforce mTLS between services
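A VirtualService sketch adding retries and a timeout without touching application code (the host and values are illustrative):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: users-vs
spec:
  hosts:
  - users-service            # in-mesh service to apply the policy to
  http:
  - route:
    - destination:
        host: users-service
    timeout: 10s             # overall request deadline
    retries:
      attempts: 3
      perTryTimeout: 2s
      retryOn: 5xx,connect-failure
```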

Asked at Swiggy, Flipkart, and companies running 100+ microservices


Q28. How does Kubernetes handle secrets security? What are the best practices?

By default, Kubernetes Secrets are:

  • Stored in etcd base64-encoded (NOT encrypted)
  • Accessible to anyone with RBAC get/list on secrets in that namespace
  • Visible in pod environment variables via kubectl exec

Security best practices:

  1. Encrypt etcd at rest: Configure EncryptionConfiguration with AES-CBC or KMS provider
  2. Use Sealed Secrets (Bitnami): Encrypt secrets with cluster public key → safe to commit to Git
  3. External Secrets Operator: Sync from AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager
  4. Vault Agent Injector: Vault sidecar injects secrets directly into pod filesystem
  5. Restrict RBAC: No wildcard get/list on secrets. Minimize who can read secrets.
  6. Audit logs: Enable K8s audit policy to log secret access events
  7. CSI Secret Store driver: Mount secrets as volumes from external stores without storing in K8s
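A sketch of practice 1, the API server's EncryptionConfiguration file (the key is a placeholder; generate one with head -c 32 /dev/urandom | base64, and pass the file via --encryption-provider-config):

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources: ["secrets"]
  providers:
  - aescbc:
      keys:
      - name: key1
        secret: <base64-encoded-32-byte-key>   # placeholder, do not commit real keys
  - identity: {}   # fallback so pre-existing unencrypted secrets stay readable
```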

Q29. What is Kustomize and how does it differ from Helm?

| Feature | Helm | Kustomize |
|---|---|---|
| Approach | Template engine with values | Patch-based overlays (no templates) |
| Complexity | Higher (Go templates, Sprig) | Lower (pure YAML patching) |
| Built into kubectl | No | Yes (since kubectl 1.14) |
| Dependencies | Chart dependencies | Bases and overlays |
| Secret encryption | External (helm-secrets plugin) | External (Sealed Secrets) |
| Best for | Third-party apps with complex configs | Internal apps with env-specific overrides |
kustomize/
├── base/
│   ├── deployment.yaml
│   ├── service.yaml
│   └── kustomization.yaml
├── overlays/
│   ├── staging/
│   │   ├── kustomization.yaml  # patches image tag, replicas
│   └── production/
│       └── kustomization.yaml  # patches resource limits, replicas

Many teams use both: Helm for third-party charts (postgres, redis), Kustomize for in-house applications.
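A production overlay from the tree above might look like this (contents are illustrative); apply it with kubectl apply -k overlays/production:

```yaml
# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base        # reuse the shared base manifests
images:
- name: myapp
  newTag: v1.2.0    # override the image tag per environment
replicas:
- name: myapp
  count: 5          # production runs more replicas than the base
```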


Q30. How does Pod scheduling work? What is the scheduler algorithm?

Phase 1 — Filtering (predicates): Eliminate nodes that don't meet requirements:

  • NodeUnschedulable (cordoned nodes)
  • ResourceFit (requested CPU/memory <= allocatable)
  • NodeAffinity rules
  • TaintToleration
  • PodAffinity/Anti-Affinity
  • Volume binding (zone-local PV)

Phase 2 — Scoring (priorities): Rank remaining nodes 0–100:

  • LeastRequestedPriority: Prefer nodes with most free resources
  • BalancedResourceAllocation: Balance CPU and memory usage
  • InterPodAffinityPriority: Prefer nodes near affinity-matching pods
  • ImageLocalityPriority: Prefer nodes that already have the container image

The node with the highest score wins. Ties are broken randomly.

Custom schedulers can be registered alongside the default scheduler using the scheduler name field in pod spec.
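Selecting a custom scheduler is a one-line change in the pod spec (my-scheduler is an illustrative name):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: custom-scheduled
spec:
  schedulerName: my-scheduler   # defaults to "default-scheduler" if omitted
  containers:
  - name: app
    image: nginx
```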


Q31. Explain Kubernetes resource quotas and LimitRange.

ResourceQuota: Limits total resource consumption in a namespace:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "20"       # Total CPU requests across all pods
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "50"
    services.loadbalancers: "5"

LimitRange: Sets default requests/limits for pods that don't specify them, and enforces min/max per container:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
spec:
  limits:
  - type: Container
    default:
      cpu: 500m
      memory: 256Mi
    defaultRequest:
      cpu: 100m
      memory: 128Mi
    max:
      cpu: "4"
      memory: 4Gi

Without LimitRange, pods without resource specs get BestEffort QoS and can consume all node resources.


Q32. What is cert-manager? How does it automate TLS in Kubernetes?

cert-manager is a Kubernetes add-on that automates issuing and renewing TLS certificates from issuers such as Let's Encrypt, HashiCorp Vault, or a private CA.

How it works:

  1. You create a Certificate resource (or annotate an Ingress)
  2. cert-manager creates a CertificateRequest and interacts with the issuer
  3. For Let's Encrypt: solves HTTP-01 or DNS-01 ACME challenge
  4. Stores the issued certificate as a Kubernetes Secret
  5. Automatically renews 30 days before expiry
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: api-tls
spec:
  secretName: api-tls-secret
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
  - api.example.com

No more manually managing TLS certs! cert-manager handles the entire lifecycle.


Q33. How do you implement canary deployments in Kubernetes?

Option 1 — Label-based with Service:

Deployment v1 (label: version=stable) — 9 replicas
Deployment v2 (label: version=canary) — 1 replica
Service selector: app=myapp (selects both)
Result: ~10% traffic to canary (by pod count ratio)

Option 2 — Argo Rollouts:

strategy:
  canary:
    steps:
    - setWeight: 5    # 5% to canary
    - pause: {duration: 10m}
    - setWeight: 20
    - pause: {duration: 30m}
    - analysis: {}    # Run AnalysisTemplate (check error rate)
    - setWeight: 100

Argo Rollouts integrates with Prometheus for automated analysis — auto-rollback if error rate > threshold.

Option 3 — Istio VirtualService:

# 95% stable, 5% canary
http:
- route:
  - destination:
      host: myapp
      subset: stable
    weight: 95
  - destination:
      host: myapp
      subset: canary
    weight: 5

Q34. What is Velero? How do you backup Kubernetes?

Velero is an open-source tool for backing up, restoring, and migrating Kubernetes cluster resources and persistent volumes.

What it backs up:

  • All K8s API objects (deployments, services, configmaps, etc.) as JSON
  • PersistentVolume data (via snapshots or Restic/Kopia for volume backup)

How it works:

# Install Velero with AWS S3 backend
velero install --provider aws --bucket my-backup-bucket \
  --backup-location-config region=ap-south-1

# Create scheduled backup (daily at 2 AM)
velero schedule create daily-backup \
  --schedule="0 2 * * *" \
  --include-namespaces=production

# Restore
velero restore create --from-backup daily-backup-20260330

For production: Store backups in a separate AWS account or region. Test restores quarterly. Velero doesn't replace database-native backups — use RDS automated backups alongside Velero.


Q35. What are Kubernetes admission controllers? Name important ones.

Admission controllers intercept requests to the API server after authentication and authorization but before the object is persisted to etcd. Two webhook flavors:

  • Mutating: Can modify the request (e.g., add sidecars, set defaults)
  • Validating: Can allow or deny but not modify

Important admission controllers:

| Controller | Purpose |
|---|---|
| MutatingAdmissionWebhook | Custom mutation logic (Istio sidecar injection) |
| ValidatingAdmissionWebhook | Custom validation (OPA/Gatekeeper policies) |
| ResourceQuota | Enforce namespace quotas |
| LimitRanger | Set default limits |
| PodSecurity | Enforce Pod Security Standards (replaced PodSecurityPolicy) |
| ServiceAccount | Auto-mount default SA token |
| NodeRestriction | Limit what nodes can modify |

OPA Gatekeeper (Open Policy Agent): Policy enforcement as ValidatingWebhook. Write policies in Rego:

# Deny containers with no resource limits
deny[msg] {
  container := input.review.object.spec.containers[_]
  not container.resources.limits
  msg := sprintf("Container '%s' must have resource limits", [container.name])
}

You've made it past 80% of candidates. The advanced section is what gets you Staff/Principal offers at ₹50 LPA+ — real production scenarios and system design questions from Google, Amazon, and top Indian unicorns.

Advanced Kubernetes Questions — The No-BS Senior Round (Q36–Q50)

Q36. Design a multi-tenant Kubernetes platform for a SaaS company.

Architecture:

One EKS cluster (or multiple for strict isolation)
├── Namespace per tenant (tenant-a, tenant-b, tenant-c)
│   ├── ResourceQuota (CPU/memory/pods per tenant)
│   ├── LimitRange (default limits)
│   ├── NetworkPolicy (deny all cross-tenant traffic)
│   └── RBAC (tenant admin can only manage their namespace)
├── Shared infrastructure (kube-system namespace)
│   ├── Ingress Controller (route by subdomain: tenant-a.saas.com)
│   ├── cert-manager (per-tenant TLS certs)
│   ├── Prometheus (scrape all tenants, per-tenant Grafana dashboards)
│   └── Vault (per-tenant secret path isolation)
└── Node pools (optional: dedicated nodes per tier via taints)

For stronger isolation: Use virtual clusters (vCluster) — lightweight K8s control plane per tenant sharing the underlying worker nodes but with isolated API server, etcd, and resources.


Q37. How does Kubernetes handle pod eviction? What triggers it?

The kubelet monitors node resources and evicts pods when configured thresholds are crossed (node-pressure eviction). Pods can also be evicted through the Eviction API during drains, which respects PodDisruptionBudgets.

Eviction signals:

  • memory.available < 100Mi (default soft threshold)
  • memory.available < 50Mi (default hard threshold — immediate eviction)
  • nodefs.available < 10% (disk pressure)
  • imagefs.available < 15%

Eviction order:

  1. BestEffort pods (no requests/limits) first
  2. Burstable pods exceeding their requests
  3. Guaranteed pods (only if no other option)

Node Conditions set:

  • MemoryPressure: True
  • DiskPressure: True
  • PIDPressure: True

When MemoryPressure: True, scheduler doesn't schedule new BestEffort pods on that node.

To prevent your pods from being evicted: set Guaranteed QoS (requests == limits), use PodDisruptionBudgets, and set pod priority (higher priority = evicted last).
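A minimal PodDisruptionBudget for the last point — the `payment` app label is hypothetical:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-pdb
spec:
  minAvailable: 2          # never take voluntary disruptions below 2 ready pods
  selector:
    matchLabels:
      app: payment
```

Note that PDBs protect against voluntary disruptions (drains, upgrades), not kubelet pressure eviction — hence the need for Guaranteed QoS as well.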


Q38. What is IRSA (IAM Roles for Service Accounts) on EKS?

How it works:

  1. EKS cluster has an OIDC provider URL
  2. Create an IAM Role with a trust policy allowing the specific service account to assume it
  3. Annotate the K8s Service Account with the IAM Role ARN
  4. EKS webhook injects AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE into pods using that SA
  5. AWS SDK automatically detects these env vars and calls STS to assume the role

# Create IRSA-annotated service account using eksctl
eksctl create iamserviceaccount \
  --name my-service-account \
  --namespace production \
  --cluster my-eks-cluster \
  --attach-policy-arn arn:aws:iam::123456789:policy/S3ReadPolicy \
  --approve

This follows the principle of least privilege at the pod level — critical for security in multi-tenant clusters.
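Under the hood, the eksctl command amounts to an annotated ServiceAccount — a sketch with a hypothetical role name (eksctl normally generates one):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-service-account
  namespace: production
  annotations:
    # This annotation is what the EKS webhook keys off
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789:role/my-app-s3-read
```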


Q39. How does the Kubernetes scheduler handle GPU resources?

resources:
  limits:
    nvidia.com/gpu: 1  # Request 1 GPU (requests must equal limits for GPUs)

GPUs are NOT time-shared by default — each container gets exclusive access to a physical GPU. This means:

  • GPU resources cannot be overcommitted
  • Pod either gets the full GPU or waits in pending state
  • NVIDIA MIG (Multi-Instance GPU) allows partitioning A100/H100 into slices

Node labeling for GPU pools:

# Label nodes and taint them for GPU-only workloads
kubectl taint nodes gpu-node nvidia.com/gpu=present:NoSchedule
kubectl label nodes gpu-node accelerator=nvidia-tesla-a100
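A pod targeting this pool must tolerate the taint and select the label — a sketch with hypothetical pod and image names:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: train-job
spec:
  nodeSelector:
    accelerator: nvidia-tesla-a100   # land only on labeled GPU nodes
  tolerations:
    - key: nvidia.com/gpu            # tolerate the GPU-only taint
      operator: Equal
      value: present
      effect: NoSchedule
  containers:
    - name: trainer
      image: my-registry/trainer:latest
      resources:
        limits:
          nvidia.com/gpu: 1
```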

GPU utilization in K8s is notoriously hard to monitor — use DCGM Exporter + Prometheus.


Q40. How do you troubleshoot a node in NotReady state?

# 1. Check node conditions
kubectl describe node <node-name>
# Look for: MemoryPressure, DiskPressure, PIDPressure, NetworkNotReady

# 2. SSH into the node
# Check kubelet status
systemctl status kubelet
journalctl -xeu kubelet --no-pager | tail -50

# 3. Check disk space
df -h
# Full /var/log or /var/lib/docker can cause issues

# 4. Check memory
free -h
cat /proc/meminfo

# 5. Check container runtime
systemctl status containerd
crictl ps  # List running containers

# 6. Check network
ip route show
iptables -L -n | head -30

# 7. Certificate expiry (common cause of NotReady) — check the kubelet's client cert
openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -dates

Common causes:

  • Kubelet crashed or stopped
  • Disk full (especially from container images or logs)
  • Network plugin (CNI) issue
  • Expired kubelet certificates
  • OOM — node itself ran out of memory
  • Cloud provider issue (EC2 instance hardware failure)

High-frequency question in SRE/platform interviews


Q41. What is the Kubernetes control loop? Explain the reconciliation pattern.

Desired State (etcd) ──→ Controller ──→ Actual State
        ↑                    │
        └────────────────────┘ (reconcile loop)

  1. Watch: Controller subscribes to resource events via an informer — a watch on the API server backed by a local cache
  2. Reconcile: Compare desired state (spec) with actual state (status)
  3. Act: Take actions to close the gap (create pod, delete service, update status)
  4. Update status: Write observed state back to the resource's .status field

This is the core of all Kubernetes controllers and operators. The reconcile loop is designed to be idempotent — running it multiple times with the same input produces the same result. This makes controllers resilient to crashes and restarts.
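The idempotency property can be illustrated with a minimal, cluster-free sketch — the dict-based "cluster" and resource names are purely hypothetical, standing in for spec/status in etcd:

```python
# Minimal sketch of the reconcile pattern: drive actual state toward desired state.
# Running it repeatedly with the same input produces the same result (idempotent).

def reconcile(desired: dict, actual: dict) -> dict:
    for name, spec in desired.items():
        if actual.get(name) != spec:
            actual[name] = spec        # create missing / correct drifted resources
    for name in list(actual):
        if name not in desired:
            del actual[name]           # garbage-collect resources absent from spec
    return actual

desired = {"web": {"replicas": 3}, "api": {"replicas": 2}}
actual = {"web": {"replicas": 1}, "stale": {"replicas": 5}}

reconcile(desired, actual)   # converges: web corrected, stale removed, api created
reconcile(desired, actual)   # second run is a no-op — crash/restart safe
```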


Q42. How do you implement GitOps with Argo CD on Kubernetes?

Setup:

# Install Argo CD
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# Create an Application
argocd app create myapp \
  --repo https://github.com/myorg/k8s-manifests \
  --path services/myapp \
  --dest-server https://kubernetes.default.svc \
  --dest-namespace production \
  --sync-policy automated \
  --auto-prune \
  --self-heal

With --self-heal: If someone does kubectl apply manually, Argo CD detects drift and reverts to Git state within 3 minutes.

App of Apps pattern: One root Application that syncs all other Applications — declarative cluster bootstrapping.

Alternatives: Flux v2 (CNCF-graduated, more GitOps-native, no UI out of the box).
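The same Application can live declaratively in Git itself (the more GitOps-native route) — a sketch mirroring the CLI flags above:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/k8s-manifests
    path: services/myapp
    targetRevision: HEAD
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true       # --auto-prune
      selfHeal: true    # --self-heal
```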


Q43. What is pod priority and preemption in Kubernetes?

# Define priority class
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-payment
value: 1000000
globalDefault: false
preemptionPolicy: PreemptLowerPriority

---
# Use in pod
spec:
  priorityClassName: high-priority-payment

Preemption flow:

  1. High-priority pod is pending (no space on any node)
  2. Scheduler looks for nodes where evicting lower-priority pods would make room
  3. Evicts the minimum set of lower-priority pods on chosen node
  4. High-priority pod is scheduled

System priorities:

  • system-cluster-critical (2000000000): CoreDNS, metrics-server
  • system-node-critical (2000001000): kube-proxy, CNI agent pods (kubelet itself is a node daemon, not a pod)

Never use values >= 1000000000 for user workloads.


Q44. What are the common security best practices for Kubernetes in production?

  1. Control plane security: Private API server endpoint (no public IP), enable audit logs
  2. RBAC: Least privilege for all users and service accounts. No cluster-admin for humans.
  3. Pod Security Standards: Use Restricted policy (no privileged containers, no hostPath, non-root)
  4. Network Policies: Default-deny all traffic, explicitly allow required paths
  5. Image security: Use distroless/minimal base images, scan with Trivy/Grype, sign with cosign (Sigstore)
  6. Runtime security: Falco for anomaly detection (detects shell spawned in container, unexpected network connections)
  7. etcd encryption: Encrypt secrets at rest using AES-256 or KMS
  8. Supply chain security (SLSA): Verify image provenance, SBOM generation
  9. Secrets management: Use External Secrets Operator + Vault, not plain K8s secrets
  10. Node security: CIS Kubernetes Benchmark (kube-bench), regular node AMI patching
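The default-deny baseline from point 4 is a one-screen manifest — the namespace name is illustrative:

```yaml
# Deny all ingress and egress for every pod in the namespace;
# allowed paths are then opened with additional, narrower policies.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}        # empty selector = all pods in this namespace
  policyTypes:
    - Ingress
    - Egress
```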

Q45. How does service discovery work in Kubernetes with CoreDNS?

DNS records created per service:

  • A record: <service>.<namespace>.svc.cluster.local → ClusterIP (for ClusterIP services)
  • SRV record: _<port>._<proto>.<service>.<namespace>.svc.cluster.local → port info
  • A records per pod (headless services): the service name resolves to all pod IPs directly; StatefulSet pods additionally get stable per-pod records <pod-hostname>.<service>.<namespace>.svc.cluster.local

Search path in resolv.conf:

search production.svc.cluster.local svc.cluster.local cluster.local

So within the production namespace: postgres resolves to postgres.production.svc.cluster.local.

CoreDNS caches responses (the stock Corefile configures `cache 30`, i.e., a 30-second cache). Under high load, CoreDNS can become a bottleneck — NodeLocal DNSCache adds a per-node cache that intercepts DNS traffic before it reaches the CoreDNS pods, reducing latency and load.


Q46. What is the difference between soft and hard eviction thresholds?

| Feature | Soft Eviction | Hard Eviction |
| --- | --- | --- |
| Threshold (example) | memory.available < 200Mi | memory.available < 100Mi |
| Grace period | Configurable (e.g., 90 seconds) | Immediate |
| Pod behavior | Graceful termination (SIGTERM) | Forceful kill |
| Trigger | Sustained pressure | Critical pressure |

Soft eviction gives pods time to finish current work. Hard eviction is an emergency — kubelet sends SIGKILL immediately.

Configure in kubelet config:

evictionSoft:
  memory.available: "200Mi"
evictionSoftGracePeriod:
  memory.available: "90s"
evictionHard:
  memory.available: "100Mi"
  nodefs.available: "5%"

Q47. How do you run stateful databases on Kubernetes? What are the trade-offs?

Running databases on K8s with StatefulSet + PVC:

Pros:

  • Unified deployment model (everything in K8s)
  • Easy dev/staging setup
  • Native service discovery
  • Operators make production-grade deployments feasible (Vitess for MySQL, Zalando Postgres Operator, Percona Operators)

Cons:

  • Performance: K8s networking overhead, shared node resources
  • Storage: Cloud block storage (EBS) adds network-attached latency — typically sub-millisecond to a few milliseconds vs. ~100µs for local NVMe
  • Operational complexity: PVC binding, pod restart ordering, backup complexity
  • No K8s storage operator matches managed services (RDS Aurora) for features

Recommendation:

  • Dev/staging: Databases on K8s ✓
  • Production stateless apps: K8s ✓
  • Production databases: Use managed services (RDS, Cloud SQL) unless you have a specific reason + dedicated operator + experienced team

Q48. What is a service mesh vs. API gateway? When do you need each?

| Feature | API Gateway | Service Mesh |
| --- | --- | --- |
| Traffic direction | North-South (external → cluster) | East-West (service → service) |
| Examples | Kong, AWS API GW, NGINX | Istio, Linkerd, Cilium |
| Authentication | External user auth (JWT, OAuth) | mTLS between services |
| Rate limiting | Per API consumer, per endpoint | Per service |
| Visibility | External API metrics | Internal service topology |
| Where deployed | Edge of cluster | In every pod (sidecar) |

You often need both: API Gateway at the edge for external clients, service mesh for internal service-to-service security and observability.

Linkerd vs. Istio: Linkerd is lighter (Rust proxy vs. C++ Envoy), simpler to operate, faster. Istio is more feature-rich (L7 policies, traffic management). For most teams starting out, Linkerd is the pragmatic choice.


Q49. How do you implement observability (metrics, logs, traces) in Kubernetes?

Full observability stack:

Metrics:
  Pods expose /metrics (Prometheus format)
  → Prometheus scrapes via ServiceMonitor CRDs
  → Grafana dashboards (USE/RED/DORA methods)
  → Alertmanager → PagerDuty/Slack

Logs:
  Pods write to stdout/stderr
  → Fluentd/Fluent Bit DaemonSet collects
  → Elasticsearch / OpenSearch
  → Kibana or OpenSearch Dashboards
  [Alternative: Loki + Grafana — cheaper, label-indexed]

Traces:
  Apps instrument with OpenTelemetry SDK
  → OTel Collector (DaemonSet or gateway)
  → Jaeger / Tempo (Grafana)

Correlation:
  Exemplars: Link Prometheus metrics to trace IDs
  Grafana Tempo + Loki + Prometheus = unified Grafana observability

Key Kubernetes-specific metrics to monitor:

  • Pod restart count (CrashLoopBackOff signal)
  • PVC usage percentage
  • Node CPU/memory allocatable vs. requested (scheduling headroom)
  • API server error rate and latency
  • etcd write latency (>10ms indicates problems)
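A sketch of the ServiceMonitor wiring from the metrics pipeline above, assuming the Prometheus Operator CRDs are installed (all names are hypothetical):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-metrics
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: myapp            # matches the Service, not the pods directly
  namespaceSelector:
    matchNames:
      - production
  endpoints:
    - port: http-metrics    # named port on the Service
      path: /metrics
      interval: 30s
```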

Q50. Explain the Kubernetes pod lifecycle — every phase and transition.

Pod phases:

  1. Pending: Pod accepted by API server, waiting for scheduling or image pull
  2. Running: At least one container is running (not necessarily healthy)
  3. Succeeded: All containers exited with code 0 and won't be restarted (Job pods)
  4. Failed: All containers terminated, at least one exited non-zero
  5. Unknown: Node communication lost (kubelet unreachable)

Container states:

  • Waiting: Being pulled, init containers running, or in CrashLoopBackOff
  • Running: Container executing
  • Terminated: Container exited (check exit code and reason)

Full lifecycle:

API server accepts pod spec
    → Scheduler assigns node
    → Kubelet pulls images (ImagePullBackOff if failed)
    → Init containers run sequentially
    → Main containers start
    → postStart lifecycle hook runs (async with container start)
    → readinessProbe starts
    → livenessProbe starts (after initialDelaySeconds)
    → [Delete signal] → preStop hook → SIGTERM → terminationGracePeriod → SIGKILL

Understanding the full lifecycle is critical for debugging and implementing zero-downtime deployments.
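The termination tail of this lifecycle is where zero-downtime deploys are won. A sketch of the relevant pod spec fields — names are hypothetical, and the preStop sleep is a common trick to let load balancers drain before SIGTERM arrives:

```yaml
spec:
  terminationGracePeriodSeconds: 45   # total budget before SIGKILL
  containers:
    - name: web
      image: my-registry/web:latest
      lifecycle:
        preStop:
          exec:
            # runs before SIGTERM; buys time for endpoint removal to propagate
            command: ["sh", "-c", "sleep 10"]
      readinessProbe:                 # gates traffic, not restarts
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 5
```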


FAQ — Your Kubernetes Career Questions, Answered

Q: What Kubernetes certification should I get? CKA (Certified Kubernetes Administrator) is the most recognized. CKS (Certified Kubernetes Security Specialist) is valuable for SRE/security roles. CKAD (Certified Kubernetes Application Developer) for developers. CKA is the best starting point — it's a hands-on exam (real cluster, no MCQ) and highly respected.

Q: What Kubernetes version is current in 2026? Kubernetes follows a roughly quarterly release cycle. As of early 2026, stable releases are in the 1.30+ range. Managed services (EKS, GKE, AKS) typically support the 3 most recent minor versions. Always check the release calendar.

Q: Is Docker still used with Kubernetes? Docker is no longer the container runtime in Kubernetes (removed in 1.24). Kubernetes now uses containerd or CRI-O directly via the CRI interface. You still use Docker/Podman to build images and push to registries — that workflow is unchanged.

Q: What salary can I expect for Kubernetes roles in India? DevOps/SRE with K8s (3-5 yrs): ₹20–45 LPA. Platform Engineer with K8s expertise: ₹30–60 LPA. Senior SRE/Platform Lead: ₹50–1 Cr at top product companies.

Q: Helm 3 vs Helm 2 — what changed? Helm 3 removed Tiller (the server-side component with excessive cluster permissions). Charts are rendered client-side and deployed directly via the Kubernetes API. Much more secure. Helm 2 reached end of life in November 2020 — never use it.

Q: When should I NOT use Kubernetes? For small teams (<10 engineers), simple monoliths, or workloads with low scaling requirements, K8s adds operational overhead that outweighs benefits. Consider AWS Fargate, Google Cloud Run, or Railway for simpler deployments. K8s shines at scale with microservices.

Q: What is the difference between EKS, GKE, and AKS? GKE (Google) is the most mature and feature-rich (Autopilot mode, GKE Dataplane V2 with eBPF). EKS (AWS) has the deepest AWS service integration. AKS (Azure) best for Microsoft-stack enterprises. All support standard Kubernetes manifests.

Q: How many nodes should my production cluster have? Minimum: 3 worker nodes spread across 3 AZs for HA. Control plane: 3 nodes (managed by cloud providers in EKS/GKE). Size based on workload — use Karpenter or Cluster Autoscaler for dynamic sizing. Avoid node counts that aren't multiples of your AZ count.


You've just absorbed the same Kubernetes knowledge that ₹50 LPA+ engineers carry into interviews. Bookmark this page, revisit before your interview, and pair it with hands-on practice on a real cluster.
