AWS Interview Questions 2026 — Top 50 with Expert Answers
AWS certifications command a 25-30% salary premium in India, and AWS skills appear in 74% of all cloud job postings. AWS powers over 32% of the global cloud market and remains the most in-demand platform for engineers worldwide. Whether you're interviewing at Amazon itself, a fintech like Razorpay, or a startup scaling on cloud-native infrastructure, AWS knowledge is non-negotiable in 2026. This guide compiles 50 real questions from interviews at Amazon, Flipkart, Razorpay, PhonePe, Zerodha, and Swiggy — with the authoritative answers that get offers, organized from beginner to advanced.
Cloud roles are the fastest path to high-paying tech careers in India. AWS Cloud Architect roles command Rs 40-80 LPA at product companies. This guide is your roadmap to getting there.
Beginner-Level AWS Questions (Q1-Q15)
These questions are asked at every AWS interview from Wipro to Amazon. Get them right with confidence, and the interviewer immediately takes you seriously for the harder questions.
Q1. What is the difference between a Region, Availability Zone, and Edge Location in AWS?
| Concept | Definition | Example |
|---|---|---|
| Region | A geographic area containing multiple isolated AZs (AWS builds Regions with at least three) | ap-south-1 (Mumbai) |
| Availability Zone (AZ) | One or more discrete data centers with redundant power/networking | ap-south-1a, ap-south-1b |
| Edge Location | CDN node used by CloudFront and Route 53 | Mumbai, Chennai |
| Local Zone | Extension of a Region closer to users | Delhi (ap-south-1-del-1) |
Regions are completely isolated from each other for fault tolerance. AZs within a region are connected by low-latency fiber links. Edge Locations are not full Regions — they only serve cached content and DNS requests.
Asked by: Amazon, Wipro, Infosys L2 interviews
Q2. What is EC2? Explain instance types and when to use each.
| Family | Optimized For | Example Use Case |
|---|---|---|
| t3/t4g | Burstable CPU (dev/test) | Development servers |
| m6i/m7g | General purpose | Application servers |
| c6i/c7g | Compute intensive | Video encoding, ML inference |
| r6i/r7g | Memory intensive | In-memory caches, SAP HANA |
| p3/p4 | GPU | Deep learning training |
| i3/i4i | High I/O NVMe | Databases, Hadoop |
| d2/d3 | Dense HDD storage | Data warehousing |
The "g" suffix (e.g., m7g) indicates AWS Graviton (ARM-based) processors. Graviton instances are typically around 20% cheaper per hour than x86 equivalents, and AWS cites up to 40% better price-performance depending on the workload.
Asked by: Flipkart, Myntra, Amazon SDE-2
Q3. What is S3 and what are its storage classes?
Storage classes compared:
| Class | Use Case | Retrieval Time | Min Storage Duration | Cost (approx., per GB-month) |
|---|---|---|---|---|
| S3 Standard | Frequently accessed data | Milliseconds | None | $0.023/GB |
| S3 Intelligent-Tiering | Unknown access patterns | Milliseconds | None | $0.023/GB + monitoring |
| S3 Standard-IA | Infrequent access | Milliseconds | 30 days | $0.0125/GB |
| S3 One Zone-IA | Non-critical infrequent | Milliseconds | 30 days | $0.01/GB |
| S3 Glacier Instant | Archive with fast retrieval | Milliseconds | 90 days | $0.004/GB |
| S3 Glacier Flexible Retrieval | Archives retrieved occasionally | 1–5 minutes (expedited) to 3–5 hours (standard) | 90 days | $0.0036/GB |
| S3 Glacier Deep Archive | Long-term compliance | 12 hours | 180 days | $0.00099/GB |
S3 buckets are Region-specific, but bucket names must be globally unique.
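To make the price gaps concrete, here is a minimal sketch that estimates storage-only monthly cost from the approximate rates in the table above. The rates are the table's illustrative figures (assumed to be per GB-month), the dictionary keys follow the S3 storage class identifiers, and `monthly_storage_cost` is a hypothetical helper; request, retrieval, and monitoring charges are ignored.

```python
# Approximate storage rates (USD per GB-month) copied from the table above.
# Real prices vary by Region and change over time; treat these as illustrative.
RATES_PER_GB = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "ONEZONE_IA": 0.01,
    "GLACIER_IR": 0.004,
    "GLACIER": 0.0036,
    "DEEP_ARCHIVE": 0.00099,
}


def monthly_storage_cost(size_gb: float, storage_class: str) -> float:
    """Storage-only cost in USD; ignores request, retrieval, and monitoring fees."""
    return round(size_gb * RATES_PER_GB[storage_class], 2)


if __name__ == "__main__":
    for name in RATES_PER_GB:
        cost = monthly_storage_cost(1000, name)
        print(f"{name:<13} ${cost:>6.2f} per month for 1,000 GB")
```

In an interview, the useful takeaway is the ratio rather than the absolute numbers: archive tiers are an order of magnitude cheaper to store, but they add retrieval fees and minimum storage durations.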
Q4. What is IAM? Explain users, groups, roles, and policies.
- User: A permanent identity for a human or application (has long-term credentials — access key + secret)
- Group: Collection of users sharing the same permissions (e.g., "developers" group)
- Role: Temporary identity assumed by services, EC2 instances, Lambda, or cross-account entities (no long-term credentials — uses STS)
- Policy: JSON document defining permissions. Two main types:
  - Identity-based: Attached to users/groups/roles
  - Resource-based: Attached to resources (S3 bucket policy, SQS queue policy)
Best practice: Never use root account for daily operations. Follow the principle of least privilege. Prefer roles over long-term access keys for EC2 and Lambda.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::my-bucket/*"
    }
  ]
}
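A rough way to reason about how such a policy is evaluated is sketched below. This is a deliberately simplified model, not IAM's real evaluation engine: it only handles `Action` and `Resource` with wildcard matching, treats an explicit Deny as final, and falls back to an implicit deny. The function `is_allowed` is a hypothetical helper, and real IAM also considers conditions, principals, permission boundaries, SCPs, and case-insensitive action names.

```python
from fnmatch import fnmatchcase

# Same policy document as the example above.
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::my-bucket/*",
        }
    ],
}


def _as_list(value):
    """Policy fields may hold a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


def is_allowed(policy: dict, action: str, resource: str) -> bool:
    """Simplified model: explicit Deny wins, then any matching Allow, else implicit deny."""
    allowed = False
    for stmt in policy["Statement"]:
        action_match = any(fnmatchcase(action, a) for a in _as_list(stmt["Action"]))
        resource_match = any(fnmatchcase(resource, r) for r in _as_list(stmt["Resource"]))
        if action_match and resource_match:
            if stmt["Effect"] == "Deny":
                return False
            allowed = True
    return allowed
```

With this model, `s3:GetObject` on an object under `my-bucket` is allowed, while `s3:DeleteObject` or any action on another bucket falls through to the implicit deny.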
Asked by: Amazon, Razorpay, PhonePe
Q5. What is a VPC? What components does it contain?
Core components:
VPC (10.0.0.0/16)
├── Public Subnet (10.0.1.0/24)
│ ├── EC2 instances with public IPs
│ └── NAT Gateway
├── Private Subnet (10.0.2.0/24)
│ ├── RDS databases
│ └── Application servers
├── Internet Gateway (IGW) — entry/exit for public traffic
├── Route Tables — define traffic routing per subnet
├── Security Groups — stateful instance-level firewall
├── Network ACLs — stateless subnet-level firewall
└── VPC Endpoints — private access to AWS services
Key difference: Security Groups are stateful (return traffic automatically allowed), NACLs are stateless (you must explicitly allow inbound AND outbound). SGs operate at instance level; NACLs at subnet level.
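The CIDR layout above can be sanity-checked with Python's standard `ipaddress` module. The sketch below is illustrative only: `usable_addresses` and `validate_layout` are hypothetical helpers, and the one AWS-specific assumption is that five addresses are reserved in every subnet (the first four and the last one).

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # first four addresses plus the last one in every subnet


def usable_addresses(cidr: str) -> int:
    """Addresses left for your resources after the per-subnet reservation."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def validate_layout(vpc_cidr: str, subnet_cidrs: list) -> list:
    """Return a list of problems; an empty list means the layout is consistent."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = [ipaddress.ip_network(c) for c in subnet_cidrs]
    problems = []
    for subnet in subnets:
        if not subnet.subnet_of(vpc):
            problems.append(f"{subnet} is outside {vpc}")
    for i, first in enumerate(subnets):
        for second in subnets[i + 1:]:
            if first.overlaps(second):
                problems.append(f"{first} overlaps {second}")
    return problems


if __name__ == "__main__":
    print(validate_layout("10.0.0.0/16", ["10.0.1.0/24", "10.0.2.0/24"]))
    print(usable_addresses("10.0.1.0/24"))
```

A /24 subnet therefore gives 251 usable addresses rather than 256, which is a common follow-up question when sizing subnets.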
Q6. What is the difference between EBS, EFS, and S3?
| Feature | EBS | EFS | S3 |
|---|---|---|---|
| Type | Block storage | File storage (NFS) | Object storage |
| Attached to | Single EC2 (mostly) | Multiple EC2s simultaneously | Not attached — accessed via API/URL |
| Protocol | Block device | NFS v4 | HTTP REST |
| Performance | Up to 256,000 IOPS (io2 BE) | Scales automatically | — |
| Use case | OS disk, databases | Shared file system | Backups, static assets, data lake |
| Pricing | Per GB provisioned | Per GB stored | Per GB stored + requests |
| Multi-AZ | No (replicated within a single AZ) | Yes (Regional) | Yes (multiple AZs) |
Asked by: Infosys, TCS Digital, Wipro Elite
Q7. What is Auto Scaling and how does it work?
- Target Tracking: Maintain a metric at a target value (e.g., keep CPU at 50%)
- Step Scaling: Scale by different amounts based on breach thresholds
- Scheduled Scaling: Scale at specific times (e.g., scale up before market open at 9 AM IST)
Components: Launch Template (defines AMI, instance type, security groups) + Auto Scaling Group (defines min/max/desired capacity + VPC subnets) + Scaling Policy.
Cooldown period (default 300 seconds) prevents rapid scale in/out oscillation.
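Target tracking can be approximated as proportional scaling: if the metric sits above the target, capacity grows by roughly the same ratio. The sketch below is an approximation for intuition only, not the exact algorithm AWS runs; `desired_capacity` is a hypothetical helper that also clamps the result to the group's min and max.

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     min_size: int, max_size: int) -> int:
    """Proportional estimate: scale the fleet by metric/target, then clamp to the ASG bounds."""
    raw = math.ceil(current * metric / target)
    return max(min_size, min(max_size, raw))


if __name__ == "__main__":
    # 4 instances at 80% CPU with a 50% target: about 7 instances are needed.
    print(desired_capacity(current=4, metric=80, target=50, min_size=2, max_size=10))
```

The clamp is the part interviewers often probe: no policy can push the group below its minimum or above its maximum.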
Q8. What is CloudFront and how does it differ from S3?
S3 is the origin storage. CloudFront sits in front of S3 (or EC2, ALB) to:
- Cache static content at edge
- Terminate SSL/TLS at edge
- Apply WAF rules
- Sign URLs for private content
- Compress content with gzip/brotli
Origin Access Control (OAC) replaces the old OAI — it restricts S3 bucket access only to CloudFront, so your S3 URL is never exposed publicly.
Q9. What is Route 53? What routing policies does it support?
| Routing Policy | Use Case |
|---|---|
| Simple | Single resource |
| Weighted | A/B testing, canary deployments (e.g., 90% v1, 10% v2) |
| Latency-based | Route users to the lowest-latency Region |
| Failover | Active-passive DR with health checks |
| Geolocation | Route by user's geographic location |
| Geoproximity | Route by geographic proximity (with bias) |
| Multi-Value | Return multiple IPs, basic load balancing |
| IP-based | Route based on client IP ranges (CIDR collections; introduced in 2022) |
Health checks can monitor endpoints and trigger failover automatically.
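For the weighted policy, each record is returned with probability equal to its weight divided by the sum of all weights. The simulation below is a small sketch of that behavior using only the standard library; `pick_endpoint` and `simulate` are hypothetical names, and real resolvers add caching and TTL effects that this ignores.

```python
import random
from collections import Counter


def pick_endpoint(weights: dict, rng: random.Random) -> str:
    """Choose one record name with probability weight / sum(weights)."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def simulate(weights: dict, queries: int, seed: int = 42) -> Counter:
    """Count how often each record is returned over many simulated DNS queries."""
    rng = random.Random(seed)
    return Counter(pick_endpoint(weights, rng) for _ in range(queries))


if __name__ == "__main__":
    # Canary split from the table: 90% to v1, 10% to v2.
    print(simulate({"v1": 90, "v2": 10}, queries=10_000))
```

Setting a weight to 0 removes that record from rotation, which is how a canary is rolled back under this policy.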
Q10. What is the difference between Application Load Balancer (ALB) and Network Load Balancer (NLB)?
| Feature | ALB | NLB |
|---|---|---|
| Layer | Layer 7 (HTTP/HTTPS/WebSocket) | Layer 4 (TCP/UDP/TLS) |
| Routing | Path, header, host, query, IP | Port and protocol |
| Static IP | No (DNS only) | Yes (Elastic IP per AZ) |
| Performance | — | Ultra-low latency, millions of RPS |
| WebSockets | Yes | Yes |
| gRPC | Yes | — |
| Best for | Microservices, HTTP APIs | Gaming, IoT, financial trading |
| SSL Termination | At ALB | At NLB or pass-through |
A third type, Gateway Load Balancer (GWLB), is used for inline network appliances (firewalls, IDS).
Asked by: Amazon, Swiggy, Zomato, Razorpay
Q11. What is Lambda? What are its limits?
Key limits (2026):
- Max execution timeout: 15 minutes
- Memory: 128 MB – 10 GB
- Ephemeral disk (/tmp): 512 MB – 10 GB (configurable)
- Deployment package: 50 MB (zipped direct upload), 250 MB (unzipped), 10 GB (container image)
- Concurrent executions: 1,000 per account per Region (default; can be raised through a Service Quotas request)
- Invocation payload: 6 MB each for request and response (synchronous), 256 KB (asynchronous)
Lambda integrates natively with API Gateway, S3, DynamoDB Streams, Kinesis, SQS, SNS, EventBridge, and 200+ more services.
Q12. What is RDS? Which engines does it support?
Supported engines: MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, and Amazon Aurora (MySQL- and PostgreSQL-compatible, AWS proprietary).
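RDS Multi-AZ: Synchronous replication to a standby in a different AZ for high availability. Automatic failover typically completes in 1–2 minutes. In a Multi-AZ DB instance deployment the standby cannot serve reads (use Read Replicas for that); Multi-AZ DB cluster deployments add two readable standbys.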
Read Replicas: Asynchronous replication. Supports up to 15 replicas (Aurora). Can be promoted to standalone. Can be in different Regions (cross-region read replicas).
Q13. What is DynamoDB?
Core concepts:
- Partition Key (PK): Distributes data across partitions. Must be unique for simple primary keys.
- Sort Key (SK): Optional secondary dimension. PK + SK must be unique together.
- GSI (Global Secondary Index): Alternate access patterns with different PK/SK; can span all partitions
- LSI (Local Secondary Index): Same PK, different SK; must be created at table creation
Capacity modes:
- On-demand: Pay per request. Best for unpredictable traffic.
- Provisioned: Set RCU/WCU. Use with Auto Scaling. Better for predictable, steady workloads.
DynamoDB Streams + Lambda = real-time event-driven pipelines.
Asked by: Amazon, Flipkart, Meesho
Q14. What is CloudFormation?
Template anatomy:
AWSTemplateFormatVersion: '2010-09-09'
Description: My application stack
Parameters:
  InstanceType:
    Type: String
    Default: t3.micro
Resources:
  MyEC2:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: !Ref InstanceType
      ImageId: ami-0abcdef1234567890
Outputs:
  InstanceId:
    Value: !Ref MyEC2
Change Sets let you preview changes before applying. Drift Detection identifies manual changes to resources outside CloudFormation.
CloudFormation vs. Terraform: CloudFormation is AWS-only but has tighter native integration. Terraform is multi-cloud and has a larger ecosystem but requires more setup.
Q15. What is the Shared Responsibility Model?
| AWS Responsibility ("Security OF the cloud") | Customer Responsibility ("Security IN the cloud") |
|---|---|
| Physical infrastructure (data centers) | Customer data encryption |
| Hardware and network | IAM configuration and policies |
| Hypervisor | Operating system patching (for EC2) |
| Managed service software | Application-level security |
| Global network | Security groups and NACLs |
| AWS Region/AZ infrastructure | Compliance certifications for their apps |
For managed services like RDS, Lambda, S3 — AWS takes more responsibility (OS, runtime). For EC2, the customer manages everything above the hypervisor.
One of the most commonly asked conceptual questions at all levels.
Intermediate-Level AWS Questions (Q16-Q35)
This is the section that separates "I've used the console" from "I've built production systems." These questions come up in SDE-2 and Cloud Architect interviews at Amazon, Flipkart, and Razorpay.
Q16. Explain VPC Peering vs. AWS Transit Gateway vs. PrivateLink.
| Feature | VPC Peering | Transit Gateway | PrivateLink |
|---|---|---|---|
| Connectivity | 1-to-1 VPC | Hub-and-spoke for many VPCs | Service endpoint to VPC |
| Transitive routing | No | Yes | N/A |
| Cost | Free (data transfer fees apply) | Per attachment + data | Per endpoint + data |
| Cross-account | Yes | Yes | Yes |
| Cross-region | Yes | Yes | Yes (cross-Region support added in late 2024; previously Regional only) |
| Use case | Few VPCs | 10+ VPCs, multi-account | Access AWS or SaaS services privately |
PrivateLink exposes a service from one VPC to consumers in other VPCs without the networks being peered — traffic never leaves AWS backbone.
Asked by: Amazon senior rounds, Razorpay, PhonePe infrastructure teams
Q17. What is AWS ECS vs. EKS? When would you choose each?
| Feature | ECS | EKS |
|---|---|---|
| Orchestrator | AWS proprietary | Kubernetes |
| Learning curve | Lower | Higher |
| Portability | AWS-only | Multi-cloud (K8s standard) |
| Control plane cost | Free | $0.10/hour per cluster (~$73/month) |
| Fargate support | Yes | Yes |
| Ecosystem | AWS-native integrations | Massive CNCF ecosystem |
| Debugging | AWS Console, CloudWatch | kubectl, K8s dashboards |
Choose ECS when: team is new to containers, you want tight AWS integration, minimal operational overhead matters more than portability.
Choose EKS when: you need Kubernetes compatibility, multi-cloud strategy, using Helm charts, custom operators, or service meshes like Istio.
Asked by: Amazon, Swiggy, Razorpay, Flipkart
Q18. How does Lambda handle concurrency? What is provisioned concurrency?
- Unreserved concurrency: Shared pool, default 1,000/account/Region
- Reserved concurrency: Guarantees a set amount for one function (isolates it from others); also acts as a throttle cap
- Provisioned concurrency: Pre-warms instances to eliminate cold starts — critical for latency-sensitive APIs
Cold start breakdown:
- AWS spins up a new execution environment (container)
- Downloads deployment package
- Initializes runtime (JVM, Node, Python interpreter)
- Runs init code outside the handler
For Java/JVM functions, cold starts can be 2–10 seconds. Provisioned concurrency keeps N execution environments initialized ahead of time, so requests skip the init phase instead of paying for it on the first invocation.
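A quick way to size reserved or provisioned concurrency is the standard estimate: concurrency is roughly requests per second multiplied by average duration in seconds. The sketch below applies that formula; `required_concurrency` and `fits_default_pool` are hypothetical helpers, and the 1,000 figure is the default account limit mentioned above.

```python
import math

DEFAULT_ACCOUNT_CONCURRENCY = 1000  # default per account per Region


def required_concurrency(requests_per_second: int, avg_duration_ms: int) -> int:
    """Concurrent executions needed = request rate x average duration (in seconds)."""
    return math.ceil(requests_per_second * avg_duration_ms / 1000)


def fits_default_pool(requests_per_second: int, avg_duration_ms: int) -> bool:
    """True if the workload stays within the default account-level concurrency."""
    return required_concurrency(requests_per_second, avg_duration_ms) <= DEFAULT_ACCOUNT_CONCURRENCY


if __name__ == "__main__":
    print(required_concurrency(100, 500))   # 100 RPS at 500 ms each
    print(fits_default_pool(3000, 400))     # 3,000 RPS at 400 ms each
```

The same arithmetic explains why slow downstream calls cause throttling: doubling duration doubles the concurrency needed for the same request rate.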
Q19. Design a highly available 3-tier web application on AWS.
Architecture (text diagram):
Internet
|
Route 53 (latency-based routing, health checks)
|
CloudFront (SSL termination, WAF, edge caching)
|
ALB (Application Load Balancer — Multi-AZ)
/ \
EC2 ASG (AZ-a) EC2 ASG (AZ-b) [Web/App Tier — Private Subnet]
\ /
ElastiCache (Redis — in-memory session, cache)
|
RDS Aurora (Multi-AZ — writer in AZ-a, reader in AZ-b)
Key HA decisions:
- Route 53 health checks trigger failover at DNS level
- ALB distributes across 2+ AZs
- ASG min=2 ensures at least one instance per AZ
- RDS Multi-AZ gives automatic failover (<120 seconds)
- ElastiCache Multi-AZ replication group
- S3 + CloudFront for static assets (decoupled from compute)
Classic architecture question at Amazon L5/L6 and Flipkart SDE-2/SDE-3
Q20. What is SQS vs. SNS vs. EventBridge? When do you use each?
| Service | Pattern | Delivery | Retention | Consumers |
|---|---|---|---|---|
| SQS | Queue (point-to-point) | Pull | 1 minute – 14 days (default 4 days) | Each message processed by one consumer |
| SNS | Pub/Sub (fan-out) | Push | No storage | Multiple |
| EventBridge | Event bus | Push | Optional archive | Rules-based routing |
Common pattern — Fan-out: SNS topic → multiple SQS queues (so each microservice gets its own copy of the event).
EventBridge is preferred for event-driven architectures — it can filter events by content, route to Lambda/SQS/Step Functions/API destinations, and integrate with 200+ SaaS sources.
Q21. What is S3 Object Lock and how does it work?
- Governance mode: Users with special IAM permissions can bypass the lock
- Compliance mode: Nobody (including root) can delete the object until the retention period expires — used for regulatory compliance (SEBI, HIPAA, SOC2)
Legal Hold: Independent of retention period — can be applied/removed by privileged users.
Required for: financial records retention, healthcare data, compliance with India's DPDP Act.
Q22. Explain AWS WAF. What rules can you configure?
Rule types:
- IP set rules: Block/allow specific IP ranges
- Rate-based rules: Throttle IPs exceeding X requests per 5 minutes (DDoS protection)
- Managed rule groups: AWS-managed rules for OWASP Top 10, known bad IPs, SQL injection, XSS
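- Custom rules: Match on request components (URI, headers, query strings, body). Body inspection covers only the first 8 KB on ALB and AppSync; CloudFront and API Gateway default to 16 KB, configurable up to 64 KB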
Each Web ACL contains rules with Allow/Block/Count/CAPTCHA actions. Rules are evaluated in priority order.
Asked by: Razorpay, PhonePe, Zerodha (security-focused rounds)
Q23. What is AWS KMS? How does envelope encryption work?
Envelope encryption:
- A KMS key (formerly called a Customer Master Key, CMK) is created in KMS; its key material never leaves KMS
- Application requests a Data Encryption Key (DEK) from KMS
- KMS returns: plaintext DEK + encrypted DEK
- Application encrypts data with plaintext DEK in memory
- Plaintext DEK is discarded; encrypted DEK is stored alongside encrypted data
- To decrypt: send encrypted DEK to KMS → get back plaintext DEK → decrypt data
This pattern means KMS is only involved during key wrapping/unwrapping, not for bulk data encryption, keeping latency and cost low.
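The flow above can be traced end to end with a toy model. Everything here is a stand-in: `FakeKMS` is a hypothetical in-memory class (the real KMS `GenerateDataKey` API returns the key as `Plaintext` and `CiphertextBlob`), and `_keystream_xor` is a toy cipher built from SHA-256 that is NOT secure and only substitutes for AES so the example runs on the standard library.

```python
import hashlib
import secrets


def _keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode). NOT secure; stands in for AES."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class FakeKMS:
    """Hypothetical stand-in for KMS: the master key never leaves this object."""

    def __init__(self) -> None:
        self._master_key = secrets.token_bytes(32)

    def generate_data_key(self) -> tuple:
        """Return (plaintext DEK, DEK encrypted under the master key)."""
        plaintext_dek = secrets.token_bytes(32)
        return plaintext_dek, _keystream_xor(self._master_key, plaintext_dek)

    def decrypt(self, encrypted_dek: bytes) -> bytes:
        return _keystream_xor(self._master_key, encrypted_dek)


def encrypt_record(kms: FakeKMS, plaintext: bytes) -> dict:
    """Encrypt locally with a fresh DEK; persist only the encrypted DEK and ciphertext."""
    dek, encrypted_dek = kms.generate_data_key()
    ciphertext = _keystream_xor(dek, plaintext)
    del dek  # the plaintext DEK is dropped once the data is encrypted
    return {"encrypted_dek": encrypted_dek, "ciphertext": ciphertext}


def decrypt_record(kms: FakeKMS, record: dict) -> bytes:
    """Unwrap the DEK through KMS, then decrypt the data locally."""
    dek = kms.decrypt(record["encrypted_dek"])
    return _keystream_xor(dek, record["ciphertext"])
```

Note that the bulk data never passes through `FakeKMS`; only the 32-byte key does, which is the property that keeps KMS latency and cost low.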
Q24. How does DynamoDB handle hot partitions? How do you fix them?
Solutions:
- Write sharding: Append a random suffix (0–9) to the PK so one hot key is spread across 10 partition key values; reads must then scatter-gather across all suffixes and merge the results
- Caching layer: DAX (DynamoDB Accelerator), a managed in-memory cache for DynamoDB (separate from ElastiCache) with microsecond read latency, reduces read load on hot items
- Better PK design: Choose high-cardinality attributes (UUID, user_id, order_id)
- Adaptive capacity (automatic): DynamoDB shifts capacity to hot partitions automatically within limits
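Write sharding is easier to explain with a few lines of code. The sketch below uses a plain dictionary as a stand-in for a table (partition key mapped to a list of items); the `#` separator, `SHARD_COUNT`, and the helper names are illustrative assumptions, not a DynamoDB API.

```python
import random

SHARD_COUNT = 10  # suffixes 0-9, as in the write-sharding bullet above


def sharded_key(base_key: str, rng=random) -> str:
    """Key used for writes: the hot key plus a random shard suffix."""
    return f"{base_key}#{rng.randrange(SHARD_COUNT)}"


def all_shard_keys(base_key: str) -> list:
    """Every key a reader must query to see all items for the base key."""
    return [f"{base_key}#{i}" for i in range(SHARD_COUNT)]


def put_item(table: dict, base_key: str, item: dict, rng=random) -> None:
    table.setdefault(sharded_key(base_key, rng), []).append(item)


def scatter_gather(table: dict, base_key: str) -> list:
    """Read side: query each shard key and merge the results."""
    items = []
    for key in all_shard_keys(base_key):
        items.extend(table.get(key, []))
    return items
```

The trade-off is visible in the code: writes touch one key, while every read fans out to ten, so this pattern suits write-heavy hot keys.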
Deep-dive question at Amazon, Flipkart data platform teams
Q25. What is AWS CloudTrail and how does it differ from CloudWatch?
| Feature | CloudTrail | CloudWatch |
|---|---|---|
| Purpose | API audit log | Metrics, logs, alarms, dashboards |
| What it tracks | Who did what (API calls) | How resources are performing |
| Data type | Events (JSON) | Metrics, log streams |
| Retention | 90 days (free) or S3 | Configurable (indefinite) |
| Use case | Security audit, compliance | Ops monitoring, alerting |
CloudTrail records every API call: who made it (user/role/service), from which IP, at what time, what resource was targeted, and whether it succeeded. Management events are enabled by default; data events (S3 object-level, Lambda invocations) cost extra.
Q26. What is Elastic Beanstalk? When would you NOT use it?
Do NOT use Elastic Beanstalk when:
- You need fine-grained control over infrastructure configuration
- You're running microservices that need container orchestration
- You need to customize the underlying OS or runtime beyond what EB offers
- Your team already uses Terraform/CDK for IaC (EB's environment configs don't integrate cleanly)
- Cost optimization is critical (you'd over-provision with EB's opinionated setup)
Good for: monolithic applications, teams new to AWS, proof-of-concepts.
Q27. What is AWS Step Functions? Give a real use case.
Real use case — Order processing pipeline:
[Start]
→ ValidateOrder (Lambda)
→ CheckInventory (Lambda)
→ [Choice] In stock?
→ YES: ReserveInventory → ProcessPayment → SendConfirmation → [End]
→ NO: NotifyOutOfStock → RefundPayment → [End]
→ [Catch] PaymentFailed → RollbackInventory → SendFailureNotification → [End]
Step Functions handles retries, timeouts, parallel execution, and error handling automatically. Standard Workflows are for long-running (1 year max), exactly-once execution. Express Workflows are for high-throughput, short-duration (5 min max), at-least-once execution.
Asked by: Amazon, Flipkart, Myntra backend rounds
Q28. What is the difference between NAT Gateway and NAT Instance?
| Feature | NAT Gateway | NAT Instance |
|---|---|---|
| Managed by | AWS | Customer |
| Availability | Highly available (within AZ) | Single EC2 — SPOF unless you manage HA |
| Bandwidth | Up to 100 Gbps (auto-scales) | Limited by instance size |
| Security groups | Cannot attach | Can attach |
| Cost | $0.045/hour + $0.045/GB | EC2 pricing + management overhead |
| Maintenance | None | Patching, monitoring required |
Rule: Always use NAT Gateway in production. NAT Instances are obsolete except for very cost-sensitive dev environments.
Q29. How would you optimize AWS costs for a startup running 24/7 workloads?
- Reserved Instances (RI): 1-year or 3-year commitment for EC2, RDS, ElastiCache → up to 72% savings vs. On-Demand
- Savings Plans: More flexible than RIs — compute savings plans cover any EC2, Lambda, Fargate; 66% max savings
- Spot Instances: 70–90% cheaper than On-Demand for interruption-tolerant workloads (batch jobs, ML training, CI runners)
- Graviton instances: Switch t3→t4g, m5→m7g → 20–40% cost reduction, better perf/dollar
- S3 Intelligent-Tiering: Auto-moves infrequently accessed objects to cheaper tiers
- RDS Aurora Serverless v2: Scales to 0 ACUs in dev/staging (no compute charge while paused; storage is still billed)
- Lambda instead of always-on EC2: For bursty or low-frequency workloads
- AWS Cost Explorer + Budgets: Set billing alarms, identify waste
- Right-sizing: Use Compute Optimizer recommendations to downsize over-provisioned instances
- Delete zombie resources: Unused EIPs ($3.65/month each), idle NAT Gateways, forgotten snapshots
Heavily asked at startup interviews (Razorpay, Zerodha, CRED)
Q30. What is AWS Config and how does it enforce compliance?
How it works:
- Config Recorder captures resource configurations and changes
- Config Rules evaluate configurations (AWS Managed Rules or custom Lambda rules)
- Non-compliant resources are flagged
- Remediation Actions (manual or auto via SSM Automation) fix violations
Common rules:
- ec2-instance-no-public-ip: alert on EC2s with public IPs in private subnets
- s3-bucket-public-read-prohibited: detect accidentally public S3 buckets
- iam-root-access-key-check: ensure root has no access keys
- rds-multi-az-support: enforce Multi-AZ for production RDS
Q31. What is the difference between STS AssumeRole and IAM instance profiles?
- Instance Profile: An IAM role attached to an EC2 instance. The EC2 metadata service (169.254.169.254) vends temporary credentials automatically via the IMDSv2 endpoint. Applications running on EC2 call the metadata endpoint and get credentials without any configuration.
- STS AssumeRole: An explicit API call (sts:AssumeRole) to obtain temporary credentials for a role. Used for cross-account access, federated identity (SSO), Lambda assuming another role, or EKS pods via IRSA.
IMDSv2 (Instance Metadata Service v2) is session-oriented and requires a PUT request first — mitigates SSRF attacks that can steal EC2 credentials (a real attack vector that got several companies breached).
Q32. What is Aurora Global Database?
Use cases:
- Global applications needing low-latency reads worldwide
- Disaster recovery with RPO of ~1 second and RTO of <1 minute
Failover: In a disaster, you can promote a secondary Region to primary. Global Write Forwarding allows secondary Regions to write — Aurora routes the write to the primary automatically (slight latency increase).
Cost: ~20% more than standard Aurora due to cross-region replication costs.
Q33. How does API Gateway handle throttling?
- Account-level: 10,000 requests per second (RPS) steady-state with a burst limit of 5,000 requests, per account per Region (adjustable)
- Stage-level: Set default throttling per stage
- Method-level: Override per route (e.g., /payments stricter than /health)
- Usage Plans + API Keys: Throttle per API consumer (for public APIs with paying customers)
When throttled, clients receive HTTP 429 Too Many Requests. Implement exponential backoff + jitter in clients. Use SQS as a buffer in front of Lambda for bursty ingestion instead of direct API Gateway → Lambda if you can tolerate async processing.
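Exponential backoff with jitter on the client side might look like the sketch below. It uses the "full jitter" variant (a random delay between zero and an exponentially growing ceiling); `send` is a hypothetical callable that returns an HTTP status code, and the base and cap values are illustrative defaults rather than AWS recommendations.

```python
import random
import time
from typing import Callable, Optional

THROTTLED = 429


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 5.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full jitter: uniform delay between 0 and min(cap, base * 2**attempt)."""
    rng = rng or random.Random()
    return rng.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_retries(send: Callable[[], int], max_retries: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Call send() until it returns something other than 429 or retries run out."""
    for attempt in range(max_retries + 1):
        status = send()
        if status != THROTTLED:
            return status
        if attempt < max_retries:
            sleep(backoff_delay(attempt))
    return THROTTLED
```

The jitter matters as much as the exponent: without it, throttled clients retry in lockstep and hit the limit again together. Passing `sleep` as a parameter keeps the function easy to test without real waiting.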
Q34. What is AWS CDK vs. CloudFormation vs. Terraform?
| Feature | CloudFormation | CDK | Terraform |
|---|---|---|---|
| Language | YAML/JSON | TypeScript, Python, Java, C#, Go | HCL |
| Multi-cloud | No | No | Yes |
| Abstraction | Low | High (L3 constructs) | Medium |
| State management | Managed by AWS | Deploys via CloudFormation | Local/remote tfstate |
| Import existing resources | Limited | Limited | Yes (terraform import) |
| Module ecosystem | None | Construct Hub | Terraform Registry |
| Community | Medium | Growing | Very large |
CDK synthesizes into CloudFormation templates — so ultimately it's CloudFormation under the hood. CDK's high-level constructs (L3) encapsulate best practices (e.g., ApplicationLoadBalancedFargateService = ALB + ECS Fargate with sane defaults in one construct).
Asked at senior/architect level interviews
Q35. Explain SQS visibility timeout and dead-letter queues.
Visibility Timeout: When a consumer reads a message from SQS, the message becomes invisible to other consumers for the visibility timeout duration (default 30 seconds, max 12 hours). If the consumer successfully processes and deletes the message within this window, it's gone. If it crashes or takes too long, the message reappears for another consumer to pick up.
Dead-Letter Queue (DLQ): After a message is received N times (maxReceiveCount) without being deleted, SQS moves it to the DLQ. The DLQ is a regular SQS queue used for:
- Debugging: inspect why messages failed
- Alerting: CloudWatch alarm on DLQ depth > 0
- Replaying: fix the consumer, then move messages back to the main queue
For FIFO queues, DLQs must also be FIFO.
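The interaction between visibility timeout and maxReceiveCount is easiest to see in a small simulation. `ToyQueue` below is a hypothetical in-memory model driven by an explicit clock value (`now`); it is not the SQS API and ignores batching, ordering, and delivery delays.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Message:
    body: str
    receive_count: int = 0
    visible_at: float = 0.0  # time at which the message becomes receivable again


@dataclass
class ToyQueue:
    visibility_timeout: float = 30.0
    max_receive_count: int = 3
    messages: list = field(default_factory=list)
    dead_letters: list = field(default_factory=list)

    def send(self, body: str) -> None:
        self.messages.append(Message(body))

    def receive(self, now: float) -> Optional[Message]:
        """Return the first visible message and hide it for the visibility timeout."""
        for msg in list(self.messages):
            if msg.visible_at > now:
                continue  # still in flight with another consumer
            if msg.receive_count >= self.max_receive_count:
                # Received too many times without being deleted: move to the DLQ.
                self.messages.remove(msg)
                self.dead_letters.append(msg)
                continue
            msg.receive_count += 1
            msg.visible_at = now + self.visibility_timeout
            return msg
        return None

    def delete(self, msg: Message) -> None:
        """Called by the consumer after successful processing."""
        self.messages.remove(msg)
```

A consumer that never calls `delete` (for example, because it crashes) causes the message to reappear after each timeout until it lands in `dead_letters`.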
Advanced-Level AWS Questions (Q36-Q50)
Don't skip the Advanced section — this is where interviewers separate Rs 20 LPA from Rs 50+ LPA candidates. These questions are asked at Amazon SDE-3, Flipkart L3, and senior cloud architect roles.
Q36. Design a serverless data pipeline ingesting 1 million events/minute on AWS.
Architecture:
Mobile/Web Apps
|
Kinesis Data Streams (100 shards × 1MB/s = 100MB/s capacity)
|
Kinesis Data Firehose (buffers, transforms, delivers)
/ \
S3 (raw) Lambda (real-time processing, enrich/filter)
| |
AWS Glue DynamoDB (hot path — last 5 min aggregations)
(batch ETL)
|
S3 (parquet, Hive-partitioned: year/month/day/hour)
|
Amazon Athena (ad-hoc SQL on S3)
|
QuickSight (dashboards)
Key choices: Kinesis over SQS for ordered, replay-capable, high-throughput streaming. Firehose auto-scales and handles delivery retries. Parquet format gives 3–5x query speedup in Athena. Glue Data Catalog as metastore for Athena schema discovery.
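The shard count in the diagram can be checked with some quick arithmetic. The sketch below assumes the per-shard write limits of 1,000 records per second and 1 MiB per second, and takes average event size as an input since the question does not state it; `shards_needed` is a hypothetical helper.

```python
SHARD_RECORDS_PER_SEC = 1_000        # write limit per shard (records)
SHARD_BYTES_PER_SEC = 1024 * 1024    # write limit per shard (1 MiB)


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def shards_needed(events_per_minute: int, avg_event_bytes: int) -> int:
    """Shards required to absorb the write load: the stricter of the two limits wins."""
    by_records = _ceil_div(events_per_minute, 60 * SHARD_RECORDS_PER_SEC)
    by_bytes = _ceil_div(events_per_minute * avg_event_bytes, 60 * SHARD_BYTES_PER_SEC)
    return max(by_records, by_bytes)


if __name__ == "__main__":
    for size in (100, 1024, 5120):
        print(f"{size:>5} B events -> {shards_needed(1_000_000, size)} shards")
```

At 1 million events per minute (about 16,667 per second), the record limit alone requires 17 shards; the 100 shards in the diagram leave headroom for larger payloads and traffic spikes.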
Q37. How would you implement blue/green deployments on AWS?
Option 1 — Route 53 weighted routing:
- Blue (current production): 100% weight
- Deploy Green (new version) to a separate stack
- Shift 10% → 50% → 100% traffic via Route 53 weights
- Monitor metrics; roll back by setting Green weight to 0 (clients switch as cached DNS records expire, so keep TTLs short)
Option 2 — ALB listener rules:
- Two target groups: Blue (v1) and Green (v2)
- Shift traffic by modifying target group weights on the listener
- AWS CodeDeploy automates this for ECS and Lambda
Option 3 — CodeDeploy for Lambda:
- Linear10PercentEvery1Minute: shift 10% of traffic to new Lambda version every minute
- Pre/PostTraffic hooks run validation Lambda functions before/after shift
- Automatic rollback if CloudWatch alarms fire during deployment
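A linear shifting configuration can be pictured as a simple schedule of (minute, percent on the new version) pairs. The sketch below generates that schedule under the assumption that the first increment is applied at the start of the deployment; `linear_shift_schedule` is a hypothetical helper, not a CodeDeploy API.

```python
def linear_shift_schedule(step_percent: int, interval_minutes: int) -> list:
    """Return (minute, percent_on_new_version) pairs until 100% is reached."""
    schedule = []
    minute, shifted = 0, 0
    while shifted < 100:
        shifted = min(100, shifted + step_percent)
        schedule.append((minute, shifted))
        minute += interval_minutes
    return schedule


if __name__ == "__main__":
    # Mirrors the Linear10PercentEvery1Minute configuration described above.
    for minute, percent in linear_shift_schedule(10, 1):
        print(f"t+{minute} min: {percent}% on new version")
```

Each step is a checkpoint where an alarm can stop the rollout, so a bad release reaches only a fraction of traffic before rollback.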
Architecture question at Amazon L6, Flipkart Principal Engineer
Q38. How does EKS handle IAM authentication and authorization?
- Authentication: AWS IAM via aws-iam-authenticator. The kubectl command generates a pre-signed STS URL token. The EKS control plane validates the token against IAM.
- Authorization: Kubernetes RBAC. IAM identities are mapped to Kubernetes users/groups either through EKS access entries (the newer, API-based mechanism) or via the aws-auth ConfigMap in kube-system:
mapRoles:
  - rolearn: arn:aws:iam::123456789012:role/developer-role
    username: developer
    groups:
      - system:masters  # cluster-admin (or custom RBAC groups)
IRSA (IAM Roles for Service Accounts): Associate a Kubernetes Service Account with an IAM Role using OIDC federation. Pods get AWS credentials scoped to their service account → no more node-level IAM roles sharing credentials across all pods.
Q39. What is AWS Outposts? When would you deploy it?
When to use:
- Data residency requirements preventing cloud migration (banking, government, healthcare in certain jurisdictions)
- Ultra-low latency local processing with cloud connectivity
- Gradual hybrid migration strategy
- Manufacturing/industrial edge computing where internet connectivity is unreliable
Outposts comes in two form factors: Outposts racks (a full rack delivered and installed by AWS) and Outposts servers (1U/2U for smaller locations). Local Zones are a separate offering: AWS-operated facilities close to metro areas, not hardware in your own data center.
Q40. Explain AWS Shield Standard vs. Advanced and how DDoS protection works.
Shield Standard (free, automatic):
- Protects all AWS customers against Layer 3/4 attacks (SYN floods, UDP reflection)
- Automatic detection and mitigation at the network edge
Shield Advanced ($3,000/month per organization):
- Layer 7 protection (with WAF integration)
- 24/7 access to the Shield Response Team (SRT, formerly the DDoS Response Team)
- Cost protection (AWS credits DDoS-related scaling charges)
- Advanced attack diagnostics in real-time
- Protects: EC2, ELB, CloudFront, Global Accelerator, Route 53
DDoS mitigation architecture:
Attacker
|
CloudFront (absorbs HTTP floods at edge — 450+ PoPs)
|
Shield Advanced (Layer 3/4 scrubbing)
|
WAF (rate-based rules, IP reputation lists)
|
ALB (health checks drop bad traffic)
|
Your application (sees only clean traffic)
Asked at Razorpay, Zerodha security rounds
Q41. How do you implement cross-account access in AWS?
Pattern 1 — Cross-account IAM Role:
- In Account B (target), create a role with a trust policy allowing Account A to assume it
- In Account A, attach an IAM policy to users/roles allowing sts:AssumeRole on Account B's role
- Application in Account A calls sts:AssumeRole → gets temporary credentials for Account B resources
Pattern 2 — Resource-based policies (S3, SQS, KMS, Lambda): Directly grant Account A principal access in the resource policy without assuming a role.
Pattern 3 — AWS Organizations + Service Control Policies (SCPs): SCPs are permission guardrails applied at OU/account level — they restrict what IAM policies CAN grant, even if the IAM policy allows it. Used for organization-wide compliance (e.g., "no one can disable CloudTrail", "all resources must be tagged").
Q42. What is the difference between CloudWatch Metrics, Logs, and X-Ray?
| Tool | Purpose | Data Type |
|---|---|---|
| CloudWatch Metrics | Numeric time-series data (CPU, latency, error rate) | Numbers with dimensions |
| CloudWatch Logs | Log lines from applications and AWS services | Text streams |
| CloudWatch Logs Insights | Serverless SQL-like queries over log data | Query language |
| AWS X-Ray | Distributed tracing across services | Trace segments/subsegments |
X-Ray provides an end-to-end trace view: API Gateway → Lambda → DynamoDB → external HTTP calls. It generates a service map showing latency contribution per service and error rates. Essential for debugging distributed latency in microservices.
Q43. How does S3 Transfer Acceleration work?
When it helps: Uploading from geographically distant clients (e.g., a user in India uploading to an S3 bucket in us-east-1 for compliance reasons). AWS provides a speed comparison tool at s3-accelerate-speedtest.s3-accelerate.amazonaws.com.
When it doesn't help (or hurts): Same-region uploads — the public internet path is comparable to the AWS backbone for short distances, and you'd pay the acceleration surcharge unnecessarily.
Q44. What is AWS Global Accelerator? How is it different from CloudFront?
| Feature | CloudFront | Global Accelerator |
|---|---|---|
| Use case | HTTP/HTTPS content delivery and caching | TCP/UDP traffic acceleration, non-HTTP |
| Caching | Yes | No |
| Static IPs | No | Yes (2 anycast IPs) |
| Protocol | HTTP/HTTPS/WebSocket | Any TCP/UDP |
| Health routing | No (edge-to-origin) | Yes (routes around failures) |
| Best for | Web content, APIs with caching | Gaming, IoT, multi-region ALB failover |
Global Accelerator's two static anycast IPs can be allowlisted in enterprise firewalls, which is critical for B2B SaaS. It routes traffic to healthy endpoints across Regions automatically.
Q45. How do you encrypt data at rest in S3?
- SSE-S3 (AES-256): AWS manages keys entirely. Zero configuration. Free. Default since January 2023 for all new objects.
- SSE-KMS: Uses KMS CMK. You control key policy, audit usage via CloudTrail, enable key rotation. Adds KMS API call latency (~1ms) and cost ($0.03/10,000 requests).
- DSSE-KMS (dual-layer SSE with KMS keys): Applies two independent layers of encryption to each object. Designed to meet CNSSP 15 guidance for two layers of encryption on top-secret data.
- SSE-C (Customer-Provided Keys): You provide the key per request. AWS encrypts, then discards the key. You manage key storage.
For client-side encryption: use the AWS Encryption SDK. Data is encrypted before leaving the client. AWS never sees plaintext.
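To make the server-side options concrete, here is a minimal sketch that maps each mode to the encryption parameters of an S3 PutObject request. The function name and mode labels are hypothetical; the parameter names (`ServerSideEncryption`, `SSEKMSKeyId`, `SSECustomerAlgorithm`, `SSECustomerKey`) are the ones the S3 API uses, and the returned dict is meant to be merged into a boto3 `put_object` call, which is assumed rather than shown.

```python
from typing import Optional


def sse_put_params(mode: str,
                   kms_key_id: Optional[str] = None,
                   customer_key: Optional[bytes] = None) -> dict:
    """Return the encryption-related keyword arguments for a PutObject call."""
    if mode == "SSE-S3":
        return {"ServerSideEncryption": "AES256"}
    if mode in ("SSE-KMS", "DSSE-KMS"):
        value = "aws:kms" if mode == "SSE-KMS" else "aws:kms:dsse"
        params = {"ServerSideEncryption": value}
        if kms_key_id:
            # Without a key ID, S3 falls back to the AWS managed key for S3.
            params["SSEKMSKeyId"] = kms_key_id
        return params
    if mode == "SSE-C":
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 256-bit (32-byte) key")
        # The same key must be supplied again on every GetObject.
        return {"SSECustomerAlgorithm": "AES256", "SSECustomerKey": customer_key}
    raise ValueError(f"unknown mode: {mode}")
```

A call would then look like `s3.put_object(Bucket=..., Key=..., Body=..., **sse_put_params("SSE-KMS", kms_key_id="alias/example"))`, where the alias is a placeholder.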
Q46. What is Amazon Bedrock? How does it integrate with existing AWS services?
Integration patterns:
- Knowledge Bases for Bedrock: RAG pipeline — upload documents to S3 → Bedrock ingests, chunks, embeds into a vector store (OpenSearch Serverless or Pinecone) → query via RetrieveAndGenerate API
- Bedrock Agents: Multi-step reasoning agents that call Lambda functions as tools, query databases, and complete tasks autonomously
- Guardrails: Content filtering, PII redaction, topic blocking — applied to inputs and outputs
- Model Evaluation: Compare models on custom datasets before choosing
Rapidly becoming a standard interview topic at AI-forward companies (2026)
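The Knowledge Bases query step can be sketched as a request builder for the RetrieveAndGenerate API. This is an illustration under assumptions: the function name is hypothetical, the knowledge base ID and model ARN are placeholders supplied by the caller, and the request shape follows the boto3 `bedrock-agent-runtime` client as I understand it, so check it against the current API reference before relying on it.

```python
def build_rag_request(question: str, knowledge_base_id: str, model_arn: str) -> dict:
    """Assemble keyword arguments for a RetrieveAndGenerate call."""
    if not question.strip():
        raise ValueError("question must not be empty")
    return {
        # The user's natural-language query.
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            # Retrieve from a managed knowledge base, then generate an answer.
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        },
    }
```

The resulting dict would be passed as `client.retrieve_and_generate(**request)`. Keeping the builder separate from the network call makes the request easy to unit test.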
Q47. Explain AWS Well-Architected Framework pillars with examples.
| Pillar | Key Principle | Example |
|---|---|---|
| Operational Excellence | Run and monitor systems, continuously improve | Use IaC (CloudFormation/CDK), implement runbooks, post-mortems |
| Security | Protect information and systems | Enable MFA, use KMS, rotate credentials, enable GuardDuty |
| Reliability | Recover from failures, meet demand | Multi-AZ RDS, circuit breakers, chaos engineering with FIS |
| Performance Efficiency | Use resources efficiently as demand changes | Choose right instance family, use CDN, profile before optimizing |
| Cost Optimization | Avoid unnecessary costs | Reserved Instances, Spot for batch, right-sizing |
| Sustainability | Minimize environmental impact | Graviton (better perf/watt), serverless, archive to Glacier |
AWS Fault Injection Service (FIS) is the chaos engineering tool — injects CPU stress, AZ failures, latency on EC2/ECS/EKS to validate resilience.
Q48. How does Amazon Kinesis Data Streams handle shard splitting and merging?
- Shard Splitting: Split one shard into two when you need more throughput. The old shard becomes read-only (existing data still readable until retention expires); new shards receive new writes.
- Shard Merging: Merge two adjacent shards (by hash key range) to reduce cost when throughput drops.
Re-sharding considerations:
- Enhanced fan-out consumers (dedicated 2 MB/s per consumer per shard) are not affected
- Partition key hashing must be understood — all records with the same partition key go to the same shard (ordering guarantee within a key)
- Use DescribeStreamSummary to check the current shard count before splitting
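The hash-range mechanics behind splitting and merging can be sketched in a few lines. Kinesis maps each partition key to a 128-bit integer using MD5, and every shard owns a contiguous range of that space. The helper names below are hypothetical, and the even split is just one choice of split point (the value you would pass as `NewStartingHashKey` to SplitShard).

```python
import hashlib

MAX_HASH_KEY = 2**128 - 1  # hash key space runs from 0 to 2^128 - 1


def hash_key(partition_key: str) -> int:
    """MD5 of the partition key, read as a 128-bit unsigned integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def split_even(start: int, end: int) -> tuple:
    """Split the inclusive range [start, end] into two near-equal children."""
    if end <= start:
        raise ValueError("range too small to split")
    mid = (start + end) // 2 + 1  # first hash key owned by the second child
    return (start, mid - 1), (mid, end)


def are_adjacent(a: tuple, b: tuple) -> bool:
    """Two shards can merge only if their ranges touch with no gap."""
    return a[1] + 1 == b[0] or b[1] + 1 == a[0]


def shard_for(partition_key: str, ranges: list) -> int:
    """Index of the range that owns this partition key."""
    h = hash_key(partition_key)
    for index, (low, high) in enumerate(ranges):
        if low <= h <= high:
            return index
    raise LookupError("no shard covers this hash key")
```

Because the mapping depends only on the partition key, a hot key stays hot after a split: all of its records still land in whichever child owns its hash value.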
Q49. What is Service Control Policy (SCP) in AWS Organizations? Give a practical example.
Practical example — Prevent disabling security services:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyDisableGuardDuty",
      "Effect": "Deny",
      "Action": [
        "guardduty:DeleteDetector",
        "guardduty:DisassociateFromMasterAccount",
        "cloudtrail:StopLogging",
        "cloudtrail:DeleteTrail",
        "config:DeleteConfigurationRecorder"
      ],
      "Resource": "*"
    }
  ]
}
Attached to the organization root, this SCP prevents any member account, including its root user, from disabling security monitoring. SCPs do not apply to the management account, so that account needs separate protection. Essential for enterprise governance.
Asked at senior/lead level at fintech companies
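A simplified model of how an SCP interacts with IAM permissions may help when explaining this in an interview. The sketch below is not the real policy evaluation engine: it ignores `Resource`, `Condition`, and the requirement for an SCP-level Allow, and only shows that an explicit Deny in an SCP wins over whatever IAM allows. All names are hypothetical.

```python
from fnmatch import fnmatchcase

# A trimmed-down guardrail policy used only for this illustration.
GUARDRAIL_SCP = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": ["guardduty:DeleteDetector", "cloudtrail:StopLogging"],
            "Resource": "*",
        }
    ],
}


def is_denied(scp: dict, action: str) -> bool:
    """True if any Deny statement matches the action (case-insensitive, * wildcards)."""
    for statement in scp.get("Statement", []):
        if statement.get("Effect") != "Deny":
            continue
        patterns = statement.get("Action", [])
        if isinstance(patterns, str):
            patterns = [patterns]
        if any(fnmatchcase(action.lower(), p.lower()) for p in patterns):
            return True
    return False


def is_allowed(scp: dict, iam_allows: bool, action: str) -> bool:
    """An action succeeds only if IAM allows it and no SCP statement denies it."""
    return iam_allows and not is_denied(scp, action)
```

The key point the model captures is that an SCP never grants permissions; it only caps what IAM policies in the account can grant.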
Q50. How do you troubleshoot a Lambda function timing out at 15 minutes?
Diagnosis steps:
- Check CloudWatch Logs for the function — identify which part is slow (add timestamps/X-Ray traces)
- Check Lambda Insights (enhanced monitoring) for memory pressure, CPU throttling, init duration
- Check downstream service latency: DynamoDB, RDS, external HTTP calls, S3
Solutions:
- Break into smaller functions: Use Step Functions to orchestrate a workflow instead of one monolithic Lambda
- Async pattern: Trigger Lambda from SQS/SNS, return 202 Accepted immediately, poll for results
- Optimize the bottleneck: Add connection pooling for RDS (RDS Proxy), add ElastiCache, use batch operations instead of loops
- Switch to Fargate/ECS: For truly long-running tasks (video processing, ML inference), Fargate has no 15-minute limit
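- Increase Lambda memory: Lambda allocates CPU in proportion to configured memory (one full vCPU at 1,769 MB). Going from 512 MB to 2 GB can cut runtime by 60-75% for CPU-bound tasks, though gains above 1,769 MB require multi-threaded code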
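The memory trade-off in the last point is easy to reason about with a little arithmetic, since Lambda bills compute in GB-seconds. The sketch below assumes a rate of $0.0000166667 per GB-second (the commonly quoted x86 on-demand price, which may differ by Region and architecture) and ignores the per-request charge. Function names are hypothetical.

```python
# Assumed price; verify against current Lambda pricing for your Region.
PRICE_PER_GB_SECOND = 0.0000166667


def invocation_cost(memory_mb: int, duration_s: float) -> float:
    """Compute cost of one invocation: GB-seconds times the per-GB-second rate."""
    return (memory_mb / 1024) * duration_s * PRICE_PER_GB_SECOND


def break_even_duration(old_memory_mb: int, old_duration_s: float,
                        new_memory_mb: int) -> float:
    """Duration at the new memory size that costs the same as the old setup."""
    return old_duration_s * old_memory_mb / new_memory_mb
```

For example, a function that runs 8 seconds at 512 MB costs the same as one that runs 2 seconds at 2048 MB, so any speedup beyond 4x at the larger size is both faster and cheaper.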
FAQ Section — Your AWS Career Questions Answered
Q: Which AWS certification should I get first? Start with AWS Certified Solutions Architect - Associate. It's the most recognized cloud cert globally and has the highest ROI for career growth — many engineers report Rs 3-5 LPA salary bumps just from this one certification. It covers all core services and is required/preferred at most companies. After that, pursue the Solutions Architect - Professional or DevOps Engineer - Professional depending on your role.
Q: What is the difference between AWS and Azure for Indian companies? AWS dominates India with data centers in Mumbai (ap-south-1) and Hyderabad (ap-south-2). Most Indian unicorns (Flipkart, Razorpay, Zerodha, CRED, Swiggy) run on AWS. Azure is stronger in enterprises using Microsoft stack. GCP is growing in AI/ML workloads.
Q: Is Terraform or CloudFormation better for AWS? For AWS-only teams: CloudFormation or CDK is easier (native integration, no state file management). For multi-cloud or teams with strong Terraform expertise: Terraform. Most companies use Terraform in practice because of the larger ecosystem and better multi-account/multi-region patterns (Terragrunt, Atlantis).
Q: What salary can I expect for AWS roles in India in 2026? Here are the real numbers from verified offers: AWS Cloud Engineer (3-5 years): Rs 18-35 LPA. AWS Architect (7+ years): Rs 40-80 LPA. AWS at FAANG (Amazon): Rs 60 LPA-1.5 Cr+ including RSUs. DevOps/SRE with strong AWS: Rs 20-50 LPA at product companies. The cloud skills gap in India is massive — demand far exceeds supply.
Q: What is the difference between Reserved Instances and Savings Plans? Reserved Instances are tied to a specific instance type, region, and OS. Savings Plans are more flexible — Compute Savings Plans cover any EC2 instance, Fargate, and Lambda regardless of type, size, or region within the commitment amount. Savings Plans are generally recommended now over RIs.
Q: How do you handle secrets in AWS Lambda? Never hardcode secrets. Use AWS Secrets Manager (auto-rotation, versioning, fine-grained IAM policies) or SSM Parameter Store (SecureString type, free tier available). Access secrets at function startup and cache in memory (not in environment variables for highly sensitive secrets, as they're visible in the console).
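The "fetch at startup and cache in memory" advice can be sketched as a small TTL cache. The class name and the injected `fetch` callable are hypothetical; in a real function, `fetch` would typically wrap a Secrets Manager `get_secret_value` call, which is assumed here rather than shown so the example stays self-contained.

```python
import time
from typing import Callable, Dict, Tuple


class SecretCache:
    """Caches secret values in memory and refetches them after a TTL."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, name: str) -> str:
        """Return the cached value, calling fetch only when missing or stale."""
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        value = self._fetch(name)
        self._entries[name] = (now, value)
        return value
```

Creating the cache at module scope, outside the handler, lets warm invocations reuse it, while the TTL bounds how long a rotated secret can remain stale.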
Q: What is the difference between ECS on EC2 and ECS on Fargate? With EC2 launch type, you manage the underlying EC2 instances (patching, scaling, capacity planning). With Fargate, you specify CPU and memory per task and AWS manages the underlying infrastructure. Fargate is ~20-30% more expensive but eliminates operational overhead. Use Fargate for most workloads unless you have specific kernel requirements or need GPU access.
Q: How does AWS handle data sovereignty for Indian customers? AWS's Mumbai (ap-south-1) and Hyderabad (ap-south-2) Regions let customers keep data in India, since AWS does not move customer content out of the Region they choose. RBI, SEBI, and IRDAI regulated entities can use these Regions to comply with data localization requirements. Compliance is a shared responsibility: AWS holds certifications like ISO 27001, SOC 2, and PCI DSS for its infrastructure, and customers inherit those infrastructure-level controls but remain responsible for the compliance of the workloads they build on top.
Summary: What Companies Actually Ask
| Company | Focus Areas |
|---|---|
| Amazon (AWS SDE/SRE) | DynamoDB deep dives, distributed systems, Lambda internals, cost optimization, Leadership Principles |
| Flipkart | Multi-region architecture, Kinesis streaming, EKS, cost optimization at scale |
| Razorpay | VPC security, WAF, KMS, compliance (PCI DSS), API Gateway, Lambda |
| PhonePe | High-availability patterns, RDS Aurora, caching strategies, incident response |
| Zerodha | Security (IAM, SCP, GuardDuty), CloudTrail, cost optimization, minimal infrastructure philosophy |
| Swiggy/Zomato | Auto-scaling for traffic spikes, ECS/EKS, ElastiCache, SQS patterns |
Keep building your cloud & infrastructure interview toolkit:
- DevOps Interview Questions 2026 — CI/CD, Terraform, monitoring
- Kubernetes Interview Questions 2026 — Container orchestration mastery
- Docker Interview Questions 2026 — Container fundamentals
- System Design Interview Questions 2026 — Architect systems on AWS
- Microservices Interview Questions 2026 — Distributed application patterns
- Data Engineering Interview Questions 2026 — Build data pipelines on AWS