AWS Solutions Architect Interview Questions 2026: SAA-C03 & Design Patterns

What changed in 2026
Mass-recruiter offer letters are flatter for the 2026 batch - the 4-5 LPA ASE band has barely budged in three years while inflation eats real wages. Premium tracks (Digital, Pro, Elite, Specialist) are still where the differential lives, and they are entirely test-driven. If you are aiming higher than the default offer, the coding round is not optional pageantry - it is the entire interview.
What I'd actually study for this
1. Two solid coding-round answers (one medium-hard DSA problem each, with edge-case discussion) beat five half-baked ones
2. One real project you can defend end-to-end: file paths, design decisions, and what you would change
3. One DBMS schema you actually built (not a textbook ER diagram), with at least 3 join-heavy queries written from memory
4. Three behavioural STAR stories: failure recovered, conflict handled, ownership taken
Where most candidates trip up
The single biggest mistake is treating company-specific guides as primary prep and DSA as secondary. It is the opposite. Mass recruiters use the test as a filter, but premium tracks at every IT services company use coding to allocate offer band. Spend 70% of prep time on DSA + system fundamentals, 20% on company-specific patterns, 10% on HR rehearsal. Reverse that ratio and you collect the default offer.
Editorial commentary by Aditya Sharma · written for PapersAdda · not generated, not aggregated.
Candidates report that AWS Solutions Architect interviews in 2026 combine service knowledge with real-world design questions: how would you architect a multi-region, fault-tolerant application? Confirm current exam guide and service features on the official AWS documentation and exam pages.
AWS Solutions Architect is one of the most in-demand cloud certifications. Interviews test both the SAA-C03 exam syllabus and practical architecture judgment: choosing between services, designing for failure, and optimizing cost.
Core Compute
Q1. What is the difference between EC2 instance types, and how do you choose the right one?
EC2 instances are grouped into families by workload type:
| Family | Purpose | Examples |
|---|---|---|
| General Purpose | Balanced CPU/memory | t3, m6i, m7g |
| Compute Optimized | High CPU-to-memory | c6i, c7g |
| Memory Optimized | High memory-to-CPU | r6i, x2idn, u-* |
| Storage Optimized | High disk I/O (NVMe SSD) | i3en, d3en |
| Accelerated Computing | GPU / FPGA | p4d, g5, inf2, trn1 |
| HPC Optimized | Low-latency networking (EFA) | hpc6a |
Selection process:
- Identify bottleneck: CPU-bound (compute optimized), memory-bound (memory optimized), I/O-bound (storage optimized).
- Graviton (ARM) instances (m7g, c7g, r7g): cost less per performance unit, good for containerized workloads.
- T-family (burstable): use only for variable workloads with CPU spikes -- sustained high CPU depletes credits.
- Spot instances: up to 90% cost savings for fault-tolerant, stateless workloads. Use with Auto Scaling and Spot interruption handling.
Pricing models:
| Model | Discount | Commitment | Use case |
|---|---|---|---|
| On-Demand | 0% | None | Dev/test, unpredictable |
| Reserved (1yr) | ~40% | 1 year | Steady-state production |
| Reserved (3yr) | ~60% | 3 years | Committed long-term |
| Savings Plans | ~60% | 1-3 years | Flexible (EC2+Fargate+Lambda) |
| Spot | ~70-90% | None | Stateless, fault-tolerant |
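The pricing table can be turned into a rough monthly-cost comparison. This is a sketch only. The discount fractions are the approximate values from the table above, not quoted AWS prices. The Spot figure is simply the midpoint of the 70-90% range. The $0.10/hour On-Demand rate is a hypothetical placeholder.

```python
# Approximate discounts taken from the table above (assumptions, not AWS list prices)
APPROX_DISCOUNT = {
    "on_demand": 0.00,
    "reserved_1yr": 0.40,
    "reserved_3yr": 0.60,
    "savings_plan": 0.60,
    "spot": 0.80,  # midpoint of ~70-90%; real Spot prices fluctuate
}

HOURS_PER_MONTH = 730  # common billing approximation (8,760 hours / 12)


def monthly_cost(on_demand_hourly: float, instances: int, model: str) -> float:
    """Estimate the monthly cost of running `instances` 24x7 under a pricing model."""
    discount = APPROX_DISCOUNT[model]
    return round(on_demand_hourly * (1 - discount) * HOURS_PER_MONTH * instances, 2)


if __name__ == "__main__":
    hourly = 0.10  # hypothetical On-Demand rate in USD
    for model in APPROX_DISCOUNT:
        print(f"{model:>13}: ${monthly_cost(hourly, 10, model):,.2f}")
```

In an interview, the useful point is the shape of the trade-off. Commitment-based models cut steady-state cost, while Spot's larger discount only pays off for workloads that tolerate interruption.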
Q2. What is Auto Scaling, and what are the different scaling policies?
Auto Scaling automatically adjusts EC2 capacity based on demand.
Components:
- Auto Scaling Group (ASG): Defines min/max/desired capacity, launch template, AZ placement.
- Launch Template: AMI, instance type, security groups, key pair, user data.
- Scaling policies: Rules that trigger capacity changes.
Scaling policy types:
Target Tracking (recommended)
# Maintain average CPU utilization at 60%
aws autoscaling put-scaling-policy \
--policy-name cpu-target-tracking \
--auto-scaling-group-name my-asg \
--policy-type TargetTrackingScaling \
--target-tracking-configuration '{
"TargetValue": 60.0,
"PredefinedMetricSpecification": {
"PredefinedMetricType": "ASGAverageCPUUtilization"
}
}'
Step Scaling
# Scale out by 2 if CPU > 70%, by 4 if CPU > 85%
# Scale in by 1 if CPU < 40%
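Step scaling pairs a CloudWatch alarm with a policy whose step bounds are offsets from the alarm threshold, not absolute metric values. That detail is easy to trip on. The sketch below builds the scale-out half of the example above as a plain dict, shaped like the parameters of boto3's autoscaling `put_scaling_policy` call. It also mimics how a breaching CPU value maps to a step. Assumptions: the policy name is hypothetical, and the alarm wiring and the separate scale-in policy (CPU < 40%) are omitted.

```python
def build_scale_out_policy(asg_name: str, alarm_threshold: float = 70.0,
                           big_step_at: float = 85.0) -> dict:
    """Parameters for a StepScaling policy: +2 above 70% CPU, +4 from 85% CPU."""
    # Bounds are OFFSETS from the alarm threshold: [0, 15) -> +2, >= 15 -> +4
    offset = big_step_at - alarm_threshold
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": "cpu-step-scale-out",  # hypothetical name
        "PolicyType": "StepScaling",
        "AdjustmentType": "ChangeInCapacity",
        "StepAdjustments": [
            {"MetricIntervalLowerBound": 0.0, "MetricIntervalUpperBound": offset,
             "ScalingAdjustment": 2},
            {"MetricIntervalLowerBound": offset, "ScalingAdjustment": 4},
        ],
    }


def pick_adjustment(cpu: float, policy: dict, alarm_threshold: float = 70.0) -> int:
    """Mimic how a breaching metric maps to a step (lower bound inclusive, upper exclusive)."""
    offset = cpu - alarm_threshold
    if offset <= 0:
        return 0  # alarm (CPU > 70%) not breached; scale-in is a separate policy
    for step in policy["StepAdjustments"]:
        lower = step.get("MetricIntervalLowerBound", float("-inf"))
        upper = step.get("MetricIntervalUpperBound", float("inf"))
        if lower <= offset < upper:
            return step["ScalingAdjustment"]
    return 0
```

To create the policy, you would pass the dict as `boto3.client("autoscaling").put_scaling_policy(**params)`. The returned `PolicyARN` then goes into the alarm actions of a CloudWatch alarm on CPUUtilization above 70%.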
Scheduled Scaling
# Scale to 20 instances every weekday at 8 AM (recurrence is in UTC unless --time-zone is set)
aws autoscaling put-scheduled-update-group-action \
--auto-scaling-group-name my-asg \
--scheduled-action-name scale-up-morning \
--recurrence "0 8 * * 1-5" \
--desired-capacity 20
Predictive Scaling
Uses ML to forecast traffic and scale proactively based on historical patterns.
Key settings:
- Cooldown period: Wait time after scale event before next scale (default 300 seconds). Prevents thrashing.
- Warm-up period: Time a new instance needs before it can serve traffic. Its metrics are not counted toward the group's aggregated metrics until warm-up ends, so it does not skew scaling decisions.
- Termination policies: OldestInstance, NewestInstance, ClosestToNextInstanceHour (cost optimization).
Q3. Explain ELB types and when to use each.
AWS has four load balancer types:
| Type | Layer | Protocol | Use case |
|---|---|---|---|
| Application Load Balancer (ALB) | 7 (HTTP/S) | HTTP, HTTPS, gRPC | Web apps, microservices, content-based routing |
| Network Load Balancer (NLB) | 4 (TCP/UDP) | TCP, UDP, TLS | Ultra-low latency, static IP, non-HTTP protocols |
| Gateway Load Balancer (GWLB) | 3 | IP packets | Inline network appliances (firewalls, IDS) |
| Classic Load Balancer (CLB) | 4/7 | HTTP/S, TCP | Legacy only (do not use for new deployments) |
ALB routing rules (interview favorite):
# Path-based routing
/api/* -> Target Group: API servers
/static/* -> Target Group: CDN origin / S3
# Host-based routing
api.example.com -> TG: API
www.example.com -> TG: Frontend
# Header-based routing
Header "X-Mobile: true" -> TG: Mobile-optimized backend
# Weighted routing (blue/green, canary)
TG-Blue: weight 90, TG-Green: weight 10
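ALB evaluates listener rules in priority order, and the first matching rule wins. The toy router below sketches that behavior for the four rule types above. All conditions on a rule must match, and a weighted forward splits traffic between target groups. The rule list, target-group names, and the "TG-Default" fallback are hypothetical. In a real ALB, AWS does the pattern matching, not Python's `fnmatch`.

```python
from __future__ import annotations

import fnmatch
import random
from dataclasses import dataclass, field


@dataclass
class Rule:
    priority: int                          # lower number = evaluated first
    host: str | None = None                # host-header pattern, e.g. "api.example.com"
    path: str | None = None                # path pattern, e.g. "/api/*"
    header: tuple[str, str] | None = None  # (header name, exact value)
    targets: list[tuple[str, int]] = field(default_factory=list)  # (target group, weight)


def matches(rule: Rule, host: str, path: str, headers: dict[str, str]) -> bool:
    """Every condition present on the rule must match (AND semantics)."""
    if rule.host and not fnmatch.fnmatchcase(host.lower(), rule.host.lower()):
        return False
    if rule.path and not fnmatch.fnmatchcase(path, rule.path):
        return False
    if rule.header:
        name, value = rule.header
        normalized = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
        if normalized.get(name.lower()) != value:
            return False
    return True


def route(rules, host, path, headers, default="TG-Default", rng=random):
    """Return the target group for a request, or the default action's group."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if matches(rule, host, path, headers):
            groups, weights = zip(*rule.targets)
            return rng.choices(groups, weights=weights)[0]  # weighted forward
    return default


RULES = [
    Rule(1, header=("X-Mobile", "true"), targets=[("TG-Mobile", 1)]),
    Rule(2, path="/api/*", targets=[("TG-API", 1)]),
    Rule(3, path="/static/*", targets=[("TG-Static", 1)]),
    Rule(4, host="api.example.com", targets=[("TG-API", 1)]),
    Rule(5, host="www.example.com", targets=[("TG-Blue", 90), ("TG-Green", 10)]),
]
```

Because rules are checked in priority order, a broad path rule placed early can shadow a host rule placed later. That is a common cause of "why did this request go there" questions.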
NLB characteristics:
- Preserves client source IP (ALB connects to targets from its own IPs and passes the client IP in the X-Forwarded-For header).
- Static Elastic IP addresses -- useful for IP whitelisting at client firewalls.
- Handles millions of requests per second with very low latency.
- Supports PrivateLink for cross-VPC service endpoints.
Storage
Q4. Compare S3 storage classes and their use cases.
| Storage Class | Retrieval | Min Duration | Use case |
|---|---|---|---|
| Standard | Immediate | None | Frequently accessed data |
| Intelligent-Tiering | Immediate (frequent/infrequent tiers) | None | Unknown or changing access patterns |
| Standard-IA | Immediate | 30 days | Infrequent access (backup, disaster recovery) |
| One Zone-IA | Immediate | 30 days | Infrequent, non-critical (re-creatable data) |
| Glacier Instant Retrieval | Milliseconds | 90 days | Archival, quarterly access |
| Glacier Flexible Retrieval | Minutes to hours | 90 days | Archival, annual access |
| Glacier Deep Archive | 12-48 hours | 180 days | Long-term compliance, rarely accessed |
S3 Lifecycle policies:
{
"Rules": [{
"ID": "archive-old-logs",
"Status": "Enabled",
"Filter": {"Prefix": "logs/"},
"Transitions": [
{"Days": 30, "StorageClass": "STANDARD_IA"},
{"Days": 90, "StorageClass": "GLACIER"},
{"Days": 365, "StorageClass": "DEEP_ARCHIVE"}
],
"Expiration": {"Days": 2555}
}]
}
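The lifecycle rule above works like an age ladder: for a given object age, the highest threshold reached wins. Here is a minimal sketch of that logic for checking which storage class a `logs/` object should be in on a given day. It assumes objects are uploaded as STANDARD. It ignores S3's own timing details and minimum-size transition constraints. In lifecycle configurations, "GLACIER" refers to Glacier Flexible Retrieval.

```python
from typing import Optional

# Mirrors the "archive-old-logs" rule: (age threshold in days, target storage class)
TRANSITIONS = [(30, "STANDARD_IA"), (90, "GLACIER"), (365, "DEEP_ARCHIVE")]
EXPIRATION_DAYS = 2555  # about 7 years
RULE_PREFIX = "logs/"


def expected_class(key: str, age_days: int) -> Optional[str]:
    """Storage class the rule targets for an object of this age; None once expired."""
    if not key.startswith(RULE_PREFIX):
        return "STANDARD"  # filter does not match; assumes the object was uploaded as STANDARD
    if age_days >= EXPIRATION_DAYS:
        return None
    current = "STANDARD"
    for threshold, storage_class in TRANSITIONS:
        if age_days >= threshold:
            current = storage_class
    return current
```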
S3 performance:
- Standard: 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix.
- Multipart upload: recommended for objects > 100 MB, required for > 5 GB.
- Transfer Acceleration: CloudFront edge locations for faster global uploads.
- S3 Select: SQL queries on object content (CSV, JSON, Parquet) to reduce data transfer. It is no longer offered to new customers as of 2024; Athena is the usual alternative.
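Two of these limits often come up as quick interview arithmetic:

- how many key prefixes a request rate needs
- what part size keeps a multipart upload within S3's 10,000-part limit (every part except the last must be at least 5 MiB)

A sketch, assuming the per-prefix rates listed above. The 8 MiB preferred part size is an illustrative default.

```python
import math

MIB = 1024 ** 2
MAX_PARTS = 10_000
MIN_PART_BYTES = 5 * MIB  # every part except the last must be at least 5 MiB


def prefixes_needed(get_rps: int, put_rps: int) -> int:
    """Prefixes needed at 5,500 GET/HEAD and 3,500 PUT/COPY/POST/DELETE per second each."""
    return max(1, math.ceil(get_rps / 5_500), math.ceil(put_rps / 3_500))


def part_size_bytes(object_bytes: int, preferred: int = 8 * MIB) -> int:
    """Smallest whole-MiB part size >= preferred that keeps the upload within 10,000 parts."""
    size = max(preferred, MIN_PART_BYTES, math.ceil(object_bytes / MAX_PARTS))
    return math.ceil(size / MIB) * MIB
```

For example, 20,000 GET/s and 5,000 PUT/s need 4 prefixes, because the GET side is the binding limit.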
Q5. What is the difference between EBS, EFS, and instance store?
| Feature | EBS | EFS | Instance Store |
|---|---|---|---|
| Type | Block storage | NFS file system | Ephemeral block |
| Persistence | Yes (independent of instance) | Yes (regional) | No (lost on stop/terminate) |
| Attach to | Single AZ, single EC2 (usually) | Multiple EC2, multiple AZs | Specific instance type |
| Throughput | Up to 4,000 MB/s per volume (io2 Block Express) | Burst, provisioned, or elastic | Very high (NVMe SSD) |
| Use case | Boot volumes, databases | Shared content, CMS, ECS | High I/O temp storage, caches |
| Cost | Per GB + IOPS | Per GB used | Included in instance price |
EBS volume types:
| Type | IOPS | Use case |
|---|---|---|
| gp3 (default) | Up to 16,000 | General purpose (boot, dev) |
| io2 Block Express | Up to 256,000 | Latency-sensitive DBs (SAP HANA, Oracle) |
| st1 (throughput) | 500 IOPS | Big data, log processing (sequential read) |
| sc1 (cold) | 250 IOPS | Infrequent access cold storage |
EFS performance modes:
- General Purpose: low-latency operations (web servers, CMS).
- Max I/O: high aggregate throughput (big data, media processing) -- higher latency.
Databases
Q6. When do you use RDS vs DynamoDB vs Aurora vs ElastiCache?
| Service | Type | Use case | When NOT to use |
|---|---|---|---|
| RDS (MySQL/PostgreSQL) | Relational | Structured data, ACID transactions, reporting | High-scale OLTP (>100K TPS) |
| Aurora | Managed relational (MySQL/PG compat) | High-scale relational, global apps | Simple/small workloads (cost) |
| DynamoDB | NoSQL key-value/document | Single-digit ms at any scale, serverless | Complex queries, JOINs, ACID multi-table |
| ElastiCache (Redis) | In-memory cache | Session store, leaderboards, pub/sub, hot data | Durable primary storage |
| ElastiCache (Memcached) | In-memory cache | Simple key-value cache, horizontal scaling | Complex data structures, persistence |
| Redshift | Columnar OLAP | Data warehouse, analytics (TB-PB) | OLTP, row-level updates |
| DocumentDB | MongoDB-compatible | JSON documents, content management | High-volume time series |
| Neptune | Graph | Social graphs, fraud detection | Non-graph data |
Aurora advantages over RDS:
- Up to 15 read replicas (vs 5 for RDS MySQL/PG).
- Automatic storage growth in 10 GB increments (no pre-provisioning).
- Aurora Global Database: sub-second cross-region replication (RPO seconds, RTO < 1 minute).
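- Aurora Serverless v2: auto-scales capacity in fine-grained ACU increments within seconds. Recent versions range from 0 ACUs (with auto-pause) up to 256 ACUs; confirm current limits in the Aurora documentation.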
Q7. How does DynamoDB achieve single-digit millisecond performance at scale?
DynamoDB architecture for scale:
1. Partition key distribution
Data is distributed across partitions by hashing the partition key. Each partition handles up to 3,000 RCUs and 1,000 WCUs.
# Hot partition problem: all writes go to the same partition key
# BAD key: status (only "active"/"inactive" -- at most two partitions take all traffic)
# GOOD key: user_id (high cardinality, uniform distribution)
# For write-heavy hot keys: add a random suffix (write sharding)
import random
user_id = "user123"
shard_key = f"{user_id}#{random.randint(0, 9)}"  # 10 shards per user
# Read: query all 10 shard keys and merge the results
shard_keys = [f"{user_id}#{n}" for n in range(10)]
2. Eventually consistent vs strongly consistent reads
import boto3
dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
table = dynamodb.Table('Orders')
# Eventually consistent (default) -- half the read capacity cost; may not reflect a very recent write
response = table.get_item(Key={'order_id': '12345'})
# Strongly consistent -- full read capacity cost; returns the latest committed data
response = table.get_item(
Key={'order_id': '12345'},
ConsistentRead=True
)
3. Global Secondary Indexes (GSI)
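# Table: Orders (PK=user_id, SK=order_id)
# GSI: status-created_at-index (PK=status, SK=created_at)
# Query all pending orders in the last 24 hours (GSI reads are always eventually consistent)
from datetime import datetime, timedelta, timezone
from boto3.dynamodb.conditions import Key
since = (datetime.now(timezone.utc) - timedelta(hours=24)).strftime('%Y-%m-%dT%H:%M:%SZ')
response = table.query(
    IndexName='status-created_at-index',
    KeyConditionExpression=Key('status').eq('PENDING') &
        Key('created_at').gte(since)
)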
4. DynamoDB Streams + Lambda
Write to DynamoDB -> Stream event -> Lambda processes change
Use cases:
- invalidate ElastiCache on DynamoDB write
- replicate to OpenSearch for full-text search
- trigger downstream workflows
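Here is a minimal sketch of the cache-invalidation use case: a Lambda handler that reads DynamoDB Streams records and works out which cache keys to drop. Assumptions:

- Each item key includes a string attribute `order_id`.
- Cache keys use a hypothetical `order:<id>` format.
- The actual Redis delete is injected as a function, so the logic runs without ElastiCache.

The record fields used (`eventName`, `dynamodb.Keys` in DynamoDB's typed JSON) follow the Streams event format.

```python
from __future__ import annotations

from typing import Callable, Iterable


def keys_to_invalidate(records: Iterable[dict]) -> list[str]:
    """Collect cache keys for items that were modified or removed."""
    cache_keys = []
    for record in records:
        # INSERTs have nothing cached yet; MODIFY/REMOVE make any cached copy stale
        if record["eventName"] not in ("MODIFY", "REMOVE"):
            continue
        # Keys use DynamoDB's typed JSON, e.g. {"order_id": {"S": "12345"}}
        order_id = record["dynamodb"]["Keys"]["order_id"]["S"]
        cache_keys.append(f"order:{order_id}")  # hypothetical cache-key format
    return cache_keys


def make_handler(delete_keys: Callable[[list[str]], None]):
    """Build a Lambda handler; delete_keys would wrap a real cache client."""
    def lambda_handler(event, context):
        keys = keys_to_invalidate(event["Records"])
        if keys:
            delete_keys(keys)
        return {"invalidated": len(keys)}
    return lambda_handler
```

With redis-py, for example, the injected function could be `lambda keys: client.delete(*keys)`. Keeping it injectable also makes the handler easy to unit-test.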
5. DAX (DynamoDB Accelerator)
In-memory cache in front of DynamoDB. Microsecond reads for hot data. Same DynamoDB API -- DAX SDK is a drop-in replacement.
Networking
Q8. Explain VPC architecture: subnets, route tables, NACLs, and security groups.
VPC anatomy:
VPC (10.0.0.0/16)
Public Subnet (10.0.1.0/24) -- AZ-1a
Internet Gateway -> Route table: 0.0.0.0/0 -> IGW
EC2 instances with public IPs (web servers)
Public Subnet (10.0.2.0/24) -- AZ-1b
NAT Gateway (for private subnet outbound)
Private Subnet (10.0.10.0/24) -- AZ-1a
Route table: 0.0.0.0/0 -> NAT Gateway
EC2 instances (app servers, no public IP)
Private Subnet (10.0.20.0/24) -- AZ-1b
Route table: 0.0.0.0/0 -> NAT Gateway
RDS instance (no internet access)
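You can sanity-check a CIDR plan like the one above with Python's standard `ipaddress` module. The sketch below checks that each subnet sits inside the VPC and that no two subnets overlap. It also applies AWS's rule that five addresses in every subnet are reserved: the network address, the VPC router, DNS, one for future use, and the last address. The subnet names are hypothetical labels for the diagram's subnets.

```python
from __future__ import annotations

import ipaddress

VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")
AWS_RESERVED_PER_SUBNET = 5  # .0 network, .1 router, .2 DNS, .3 reserved, last address

# Subnets from the diagram above
LAYOUT = {
    "public-1a": "10.0.1.0/24",
    "public-1b": "10.0.2.0/24",
    "private-1a": "10.0.10.0/24",
    "private-1b": "10.0.20.0/24",
}


def usable_hosts(cidr: str) -> int:
    """Addresses left for instances after AWS's per-subnet reservations."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def validate_layout(layout: dict[str, str]) -> None:
    """Raise ValueError if a subnet falls outside the VPC or overlaps another subnet."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in layout.items()}
    for name, net in nets.items():
        if not net.subnet_of(VPC_CIDR):
            raise ValueError(f"{name} ({net}) is outside {VPC_CIDR}")
    names = sorted(nets)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if nets[first].overlaps(nets[second]):
                raise ValueError(f"{first} overlaps {second}")
```

A /24 therefore gives 251 usable addresses, not 256. A /16 VPC has room for 256 such subnets.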
Security Groups vs NACLs:
| Aspect | Security Group | NACL |
|---|---|---|
| Level | Instance (ENI) | Subnet |
| State | Stateful (return traffic auto-allowed) | Stateless (must allow inbound AND outbound) |
| Rules | Allow only | Allow + Deny |
| Evaluation | All rules evaluated | Rules evaluated in order (lowest number first) |
| Default | Deny all inbound, allow all outbound | Default NACL allows all in/out; a custom NACL denies all until rules are added |
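"Lowest number first" is what makes NACLs order-sensitive. The first rule that matches decides, and unmatched traffic hits the implicit final deny (`*`) rule. Below is a sketch of that evaluation for inbound TCP traffic. The rule numbers, CIDRs (documentation ranges), and ports are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int     # evaluated lowest first
    cidr: str       # source CIDR for inbound rules
    from_port: int
    to_port: int
    action: str     # "allow" or "deny"


def evaluate(rules, source_ip: str, port: int) -> str:
    """Return the action of the first matching rule; unmatched traffic is denied."""
    ip = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        in_range = rule.from_port <= port <= rule.to_port
        if in_range and ip in ipaddress.ip_network(rule.cidr):
            return rule.action  # first match wins; later rules are never consulted
    return "deny"  # the implicit final "*" rule


# Hypothetical inbound NACL for a public subnet (TCP only, for brevity)
INBOUND = [
    NaclRule(100, "203.0.113.0/24", 0, 65535, "deny"),  # block a known-bad range first
    NaclRule(200, "0.0.0.0/0", 443, 443, "allow"),      # HTTPS
    NaclRule(300, "0.0.0.0/0", 1024, 65535, "allow"),   # ephemeral ports: replies to outbound calls
]
```

Rule 300 is needed because NACLs are stateless. Replies to connections the instances start themselves arrive on ephemeral ports and must be allowed explicitly. If the deny rule were renumbered above 200, the blocked range could reach port 443.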
# Security Group: web server
Inbound: TCP 443 from 0.0.0.0/0
Inbound: TCP 80 from 0.0.0.0/0
Outbound: All traffic
# Security Group: app server (only web tier can talk to it)
Inbound: TCP 8080 from sg-web-server (SG reference, not IP)
Outbound: TCP 5432 to sg-database
# Security Group: RDS
Inbound: TCP 5432 from sg-app-server
Outbound: (empty -- stateful, return traffic allowed)
VPC Peering vs Transit Gateway:
- VPC Peering: Direct 1-to-1 connection, non-transitive (A-B + B-C does NOT give A-C access).
- Transit Gateway: Hub-and-spoke for hundreds of VPCs plus on-premises connections. Transitive routing. Centralized network policy.
Q9. How does CloudFront work, and when should you use it?
CloudFront is AWS's CDN: globally distributed edge locations cache content close to end users.
Architecture:
User (Mumbai) -> CloudFront Edge (Mumbai)
|-- Cache HIT -> return cached content (<5ms)
|-- Cache MISS -> fetch from Origin (S3/ALB/custom), cache it, serve to user
Origin types:
- S3 bucket (with OAC -- Origin Access Control, replaces legacy OAI).
- ALB / EC2 (dynamic content).
- API Gateway.
- Custom HTTP origin (on-premises, third-party).
Behaviors (routing rules):
Cache Behavior 1: Path /api/* -> Origin: ALB, TTL=0 (no cache)
Cache Behavior 2: Path /static/* -> Origin: S3, TTL=86400 (24 hours)
Cache Behavior 3: Default (*) -> Origin: ALB, TTL=300
Cache invalidation:
# Invalidate specific path (charged after 1,000 free/month)
aws cloudfront create-invalidation \
--distribution-id E1234ABCDE \
--paths "/images/*" "/css/main.css"
# Better practice: use versioned filenames (/css/main.v2.css) to avoid invalidation cost
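One common way to produce versioned filenames is to embed a short content hash. A file then gets a new name only when its bytes change, so it can be cached with a long TTL. Pages reference the new name, and old copies age out of edge caches without an invalidation. A sketch using only the standard library; the `name.<hash>.ext` scheme is an assumption, not something CloudFront requires.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path: str, content: bytes, length: int = 8) -> str:
    """Insert a short content hash before the extension: /css/main.css -> /css/main.<hash>.css."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


if __name__ == "__main__":
    print(versioned_name("/css/main.css", b"body { margin: 0 }"))
```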
Security features:
- HTTPS enforcement + HTTP-to-HTTPS redirect.
- AWS WAF integration (rate limiting, SQL injection protection, IP blocking).
- Signed URLs / Signed Cookies for private content (paid video, premium downloads).
- Origin Shield: additional caching layer between edge and origin (reduces origin load).
- Geo-restriction: block or allow by country.
High Availability and Disaster Recovery
Q10. What are the four DR strategies in AWS, and when do you use each?
AWS defines four DR strategies on the reliability/cost spectrum:
| Strategy | RTO | RPO | Cost | Setup |
|---|---|---|---|---|
| Backup and Restore | Hours | Hours | Lowest | S3 backups, restore on disaster |
| Pilot Light | Minutes-hours | Minutes | Low | Core services always on (DB), rest off |
| Warm Standby | Minutes | Seconds-minutes | Medium | Scaled-down running replica in DR region |
| Multi-Site Active/Active | Seconds | Near-zero | Highest | Full capacity in both regions |
Backup and Restore:
# Automated S3 cross-region replication (versioning must be enabled on both buckets)
aws s3api put-bucket-replication \
--bucket source-bucket \
--replication-configuration file://replication.json
# replication.json: Role (IAM role S3 assumes to copy objects) + Rules with Destination.Bucket = arn:aws:s3:::dr-bucket
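This sketch builds a minimal replication configuration in the shape `put-bucket-replication` expects, including the IAM role S3 assumes to copy objects. The account ID, role name, rule ID, and the choice of STANDARD_IA for the DR copy are hypothetical placeholders.

```python
import json


def replication_config(role_arn: str, dest_bucket: str,
                       storage_class: str = "STANDARD_IA") -> dict:
    """Minimal replication configuration that copies every object to dest_bucket."""
    return {
        "Role": role_arn,  # IAM role S3 assumes to read the source and write the copy
        "Rules": [{
            "ID": "dr-replication",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {},  # empty filter = all objects in the bucket
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {
                "Bucket": f"arn:aws:s3:::{dest_bucket}",
                "StorageClass": storage_class,  # cheaper class for the DR copy
            },
        }],
    }


if __name__ == "__main__":
    # Hypothetical account ID and role name; redirect output to replication.json
    config = replication_config(
        "arn:aws:iam::111122223333:role/s3-replication-role", "dr-bucket")
    print(json.dumps(config, indent=2))
```

Run `aws s3api put-bucket-versioning --bucket <name> --versioning-configuration Status=Enabled` on each bucket before applying the configuration.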
Pilot Light:
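- Core data stays live in the DR region (cross-region RDS read replica or Aurora Global Database) and is promoted on disaster; app servers stay off until needed.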
- AMIs copied to DR region.
- Route 53 health checks switch DNS on failure.
Warm Standby:
Primary: Auto Scaling Group min=10, running full load
DR: Auto Scaling Group min=2, running at low scale
On disaster: update DR ASG desired=10, update Route 53 failover record
Active/Active:
Route 53 Latency-Based Routing:
us-east-1: full capacity, handles east coast traffic
eu-west-1: full capacity, handles EU traffic
DynamoDB Global Tables: multi-master, automatic conflict resolution
Aurora Global Database: writes to primary region, sub-second replication to secondary
Q11. How do Route 53 routing policies work?
| Policy | Use case |
|---|---|
| Simple | Single resource, no health checks |
| Weighted | A/B testing, blue/green canary (90/10 split) |
| Latency-Based | Route to lowest-latency region |
| Failover | Active/passive DR -- switch to standby on health check failure |
| Geolocation | Route by user's geographic location (country/continent) |
| Geoproximity | Route by proximity with adjustable bias |
| Multivalue Answer | Return up to 8 healthy records (client-side load balancing) |
Failover with health checks:
# Create health check
aws route53 create-health-check \
--caller-reference unique-ref-1 \
--health-check-config '{
"Type": "HTTPS",
"ResourcePath": "/health",
"FullyQualifiedDomainName": "api.primary.example.com",
"RequestInterval": 30,
"FailureThreshold": 3
}'
# Primary A record with failover type PRIMARY
# Secondary A record with failover type SECONDARY (DR region)
# Route 53 switches automatically when health check fails
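The two failover records described in the comments can be expressed as one change batch. This sketch builds it as a dict for boto3's route53 client, `change_resource_record_sets(HostedZoneId=..., ChangeBatch=...)`. The hostname, IP addresses, and health-check ID are hypothetical; the real ID comes from the `HealthCheck.Id` field returned by `create-health-check`. In this example, only the primary record carries a health check.

```python
from typing import Optional


def _failover_record(name: str, role: str, ip: str, ttl: int,
                     health_check_id: Optional[str] = None) -> dict:
    rrset = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": f"{role.lower()}-record",  # must differ between the two records
        "Failover": role,                           # "PRIMARY" or "SECONDARY"
        "TTL": ttl,
        "ResourceRecords": [{"Value": ip}],
    }
    if health_check_id:
        # Route 53 answers with the primary only while this health check is healthy
        rrset["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": rrset}


def failover_change_batch(name: str, primary_ip: str, secondary_ip: str,
                          primary_health_check_id: str, ttl: int = 60) -> dict:
    """UPSERT batch for an active/passive PRIMARY/SECONDARY pair of A records."""
    return {
        "Comment": "Active/passive failover",
        "Changes": [
            _failover_record(name, "PRIMARY", primary_ip, ttl, primary_health_check_id),
            _failover_record(name, "SECONDARY", secondary_ip, ttl),
        ],
    }
```

A short TTL such as 60 seconds keeps resolvers from caching the failed primary for long after a switch.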
Weighted routing for canary deployment:
Record A: api.example.com -> ALB-v1 (weight=90)
Record B: api.example.com -> ALB-v2 (weight=10)
# Gradually shift: 90/10 -> 70/30 -> 50/50 -> 0/100 -> delete Record A
Security
Q12. Explain the AWS shared responsibility model.
AWS Responsibility (Security OF the cloud):
- Physical infrastructure (data centers, hardware, networking)
- Hypervisor and host OS
- Managed service internals (S3 durability, RDS engine)
- Global infrastructure (regions, AZs, edge locations)
Customer Responsibility (Security IN the cloud):
- Guest OS on EC2 instances (patching, hardening)
- Applications and runtime
- Data encryption (at-rest and in-transit)
- IAM (identity, access, MFA)
- Network configuration (VPC, security groups, NACLs)
- S3 bucket policies, encryption settings
- Compliance for regulated data (HIPAA, PCI, SOC2)
Shared controls (both AWS and customer):
- Patch management: AWS patches hypervisor, customer patches guest OS.
- Configuration management: AWS manages service config, customer configures their usage.
- Awareness and training.
Interview follow-up: Which services require more customer security attention?
- EC2: Full OS responsibility. Patch, harden, configure firewall.
- S3: Default private, but customer must set bucket policies, block public access, enable encryption.
- Lambda: AWS manages runtime, customer secures code and IAM execution role.
- DynamoDB: AWS manages infrastructure, customer manages IAM policies, encryption, VPC endpoints.
Q13. What is AWS IAM, and what are best practices for access control?
IAM (Identity and Access Management) controls who can do what with AWS resources.
Core concepts:
- Users: Human identities with long-term credentials. Prefer SSO/federation over IAM users.
- Groups: Collections of users with shared policies.
- Roles: Temporary credentials for services, cross-account access, federated identities.
- Policies: JSON documents specifying Allow/Deny on Actions + Resources.
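Principle of least privilege (the s3:prefix condition key only applies to s3:ListBucket, so object actions are scoped by the Resource ARN instead):
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "ReadWriteUploadsOnly",
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject"
],
"Resource": "arn:aws:s3:::my-app-bucket/uploads/*"
},
{
"Sid": "ListUploadsPrefixOnly",
"Effect": "Allow",
"Action": "s3:ListBucket",
"Resource": "arn:aws:s3:::my-app-bucket",
"Condition": {
"StringLike": {
"s3:prefix": ["uploads/*"]
}
}
}
]
}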
Best practices:
1. Root account: enable MFA, never use for daily operations
2. IAM Users: enforce MFA, use access keys for programmatic access sparingly
3. Roles > Users: workloads should always use IAM roles (EC2 instance profiles, Lambda execution roles, ECS task roles)
4. Policy: deny-by-default, grant minimum required permissions
5. Rotate credentials: audit unused access keys, set 90-day rotation policy
6. AWS Organizations + SCPs: account-level guardrails (e.g., deny all non-approved regions)
7. CloudTrail: log all API calls for audit
8. Access Analyzer: identify external access to S3, IAM roles, KMS keys
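Deny-by-default and account guardrails both rest on IAM's core evaluation rule. An explicit Deny always wins. Otherwise, any matching Allow grants access, and everything else is implicitly denied. Below is a heavily simplified sketch of that rule for identity policies only. It has no conditions, SCPs, permission boundaries, or resource policies, and it uses `fnmatch` wildcards as a stand-in for IAM's matching. The policies in the test are hypothetical.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def _as_list(value):
    return value if isinstance(value, list) else [value]


def _statement_matches(statement: dict, action: str, resource: str) -> bool:
    # Action names are case-insensitive in IAM; resources are compared case-sensitively here
    action_ok = any(fnmatchcase(action.lower(), pattern.lower())
                    for pattern in _as_list(statement["Action"]))
    resource_ok = any(fnmatchcase(resource, pattern)
                      for pattern in _as_list(statement["Resource"]))
    return action_ok and resource_ok


def is_allowed(policies: list[dict], action: str, resource: str) -> bool:
    """Simplified evaluation: explicit Deny > Allow > implicit deny."""
    matching = [s for policy in policies
                for s in _as_list(policy["Statement"])
                if _statement_matches(s, action, resource)]
    if any(s["Effect"] == "Deny" for s in matching):
        return False  # an explicit deny always wins
    return any(s["Effect"] == "Allow" for s in matching)  # otherwise default deny
```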
Cross-account access (common pattern):
Trust policy on the role in Account B (lets principals in Account A assume it, MFA required):
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": {"AWS": "arn:aws:iam::ACCOUNT-A-ID:root"},
"Action": "sts:AssumeRole",
"Condition": {"Bool": {"aws:MultiFactorAuthPresent": "true"}}
}]
}
# From Account A: assume the role in Account B (MFA values are required by the trust policy)
aws sts assume-role \
--role-arn arn:aws:iam::ACCOUNT-B-ID:role/CrossAccountRole \
--role-session-name my-session \
--serial-number arn:aws:iam::ACCOUNT-A-ID:mfa/MFA-DEVICE-NAME \
--token-code 123456
Architecture Design
Q14. How would you design a highly available, scalable web application on AWS?
Architecture (3-tier web app, multi-AZ, multi-region capable):
CloudFront (global CDN + WAF)
|
| HTTPS
v
Route 53 (health-check based failover, latency routing)
|
v
ALB (multi-AZ, sticky sessions for stateful apps)
|
v
ASG + EC2 (m7g.large, min=3, max=30, across 3 AZs)
- Stateless application tier
- Session data in ElastiCache Redis
- App config in SSM Parameter Store + Secrets Manager
|
v
Aurora (Multi-AZ, 2 read replicas, automated backups)
|
v
ElastiCache Redis (session store, application cache)
Data layer:
Static assets -> S3 + CloudFront (origin: S3 with OAC)
User uploads -> S3 (pre-signed URLs for direct upload, avoid EC2 proxy)
Search -> OpenSearch Service (sync from Aurora via a change-data-capture pipeline such as AWS DMS; DynamoDB Streams only applies when the source is DynamoDB)
CI/CD:
GitHub -> CodePipeline
-> CodeBuild (unit tests, build Docker image, push to ECR)
-> CodeDeploy blue/green to ASG (or ECS rolling update)
-> Route 53 weighted routing for canary (10% new, 90% old)
Monitoring:
CloudWatch: EC2 metrics, custom app metrics, dashboards, alarms
CloudWatch Logs: aggregated from all instances (CloudWatch Agent)
X-Ray: distributed tracing across ALB -> EC2 -> RDS -> ElastiCache
SNS: alert routing to PagerDuty/Slack on alarm state change
Q15. What is the difference between SQS and SNS, and when do you use each?
| Aspect | SQS | SNS |
|---|---|---|
| Model | Message queue (pull) | Pub/Sub (push) |
| Consumers | Each message is processed by one consumer (competing consumers) | Multiple subscribers (fan-out) |
| Retention | Up to 14 days | No retention (fire and forget) |
| Delivery guarantee | At-least-once (standard), exactly-once (FIFO) | At-least-once |
| Use case | Decoupled async processing, task queue | Broadcast to multiple subscribers, fan-out |
SQS queue types:
| Type | Ordering | Deduplication | Max TPS |
|---|---|---|---|
| Standard | Best-effort | Possible duplicates | Nearly unlimited |
| FIFO | Strict (within a message group) | Exactly-once processing | 300 API calls/s, or 3,000 messages/s with batching (more in high-throughput mode) |
SNS + SQS fan-out pattern (interview favorite):
Order Service publishes to SNS topic "order-events"
-> SQS Queue: inventory-updates (Inventory Service subscribes)
-> SQS Queue: email-notifications (Email Service subscribes)
-> SQS Queue: analytics-events (Analytics Service subscribes)
-> Lambda: real-time fraud-check
Benefit: Order Service decoupled from all downstream services
Each service processes at its own pace
New subscriber = add new SQS queue, no Order Service change
SQS visibility timeout:
# Consumer gets message, starts processing
# Message becomes invisible (visibility timeout, default 30s)
# If processing fails (no delete), message reappears after timeout
# Configure timeout = expected max processing time + buffer
# Dead Letter Queue (DLQ): after N failed attempts, move to DLQ
aws sqs create-queue --queue-name my-dlq
aws sqs set-queue-attributes \
--queue-url https://sqs.us-east-1.amazonaws.com/123456789/my-queue \
--attributes '{
"RedrivePolicy": "{\"deadLetterTargetArn\":\"arn:aws:sqs:us-east-1:123456789:my-dlq\",\"maxReceiveCount\":5}"
}'
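Visibility timeouts and redrive are easy to misdescribe, so here is a small in-memory simulation of how they interact. This is not the SQS API: the classes, the simulated clock, and the always-failing consumer are hypothetical. The 30-second timeout and maxReceiveCount of 5 mirror the settings above. As with SQS, a message moves to the DLQ once its receive count exceeds maxReceiveCount.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Message:
    body: str
    receive_count: int = 0
    visible_at: float = 0.0


@dataclass
class SimQueue:
    visibility_timeout: float = 30.0
    max_receive_count: int = 5
    messages: list[Message] = field(default_factory=list)
    dlq: list[Message] = field(default_factory=list)

    def send(self, body: str) -> None:
        self.messages.append(Message(body))

    def receive(self, now: float) -> Message | None:
        for msg in list(self.messages):
            if msg.visible_at > now:
                continue  # still in flight with another consumer
            if msg.receive_count >= self.max_receive_count:
                # Redrive: one more receive would exceed maxReceiveCount, so move it to the DLQ
                self.messages.remove(msg)
                self.dlq.append(msg)
                continue
            msg.receive_count += 1
            msg.visible_at = now + self.visibility_timeout  # hidden while being processed
            return msg
        return None

    def delete(self, msg: Message) -> None:
        self.messages.remove(msg)  # success path: consumer deletes before the timeout
```

If processing regularly takes longer than the timeout, the same message is handed to a second consumer while the first is still working on it. That is why handlers should be idempotent.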
Q16. When would you use Lambda instead of EC2/ECS?
Lambda advantages:
- No server management (AWS handles patching, scaling, availability).
- Per-invocation billing (no cost when idle).
- Auto-scales to thousands of concurrent executions without configuration.
- Event-driven integrations with 200+ AWS services.
Lambda limitations:
- Max execution time: 15 minutes.
- Ephemeral: no persistent local state (use S3/DynamoDB/EFS).
- Cold start latency: roughly 100ms-3s on the first invocation of a new execution environment (SnapStart for Java and provisioned concurrency mitigate this).
- Memory: 128 MB to 10 GB (CPU scales with memory).
Use Lambda when:
- Event-driven processing: S3 upload triggers image resize
- Webhooks / API callbacks: Stripe, GitHub, Slack events
- Scheduled jobs: EventBridge (formerly CloudWatch Events) cron rule, e.g. a daily report
- Stream processing: Kinesis/DynamoDB Streams consumers
- Glue code: orchestrate other AWS services
- APIs with highly variable traffic (zero to spiky)
Use EC2/ECS when:
- Long-running processes (> 15 min)
- High sustained throughput (cost advantage over per-invocation)
- WebSocket connections (Lambda does not hold persistent connections)
- Specific runtime versions or system dependencies not in Lambda
- Legacy applications requiring OS-level access
# Lambda function triggered by S3 upload
import json
import urllib.parse
from datetime import datetime, timezone

import boto3

# Create clients once per execution environment so warm invocations reuse them
rekognition = boto3.client('rekognition')
table = boto3.resource('dynamodb').Table('ImageLabels')

def lambda_handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # Object keys arrive URL-encoded in S3 event notifications
        key = urllib.parse.unquote_plus(record['s3']['object']['key'])
        # Detect labels in the uploaded image
        response = rekognition.detect_labels(
            Image={'S3Object': {'Bucket': bucket, 'Name': key}},
            MaxLabels=10,
            MinConfidence=80
        )
        labels = [label['Name'] for label in response['Labels']]
        # Store results in DynamoDB (request ID kept separately from the timestamp)
        table.put_item(Item={
            'image_key': key,
            'labels': labels,
            'processed_at': datetime.now(timezone.utc).isoformat(),
            'request_id': context.aws_request_id
        })
    return {'statusCode': 200, 'body': json.dumps('Processed successfully')}
Cost Optimization
Q17. What are the key strategies for reducing AWS costs?
Compute:
1. Right-sizing: CloudWatch metrics -> CPU/memory < 20% -> downsize
AWS Compute Optimizer: ML-based recommendations
2. Spot instances: 70-90% savings for stateless, fault-tolerant workloads
Spot Fleet: mix instance types to improve availability
3. Savings Plans / Reserved Instances: 40-60% for predictable baseline
Savings Plans > RIs: more flexible (covers Fargate + Lambda too)
4. Auto Scaling: scale to zero nights/weekends for dev/test
Instance Scheduler: automated start/stop via Lambda + DynamoDB
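The scheduling saving is simple arithmetic that is worth doing out loud in an interview. Here is a sketch, assuming a dev/test fleet needed 12 hours a day on weekdays and billed only for running hours. It ignores storage and other always-on charges.

```python
HOURS_PER_WEEK = 24 * 7


def schedule_savings(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of instance-hours avoided by running on a schedule instead of 24x7."""
    if not (0 <= hours_per_day <= 24 and 0 <= days_per_week <= 7):
        raise ValueError("hours_per_day must be 0-24 and days_per_week 0-7")
    return 1 - (hours_per_day * days_per_week) / HOURS_PER_WEEK


if __name__ == "__main__":
    # Weekdays 08:00-20:00: 60 of 168 weekly hours running
    print(f"{schedule_savings(12, 5):.0%}")  # prints 64%
```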
Storage:
1. S3 Lifecycle policies: move to IA -> Glacier -> expire
2. EBS: delete unattached volumes (common waste)
Migrate gp2 volumes to gp3 (up to 20% cheaper per GB at comparable performance)
3. EBS snapshot cleanup: delete old snapshots, use Data Lifecycle Manager
4. S3 Intelligent-Tiering: automatic tier moves with no retrieval fee (a small per-object monitoring fee applies)
Database:
1. Aurora Serverless v2: scale to 0 for dev/test databases
2. RDS: stop dev/test instances nights/weekends (a stopped instance is billed for storage only, and RDS restarts it automatically after 7 days)
3. Redshift: pause cluster when not in use (managed pause/resume)
4. DynamoDB on-demand vs provisioned: on-demand for unpredictable workloads
Network:
1. NAT Gateway: often top cost for private subnets
Consolidate to fewer NAT GWs, use VPC endpoints for S3/DynamoDB (no NAT fee)
2. Data transfer: same-AZ traffic over private IPs is free; cross-AZ traffic costs about $0.01/GB in each direction
Keep chatty components in the same AZ when availability requirements allow
3. CloudFront: reduces data transfer out (CloudFront -> internet cheaper than EC2 -> internet)
Real-World Design Scenarios
Q18. Design a serverless data pipeline for real-time analytics on AWS.
Requirements: 100K events/second from IoT devices, store raw events, compute rolling aggregations, power real-time dashboard.
Architecture:
IoT Devices -> Kinesis Data Streams (100 shards, ~100K rec/s)
|
|-- Lambda (stream consumer, batch=100, window=30s)
| -> DynamoDB: rolling 5-min aggregations (TTL=7d)
|
|-- Kinesis Firehose -> S3 (raw events, Parquet format)
|
Glue Crawler (auto-discover schema)
|
Athena (ad-hoc SQL on S3)
|
QuickSight (dashboard on Athena)
DynamoDB -> Lambda (read aggregations) -> API Gateway
|
CloudFront (cache API responses 30s)
|
React dashboard (WebSocket for live updates)
Kinesis shard calculation:
Write throughput: 100K records/s, avg 200 bytes/record = 20 MB/s
Kinesis shard capacity: 1,000 records/s or 1 MB/s per shard
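Required shards: MAX(100K/1000, 20/1) = MAX(100, 20) = 100 shards (the record-count limit binds, not bytes)
Read throughput: 2 MB/s per shard, shared by all standard consumers
With 100 shards: 100 * 2 MB/s = 200 MB/s read capacity vs 20 MB/s pulled per consumer
(enhanced fan-out gives each registered consumer its own 2 MB/s per shard)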
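Worked as a function, the shard calculation takes the larger of the record-count requirement and the byte requirement. For 100K records/s at 200 bytes each, that is 100 shards, because the 1,000 records/s limit binds before the 1 MB/s limit. The sketch assumes provisioned-mode limits: 1 MB/s or 1,000 records/s per shard for writes, and 2 MB/s per shard for shared reads. It treats 1 MB as 1,000,000 bytes.

```python
import math

WRITE_RECORDS_PER_SHARD = 1_000
WRITE_BYTES_PER_SHARD = 1_000_000  # 1 MB/s
READ_BYTES_PER_SHARD = 2_000_000   # 2 MB/s, shared by standard (non-enhanced) consumers


def shards_needed(records_per_sec: int, avg_record_bytes: int) -> int:
    """Shards required so neither the record-count nor the byte write limit is exceeded."""
    by_records = math.ceil(records_per_sec / WRITE_RECORDS_PER_SHARD)
    by_bytes = math.ceil(records_per_sec * avg_record_bytes / WRITE_BYTES_PER_SHARD)
    return max(by_records, by_bytes)


def read_headroom(shards: int, ingest_bytes_per_sec: int, standard_consumers: int) -> float:
    """Shared read capacity divided by what standard consumers pull (> 1 means enough)."""
    return shards * READ_BYTES_PER_SHARD / (ingest_bytes_per_sec * standard_consumers)
```

Take 100 shards and two standard consumers (the Lambda aggregator and Firehose), each reading the full 20 MB/s. Read capacity is then 5x the demand. Note that 100 shards run exactly at the per-shard record limit if traffic is perfectly uniform. In practice, you would add headroom or consider on-demand capacity mode.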
Lambda aggregation:
import base64
import json
import time
from decimal import Decimal

import boto3

table = boto3.resource('dynamodb').Table('DeviceMetrics')

def lambda_handler(event, context):
    aggregations = {}
    for record in event['Records']:
        # Kinesis data arrives base64-encoded; parse floats as Decimal because
        # the DynamoDB resource API rejects Python float values
        payload = json.loads(base64.b64decode(record['kinesis']['data']),
                             parse_float=Decimal)
        device_id = payload['device_id']
        metric = payload['value']
        stats = aggregations.setdefault(device_id, {'sum': 0, 'count': 0, 'max': metric})
        stats['sum'] += metric
        stats['count'] += 1
        stats['max'] = max(stats['max'], metric)
    now = int(time.time())
    window = now // 300 * 300  # start of the current 5-min window
    # Batch write to DynamoDB. put_item overwrites, so a later batch in the same
    # window replaces this one; use update_item with ADD to accumulate instead.
    with table.batch_writer() as batch:
        for device_id, stats in aggregations.items():
            batch.put_item(Item={
                'device_id': device_id,
                'window': window,
                'avg': Decimal(stats['sum']) / stats['count'],
                'max': stats['max'],
                'count': stats['count'],
                'ttl': now + 7 * 86400
            })
FAQ
Q: What is a VPC endpoint, and when should you use it? A VPC endpoint lets EC2/Lambda in a private subnet reach S3, DynamoDB, or other AWS services without traffic leaving AWS's network (no NAT Gateway, no Internet Gateway). There are two types: Gateway endpoints (S3, DynamoDB -- free, added as route-table entries) and Interface endpoints (most other services -- billed per hour plus per GB, creating an ENI in each chosen subnet). Use gateway endpoints for S3/DynamoDB to eliminate NAT Gateway data-processing costs and keep traffic on the private network.
Q: What is the difference between CloudWatch and CloudTrail? CloudWatch is for performance monitoring and operational visibility: metrics (CPU, memory, custom), logs (application, access, flow logs), alarms, dashboards, and anomaly detection. CloudTrail is for audit and compliance: records every API call made in the AWS account (who did what, when, from where). CloudTrail logs go to S3 and can be analyzed with Athena. Use CloudWatch to know IF something is wrong; use CloudTrail to know WHO changed something.
Q: How do you secure data at rest and in transit on AWS? At rest: S3 server-side encryption (SSE-S3 with S3-managed keys, applied to new objects by default; SSE-KMS for key-usage audit trails in CloudTrail; SSE-C with customer-provided keys), EBS encryption (KMS), RDS encryption (KMS, enabled at creation), and DynamoDB encryption (always on, with a choice of KMS key). In transit: enforce HTTPS on ALB/CloudFront/API Gateway, use TLS 1.2+, use ACM (AWS Certificate Manager) for free managed certificates on integrated services, enable TLS between the RDS client and database, and use VPC PrivateLink / VPN / Direct Connect for on-premises-to-AWS traffic. Confirm current AWS encryption defaults on the official AWS security documentation before your interview.
Methodology applied to this article (last verified 8 Jun 2026)
- No fabricated salary numbers or success rates. If we quote a range, it's sourced.
- No noun-substituted templates. This article was not generated by swapping company names in a stock prompt.
- No paid placements, sponsored coaching links, or affiliate-shilled course pushes.