Amazon ElastiCache

Amazon ElastiCache is a managed in-memory caching service offering Redis OSS, Valkey (the fork Amazon backs after Redis's license change), and Memcached. It offloads hot reads from databases, accelerates session state, and supports pub/sub and streams — all with sub-millisecond latency.


Engine Choices:

- Valkey: the open-source, Linux Foundation-governed fork of Redis OSS that Amazon backs; wire-compatible with existing Redis clients and AWS's recommended default for new workloads.
- Redis OSS: the original engine, still supported for existing workloads.
- Memcached: a simple, multithreaded key-value cache with no persistence or replication.

Key Features:

- Sub-millisecond latency for cached reads and writes.
- Multi-AZ replication with automatic failover; cluster mode for horizontal sharding.
- Encryption in transit (TLS) and at rest (KMS), plus AUTH tokens and RBAC.
- Global Datastore for cross-region replication; a Serverless option that scales automatically.
- Automated backups/snapshots with configurable retention.

Common Use Cases:

- Cache-aside in front of DynamoDB or relational databases to offload hot reads.
- Session state for stateless application tiers.
- Leaderboards, counters, and rate limiting (sorted sets, atomic increments); pub/sub messaging.

Service Limits & Quotas:

- Default soft quota of 300 nodes per Region (raisable through Service Quotas).
- Cluster mode supports up to 500 nodes per replication group.
- Up to 5 read replicas per shard (node group).

Pricing Model:

- Provisioned: per node-hour by instance type (on-demand), with reserved nodes for 1- or 3-year commitments at a discount.
- Serverless: billed on data stored (GB-hours) and request processing (ECPUs).
- Backup storage beyond the free allowance is billed per GB-month.

Code Example — Cache-Aside Pattern with redis-py:


import json

import boto3
import redis

r = redis.Redis(
    host="prod-cache.abcdef.ng.0001.usw2.cache.amazonaws.com",
    port=6379,
    ssl=True,  # required when in-transit encryption is enabled on the cluster
    decode_responses=True,
)

users_table = boto3.resource("dynamodb").Table("Users")  # reuse across calls

def get_user(user_id: str) -> dict | None:
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached:
        return json.loads(cached)

    # Cache miss: fall through to DynamoDB
    item = users_table.get_item(Key={"pk": user_id}).get("Item")
    if item is None:
        return None  # absent in the DB; don't cache (or cache a short-lived sentinel)

    r.setex(key, 300, json.dumps(item, default=str))  # TTL 5 min
    return item

def invalidate_user(user_id: str) -> None:
    r.delete(f"user:{user_id}")

Create a Multi-AZ Valkey Cluster (CLI):


aws elasticache create-replication-group \
  --replication-group-id prod-cache \
  --replication-group-description "App-tier hot cache" \
  --engine valkey \
  --engine-version 7.2 \
  --cache-node-type cache.r7g.large \
  --num-node-groups 3 \
  --replicas-per-node-group 2 \
  --automatic-failover-enabled \
  --multi-az-enabled \
  --transit-encryption-enabled \
  --at-rest-encryption-enabled \
  --kms-key-id alias/elasticache \
  --cache-subnet-group-name prod-private \
  --security-group-ids sg-0abc123 \
  --snapshot-retention-limit 7
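
The same cluster can be created from code with boto3's `create_replication_group`, whose parameters are the CamelCase counterparts of the CLI flags above. A sketch of the kwargs (IDs are the placeholders from the CLI example; the API call itself is left commented so nothing is created):

```python
# Parameters mirroring the CLI flags; values are illustrative placeholders.
params = {
    "ReplicationGroupId": "prod-cache",
    "ReplicationGroupDescription": "App-tier hot cache",
    "Engine": "valkey",
    "EngineVersion": "7.2",
    "CacheNodeType": "cache.r7g.large",
    "NumNodeGroups": 3,
    "ReplicasPerNodeGroup": 2,
    "AutomaticFailoverEnabled": True,
    "MultiAZEnabled": True,
    "TransitEncryptionEnabled": True,
    "AtRestEncryptionEnabled": True,
    "KmsKeyId": "alias/elasticache",
    "CacheSubnetGroupName": "prod-private",
    "SecurityGroupIds": ["sg-0abc123"],
    "SnapshotRetentionLimit": 7,
}

# Would be invoked as (requires AWS credentials, so commented out here):
# import boto3
# boto3.client("elasticache").create_replication_group(**params)
```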
  


Common Interview Questions:

Redis/Valkey vs. Memcached?

Redis/Valkey: rich data types (sorted sets, hashes, streams, geo), persistence, replication, Multi-AZ failover, pub/sub, transactions, Lua scripting. Memcached: pure key-value, multithreaded (better single-node throughput on simple GET/SET), no persistence or replication. Pick Memcached only for trivial caches; pick Valkey/Redis for almost everything else.

What is cluster mode, and when should you enable it?

Cluster mode shards data across multiple primary nodes (each with optional replicas), allowing horizontal scaling beyond a single node's RAM and write throughput. Enable it when the working set exceeds the largest node type, or when write volume exceeds what a single primary can handle. It requires cluster-aware clients.
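
The sharding itself can be illustrated in pure Python: every key maps to one of 16,384 hash slots via CRC-16/XMODEM modulo 16384, and slots are divided among the node groups. This reimplements the published Redis/Valkey slot algorithm for illustration; cluster-aware clients such as redis-py's `RedisCluster` do it internally.

```python
def crc16_xmodem(data: bytes) -> int:
    # CRC-16/XMODEM (polynomial 0x1021, initial value 0), the checksum
    # the cluster-mode slot algorithm is defined over.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    # A "{hash tag}" restricts hashing to the tag, so related keys
    # (e.g. "{user:1}:profile" and "{user:1}:cart") land on the same shard.
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384
```

For example, `key_slot("foo")` returns 12182, the same slot `CLUSTER KEYSLOT foo` reports.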

Cache-aside vs. write-through?

Cache-aside: the app checks the cache first, falls through to the DB on a miss, and writes update the DB then invalidate the cache. Simple and tolerant of cache failures, but stale reads are possible. Write-through: every write goes to the cache and the DB synchronously; the cache stays fresher, at the cost of write latency, and the cache must be available.
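
The contrast can be sketched with in-memory dict stand-ins for the cache and database (a toy model, not real Redis/DynamoDB calls):

```python
class Store:
    """Toy stand-ins so both patterns can run side by side."""
    def __init__(self):
        self.db = {}     # stand-in for the database
        self.cache = {}  # stand-in for ElastiCache

def cache_aside_read(s: Store, key):
    if key in s.cache:
        return s.cache[key]        # hit
    value = s.db.get(key)          # miss: fall through to the DB
    if value is not None:
        s.cache[key] = value       # populate for the next reader
    return value

def cache_aside_write(s: Store, key, value):
    s.db[key] = value
    s.cache.pop(key, None)         # invalidate; next read refills

def write_through_write(s: Store, key, value):
    s.db[key] = value              # both written synchronously:
    s.cache[key] = value           # cache is never stale after a write
```

The key behavioral difference: after `cache_aside_write` the key is absent from the cache until the next read refills it, while `write_through_write` leaves the cache immediately fresh.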

How do you handle a cache stampede?

This is the thundering-herd problem: a popular key expires and many requests miss simultaneously, all hitting the backing store at once. Mitigations: random TTL jitter, single-flight locks (only one process refills while the others wait), probabilistic early expiration (refresh shortly before expiry), or background refresh jobs.
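
A minimal sketch combining two of those mitigations, TTL jitter and a per-key single-flight lock, over a dict-backed cache. Illustrative only: in production the lock would be distributed (e.g. a Redis `SET ... NX` lock), not an in-process `threading.Lock`.

```python
import random
import threading
import time

_cache = {}   # key -> (value, expires_at); stand-in for the cache tier
_locks = {}   # key -> per-key refill lock (single flight)
_locks_guard = threading.Lock()

def jittered_ttl(base: float = 300.0, jitter: float = 0.1) -> float:
    # Spread expirations +/-10% so hot keys don't all expire at the same instant.
    return base * (1 + random.uniform(-jitter, jitter))

def get_or_refill(key, loader):
    entry = _cache.get(key)
    if entry and entry[1] > time.monotonic():
        return entry[0]                      # fresh hit
    with _locks_guard:
        lock = _locks.setdefault(key, threading.Lock())
    with lock:                               # only one caller refills this key
        entry = _cache.get(key)              # re-check: another thread may have won
        if entry and entry[1] > time.monotonic():
            return entry[0]
        value = loader()                     # the expensive DB read happens once
        _cache[key] = (value, time.monotonic() + jittered_ttl())
        return value
```

Concurrent callers that miss on the same key serialize on the per-key lock; all but the first find the freshly refilled entry on the re-check and never touch the database.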

What's the difference between Multi-AZ and Global Datastore?

Multi-AZ places replicas in other AZs within one region (replication is asynchronous) and fails over automatically for HA, typically in seconds. Global Datastore replicates across regions (also async, typically under a second of lag) for low-latency local reads worldwide and regional DR; promoting a secondary region is a manual action.

When would you use ElastiCache Serverless?

Spiky or unpredictable workloads where capacity planning is hard, dev/test environments, or new applications without traffic patterns. It auto-scales storage and request capacity in seconds and bills per usage. Trade-off: typically more expensive than a right-sized provisioned cluster at steady state.