Amazon Kinesis Data Streams
Amazon Kinesis Data Streams is a durable, replayable, horizontally scaled streaming service that ingests and stores real-time data for custom consumers. Unlike Data Firehose (which delivers streams into sinks), Data Streams exposes a raw stream that many applications can consume independently — the AWS analogue to Apache Kafka. Both provisioned (fixed shard count) and on-demand (auto-scaling, no shard math) capacity modes are available.
Core Concepts:
- Stream: An ordered, partitioned sequence of records with configurable retention (24 hours default, up to 365 days).
- Shard: The unit of parallelism and capacity. Each shard supports writes of 1 MiB/s and 1,000 records/s; reads of 2 MiB/s shared across consumers (or 2 MiB/s per registered consumer with Enhanced Fan-Out).
- Partition Key: Record attribute hashed to decide the target shard — choose high-cardinality keys to avoid hot shards.
- Sequence Number: Unique ID assigned per record within a shard, increasing over time; used for ordering and for checkpointing/resuming.
- Capacity Modes: Provisioned (fixed shard count, cheapest at steady throughput) or On-Demand (auto-scales shards, pay per GB).
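The partition-key routing above can be sketched in a few lines. This is a simplification that assumes evenly split hash-key ranges (as on a freshly created stream); `shard_for_key` is an illustrative helper, not an SDK call:

```python
import hashlib

def shard_for_key(partition_key: str, num_shards: int) -> int:
    # Kinesis hashes the partition key with MD5 into a 128-bit number,
    # then routes the record to the shard whose hash-key range contains it.
    hash_value = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    range_size = 2**128 // num_shards  # assumes evenly split ranges
    return min(hash_value // range_size, num_shards - 1)
```

The same key always lands on the same shard, which is what makes per-key ordering possible; a low-cardinality key collapses all traffic onto a few shard indices.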
Key Features:
- Replay & Multi-Consumer: Any consumer can read from a checkpoint; multiple independent applications share the stream without affecting each other (especially with Enhanced Fan-Out).
- Enhanced Fan-Out (EFO): HTTP/2 push delivery per consumer at 2 MB/s — avoids the shared 2 MB/s read limit across consumers.
- Server-Side Encryption: KMS encryption at rest; TLS in transit.
- Integrations: Native consumers include Lambda, Kinesis Data Firehose, Managed Service for Apache Flink, and the KCL (Kinesis Client Library) for custom applications.
- Ordering: Strict ordering guaranteed within a shard (i.e., within a partition key).
- On-Demand Mode: Auto-scales to handle up to 200 MB/s write throughput per stream without shard management; scales to double the previous peak write rate within 15 minutes of sustained load.
- Resharding: Split or merge shards in provisioned mode to adjust capacity (preserves ordering within partition key boundaries).
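A split requires choosing a NewStartingHashKey inside the parent shard's hash-key range; the midpoint gives two evenly sized children. A sketch (`split_point` is an illustrative helper; the commented call uses the real `split_shard` parameters):

```python
def split_point(starting_hash_key: str, ending_hash_key: str) -> str:
    # split_shard expects NewStartingHashKey as a decimal string strictly
    # inside the parent's range; the midpoint yields two even children.
    start, end = int(starting_hash_key), int(ending_hash_key)
    return str((start + end) // 2)

# Hedged usage against a real stream:
# shard = kinesis.describe_stream(StreamName="events")["StreamDescription"]["Shards"][0]
# rng = shard["HashKeyRange"]
# kinesis.split_shard(
#     StreamName="events",
#     ShardToSplit=shard["ShardId"],
#     NewStartingHashKey=split_point(rng["StartingHashKey"], rng["EndingHashKey"]),
# )
```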
Common Use Cases:
- Clickstream / Event Pipelines: Raw feed for analytics, fraud detection, personalization — each consumer reads at its own pace.
- Log Ingestion: Centralize application and infrastructure logs before fanning out to search, storage, and alerting.
- Metrics & IoT Telemetry: Sub-second ingestion of device data for real-time dashboards and alarms.
- Change Data Capture: DMS to Kinesis for near-real-time replication into downstream stores.
- Real-Time ML Inference: Flink or Lambda reads the stream, enriches events, and writes predictions back.
Service Limits & Quotas:
- Per shard write: 1 MiB/s and 1,000 records/s.
- Per shard read: 2 MiB/s shared (or 2 MiB/s per consumer with EFO).
- Record size: up to 1 MiB.
- Retention: 24 hours default; configurable up to 8,760 hours (365 days).
- On-demand throughput: 200 MB/s write per stream by default; can scale higher via support.
- EFO consumers: default soft limit 20 per stream.
- GetRecords: max 10,000 records or 10 MiB per call; up to 5 transactions/s per shard.
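The per-shard limits above translate directly into a shard-count estimate for provisioned mode. A hedged sketch (`shards_needed` is an illustrative helper, not an AWS API):

```python
import math

def shards_needed(write_mib_s: float, records_per_s: float,
                  read_mib_s: float, efo: bool = False) -> int:
    # Each shard ingests 1 MiB/s and 1,000 records/s; reads are 2 MiB/s
    # shared across standard consumers. With EFO each consumer gets its
    # own 2 MiB/s pipe, so reads stop constraining the shard count.
    by_write = math.ceil(write_mib_s / 1.0)
    by_records = math.ceil(records_per_s / 1000.0)
    by_read = 0 if efo else math.ceil(read_mib_s / 2.0)
    return max(by_write, by_records, by_read, 1)
```

For example, 10 MiB/s in with three standard consumers each reading the full stream (30 MiB/s aggregate out) is read-bound; switching those consumers to EFO makes the stream write-bound instead.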
Pricing Model:
- Provisioned mode: billed per shard-hour plus per million PUT payload units (25 KB each).
- On-demand mode: billed per GB of data ingested and retrieved plus per stream-hour — simpler but more expensive than well-tuned provisioned.
- Extended retention: additional charge per shard-hour for retention beyond 24 hours.
- Enhanced Fan-Out: per consumer-shard-hour plus per GB retrieved.
- Common cost surprises: oversharded provisioned streams sitting idle; EFO enabled on dev streams; long retention used purely "in case"; small records inflated by the 25 KB billing unit.
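The 25 KB PUT payload unit is worth internalizing, because small records round up. A sketch assuming a 25 KiB unit (`put_payload_units` is an illustrative helper):

```python
import math

def put_payload_units(record_bytes: int) -> int:
    # Provisioned-mode PUTs are billed in 25 KB payload units:
    # a 1 KB record and a 25 KB record each cost one unit;
    # one byte over rolls into a second unit.
    return math.ceil(record_bytes / (25 * 1024))
```

This is why aggregating many tiny records into one PUT (e.g., via the KPL) can cut the PUT-unit bill substantially.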
Code Example:
import boto3, json, time

kinesis = boto3.client("kinesis", region_name="us-west-2")

# Producer
kinesis.put_record(
    StreamName="events",
    Data=json.dumps({"user": 42, "action": "click", "ts": time.time()}),
    PartitionKey="user-42",
)

# Consumer (simple shard iterator pattern; KCL is recommended for production)
shard_id = kinesis.describe_stream(StreamName="events")["StreamDescription"]["Shards"][0]["ShardId"]
it = kinesis.get_shard_iterator(
    StreamName="events", ShardId=shard_id, ShardIteratorType="LATEST"
)["ShardIterator"]
resp = kinesis.get_records(ShardIterator=it, Limit=100)
for rec in resp["Records"]:
    print(json.loads(rec["Data"]))
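For higher throughput, batching with put_records beats per-record put_record calls; the 500-record / 5 MiB per-call limit is a documented quota. A sketch of the client-side half (helper names are illustrative):

```python
import json
from itertools import islice

def to_kinesis_entries(events, key_field):
    # Build the Records list expected by kinesis.put_records:
    # each entry needs Data and PartitionKey.
    return [
        {"Data": json.dumps(e), "PartitionKey": str(e[key_field])}
        for e in events
    ]

def batches(entries, size=500):
    # put_records accepts at most 500 records (and 5 MiB) per call,
    # so chunk larger entry lists before sending.
    it = iter(entries)
    while chunk := list(islice(it, size)):
        yield chunk
```

Each batch would then be sent with `kinesis.put_records(StreamName="events", Records=batch)`; check `FailedRecordCount` in the response and retry only the failed entries, since partial failures are normal under throttling.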
Data Streams vs. Firehose vs. MSK:
- Data Streams: Raw, replayable stream — multiple custom consumers, low-latency, ordered per partition key.
- Firehose: Serverless delivery to S3/Redshift/OpenSearch/Splunk — no custom consumer code.
- Amazon MSK: Managed Apache Kafka — choose when you need Kafka ecosystem compatibility (Connect, Streams, Schema Registry).
Common Interview Questions:
When would you choose On-Demand over Provisioned mode?
On-Demand removes shard math and auto-scales to bursty traffic — ideal when load is unpredictable or for new streams without baseline data. Provisioned is significantly cheaper at steady, well-understood throughput.
What is Enhanced Fan-Out and when is it worth the cost?
EFO gives each registered consumer a dedicated 2 MB/s push channel via HTTP/2, eliminating contention with other consumers and reducing latency to ~70 ms. Worth it when multiple consumers need low latency or high throughput concurrently; otherwise the standard pull-based model is cheaper.
How do you choose a partition key?
Pick a high-cardinality, evenly-distributed attribute (user ID, device ID). Low-cardinality keys (e.g., country code) cause hot shards and throttling. If ordering doesn't matter, use a random UUID.
How does Lambda integrate with Kinesis Data Streams?
Via an Event Source Mapping that polls the stream and invokes Lambda with a batch. Tunables: batch size, batch window, parallelization factor (multiple Lambdas per shard while preserving partition-key ordering), tumbling windows, and on-failure destinations.
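Those tunables map onto create_event_source_mapping parameters. A hypothetical configuration (the ARNs and function name are placeholders for illustration):

```python
# Keyword arguments for lambda_client.create_event_source_mapping;
# ARNs and names below are placeholders, not real resources.
esm_config = {
    "EventSourceArn": "arn:aws:kinesis:us-west-2:123456789012:stream/events",
    "FunctionName": "process-events",
    "StartingPosition": "LATEST",
    "BatchSize": 100,                     # records per invocation
    "MaximumBatchingWindowInSeconds": 5,  # wait up to 5 s to fill a batch
    "ParallelizationFactor": 2,           # concurrent batches per shard (1-10)
    "BisectBatchOnFunctionError": True,   # binary-search out poison records
    "MaximumRetryAttempts": 3,
    "DestinationConfig": {                # metadata for failed batches
        "OnFailure": {"Destination": "arn:aws:sqs:us-west-2:123456789012:events-dlq"}
    },
}
```

Passing this dict as `lambda_client.create_event_source_mapping(**esm_config)` would create the mapping; parallelization still preserves ordering per partition key, because each key sticks to one concurrent batch processor.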
What's the difference between Kinesis Data Streams and MSK?
Both are durable streaming platforms. Data Streams is fully managed AWS-native with simpler ops; MSK is managed Apache Kafka with full Kafka API compatibility and ecosystem (Connect, Streams, Schema Registry). Choose MSK when you need Kafka tooling or are migrating from on-prem Kafka.
How do you guarantee exactly-once processing downstream?
Kinesis itself provides at-least-once delivery. Build idempotency into consumers — track a sequence number or record key in DynamoDB to deduplicate before side effects. Flink and KCL provide checkpointing primitives that simplify this.
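One way to sketch that dedup step (an in-memory stand-in; production code would replace the set with a DynamoDB conditional write, e.g. `put_item` with `ConditionExpression="attribute_not_exists(seq)"`, which fails atomically if the sequence number was already processed):

```python
class IdempotentProcessor:
    # Deduplicates at-least-once deliveries by sequence number before
    # running side effects. The set below stands in for a durable store
    # such as a DynamoDB table with a conditional put.
    def __init__(self, handler):
        self._seen = set()
        self._handler = handler

    def process(self, record) -> bool:
        seq = record["SequenceNumber"]
        if seq in self._seen:
            return False          # duplicate delivery: skip side effects
        self._handler(record)     # side effect runs at most once per sequence
        self._seen.add(seq)
        return True
```

In the durable version, the "mark as seen" write and the side effect must be ordered carefully (or combined in one transaction), otherwise a crash between them reintroduces duplicates.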