Amazon Kinesis Data Streams
Amazon Kinesis Data Streams is a durable, replayable, horizontally scaled streaming service that ingests and stores real-time data for custom consumers. Unlike Data Firehose (which delivers streams into sinks), Data Streams exposes a raw stream that many applications can consume independently — the AWS analogue to Apache Kafka. Both provisioned (fixed shard count) and on-demand (auto-scaling, no shard math) capacity modes are available.
Core Concepts:
- Stream: An ordered, partitioned sequence of records with configurable retention (24 hours default, up to 365 days).
- Shard: The unit of parallelism and capacity. Each shard supports writes of 1 MiB/s and 1,000 records/s; reads of 2 MiB/s shared across consumers (or 2 MiB/s per registered consumer with Enhanced Fan-Out).
- Partition Key: Record attribute hashed to decide the target shard — choose high-cardinality keys to avoid hot shards.
- Sequence Number: Unique ID assigned per record within a shard, increasing over time; used for ordering and for checkpointing/resuming.
- Capacity Modes: Provisioned (fixed shard count, cheapest at steady throughput) or On-Demand (auto-scales shards, pay per GB).
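The partition-key routing above can be sketched in a few lines. This is a simplification that assumes evenly split hash-key ranges (as on a freshly created stream); `shard_for_key` is an illustrative helper, not an SDK call:

```python
import hashlib

def shard_for_key(partition_key: str, num_shards: int) -> int:
    # Kinesis hashes the partition key with MD5 into a 128-bit number,
    # then routes the record to the shard whose hash-key range contains it.
    hash_value = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    range_size = 2**128 // num_shards  # assumes evenly split ranges
    return min(hash_value // range_size, num_shards - 1)
```

The same key always lands on the same shard, which is what makes per-key ordering possible; a low-cardinality key collapses all traffic onto a few shard indices.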
Key Features:
- Replay & Multi-Consumer: Any consumer can read from a checkpoint; multiple independent applications share the stream without affecting each other (especially with Enhanced Fan-Out).
- Enhanced Fan-Out (EFO): HTTP/2 push delivery per consumer at 2 MB/s — avoids the shared 2 MB/s read limit across consumers.
- Server-Side Encryption: KMS encryption at rest; TLS in transit.
- Integrations: Native consumers include Lambda, Kinesis Data Firehose, Managed Service for Apache Flink, and the KCL (Kinesis Client Library) for custom applications.
- Ordering: Strict ordering guaranteed within a shard (i.e., within a partition key).
- On-Demand Mode: Auto-scales to handle up to 200 MB/s write throughput per stream without shard management; scales to double the previous peak write rate within 15 minutes of sustained load.
- Resharding: Split or merge shards in provisioned mode to adjust capacity (preserves ordering within partition key boundaries).
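A split requires choosing a NewStartingHashKey inside the parent shard's hash-key range; the midpoint gives two evenly sized children. A sketch (`split_point` is an illustrative helper; the commented call uses the real `split_shard` parameters):

```python
def split_point(starting_hash_key: str, ending_hash_key: str) -> str:
    # split_shard expects NewStartingHashKey as a decimal string strictly
    # inside the parent's range; the midpoint yields two even children.
    start, end = int(starting_hash_key), int(ending_hash_key)
    return str((start + end) // 2)

# Hedged usage against a real stream:
# shard = kinesis.describe_stream(StreamName="events")["StreamDescription"]["Shards"][0]
# rng = shard["HashKeyRange"]
# kinesis.split_shard(
#     StreamName="events",
#     ShardToSplit=shard["ShardId"],
#     NewStartingHashKey=split_point(rng["StartingHashKey"], rng["EndingHashKey"]),
# )
```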
Common Use Cases:
- Clickstream / Event Pipelines: Raw feed for analytics, fraud detection, personalization — each consumer reads at its own pace.
- Log Ingestion: Centralize application and infrastructure logs before fanning out to search, storage, and alerting.
- Metrics & IoT Telemetry: Sub-second ingestion of device data for real-time dashboards and alarms.
- Change Data Capture: DMS to Kinesis for near-real-time replication into downstream stores.
- Real-Time ML Inference: Flink or Lambda reads the stream, enriches events, and writes predictions back.
Service Limits & Quotas:
- Per shard write: 1 MiB/s and 1,000 records/s.
- Per shard read: 2 MiB/s shared (or 2 MiB/s per consumer with EFO).
- Record size: up to 1 MiB.
- Retention: 24 hours default; configurable up to 8,760 hours (365 days).
- On-demand throughput: 200 MB/s write per stream by default; can scale higher via support.
- EFO consumers: default soft limit 20 per stream.
- GetRecords: max 10,000 records or 10 MiB per call; up to 5 transactions/s per shard.
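The per-shard limits above translate directly into a shard-count estimate for provisioned mode. A hedged sketch (`shards_needed` is an illustrative helper, not an AWS API):

```python
import math

def shards_needed(write_mib_s: float, records_per_s: float,
                  read_mib_s: float, efo: bool = False) -> int:
    # Each shard ingests 1 MiB/s and 1,000 records/s; reads are 2 MiB/s
    # shared across standard consumers. With EFO each consumer gets its
    # own 2 MiB/s pipe, so reads stop constraining the shard count.
    by_write = math.ceil(write_mib_s / 1.0)
    by_records = math.ceil(records_per_s / 1000.0)
    by_read = 0 if efo else math.ceil(read_mib_s / 2.0)
    return max(by_write, by_records, by_read, 1)
```

For example, 10 MiB/s in with three standard consumers each reading the full stream (30 MiB/s aggregate out) is read-bound; switching those consumers to EFO makes the stream write-bound instead.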
Pricing Model:
- Provisioned mode: billed per shard-hour plus per million PUT payload units (25 KB each).
- On-demand mode: billed per GB of data ingested and retrieved plus per stream-hour — simpler but more expensive than well-tuned provisioned.
- Extended retention: additional charge per shard-hour for retention beyond 24 hours.
- Enhanced Fan-Out: per consumer-shard-hour plus per GB retrieved.
- Common cost surprises: oversharded provisioned streams sitting idle; EFO enabled on dev streams; long retention used purely "in case"; small records inflated by the 25 KB billing unit.
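The 25 KB PUT payload unit is worth internalizing, because small records round up. A sketch assuming a 25 KiB unit (`put_payload_units` is an illustrative helper):

```python
import math

def put_payload_units(record_bytes: int) -> int:
    # Provisioned-mode PUTs are billed in 25 KB payload units:
    # a 1 KB record and a 25 KB record each cost one unit;
    # one byte over rolls into a second unit.
    return math.ceil(record_bytes / (25 * 1024))
```

This is why aggregating many tiny records into one PUT (e.g., via the KPL) can cut the PUT-unit bill substantially.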
Code Example:
import boto3, json, time

kinesis = boto3.client("kinesis", region_name="us-west-2")

# Producer
kinesis.put_record(
    StreamName="events",
    Data=json.dumps({"user": 42, "action": "click", "ts": time.time()}),
    PartitionKey="user-42",
)

# Consumer (simple shard iterator pattern; KCL is recommended for production)
shard_id = kinesis.describe_stream(StreamName="events")["StreamDescription"]["Shards"][0]["ShardId"]
it = kinesis.get_shard_iterator(
    StreamName="events", ShardId=shard_id, ShardIteratorType="LATEST"
)["ShardIterator"]
resp = kinesis.get_records(ShardIterator=it, Limit=100)
for rec in resp["Records"]:
    print(json.loads(rec["Data"]))
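For higher throughput, batching with put_records beats per-record put_record calls; the 500-record / 5 MiB per-call limit is a documented quota. A sketch of the client-side half (helper names are illustrative):

```python
import json
from itertools import islice

def to_kinesis_entries(events, key_field):
    # Build the Records list expected by kinesis.put_records:
    # each entry needs Data and PartitionKey.
    return [
        {"Data": json.dumps(e), "PartitionKey": str(e[key_field])}
        for e in events
    ]

def batches(entries, size=500):
    # put_records accepts at most 500 records (and 5 MiB) per call,
    # so chunk larger entry lists before sending.
    it = iter(entries)
    while chunk := list(islice(it, size)):
        yield chunk
```

Each batch would then be sent with `kinesis.put_records(StreamName="events", Records=batch)`; check `FailedRecordCount` in the response and retry only the failed entries, since partial failures are normal under throttling.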
Data Streams vs. Firehose vs. MSK:
- Data Streams: Raw, replayable stream — multiple custom consumers, low-latency, ordered per partition key.
- Firehose: Serverless delivery to S3/Redshift/OpenSearch/Splunk — no custom consumer code.
- Amazon MSK: Managed Apache Kafka — choose when you need Kafka ecosystem compatibility (Connect, Streams, Schema Registry).
Common Interview Questions:
When would you choose On-Demand over Provisioned mode?
On-Demand removes shard math and auto-scales to bursty traffic — ideal when load is unpredictable or for new streams without baseline data. Provisioned is significantly cheaper at steady, well-understood throughput.
What is Enhanced Fan-Out and when is it worth the cost?
EFO gives each registered consumer a dedicated 2 MB/s push channel via HTTP/2, eliminating contention with other consumers and reducing latency to ~70 ms. Worth it when multiple consumers need low latency or high throughput concurrently; otherwise the standard pull-based model is cheaper.
How do you choose a partition key?
Pick a high-cardinality, evenly-distributed attribute (user ID, device ID). Low-cardinality keys (e.g., country code) cause hot shards and throttling. If ordering doesn't matter, use a random UUID.
How does Lambda integrate with Kinesis Data Streams?
Via an Event Source Mapping that polls the stream and invokes Lambda with a batch. Tunables: batch size, batch window, parallelization factor (multiple Lambdas per shard while preserving partition-key ordering), tumbling windows, and on-failure destinations.
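Those tunables map onto create_event_source_mapping parameters. A hypothetical configuration (the ARNs and function name are placeholders for illustration):

```python
# Keyword arguments for lambda_client.create_event_source_mapping;
# ARNs and names below are placeholders, not real resources.
esm_config = {
    "EventSourceArn": "arn:aws:kinesis:us-west-2:123456789012:stream/events",
    "FunctionName": "process-events",
    "StartingPosition": "LATEST",
    "BatchSize": 100,                     # records per invocation
    "MaximumBatchingWindowInSeconds": 5,  # wait up to 5 s to fill a batch
    "ParallelizationFactor": 2,           # concurrent batches per shard (1-10)
    "BisectBatchOnFunctionError": True,   # binary-search out poison records
    "MaximumRetryAttempts": 3,
    "DestinationConfig": {                # metadata for failed batches
        "OnFailure": {"Destination": "arn:aws:sqs:us-west-2:123456789012:events-dlq"}
    },
}
```

Passing this dict as `lambda_client.create_event_source_mapping(**esm_config)` would create the mapping; parallelization still preserves ordering per partition key, because each key sticks to one concurrent batch processor.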
What's the difference between Kinesis Data Streams and MSK?
Both are durable streaming platforms. Data Streams is fully managed AWS-native with simpler ops; MSK is managed Apache Kafka with full Kafka API compatibility and ecosystem (Connect, Streams, Schema Registry). Choose MSK when you need Kafka tooling or are migrating from on-prem Kafka.
How do you guarantee exactly-once processing downstream?
Kinesis itself provides at-least-once delivery. Build idempotency into consumers — track a sequence number or record key in DynamoDB to deduplicate before side effects. Flink and KCL provide checkpointing primitives that simplify this.
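One way to sketch that dedup step (an in-memory stand-in; production code would replace the set with a DynamoDB conditional write, e.g. `put_item` with `ConditionExpression="attribute_not_exists(seq)"`, which fails atomically if the sequence number was already processed):

```python
class IdempotentProcessor:
    # Deduplicates at-least-once deliveries by sequence number before
    # running side effects. The set below stands in for a durable store
    # such as a DynamoDB table with a conditional put.
    def __init__(self, handler):
        self._seen = set()
        self._handler = handler

    def process(self, record) -> bool:
        seq = record["SequenceNumber"]
        if seq in self._seen:
            return False          # duplicate delivery: skip side effects
        self._handler(record)     # side effect runs at most once per sequence
        self._seen.add(seq)
        return True
```

In the durable version, the "mark as seen" write and the side effect must be ordered carefully (or combined in one transaction), otherwise a crash between them reintroduces duplicates.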