Amazon Rekognition
Amazon Rekognition is a managed computer-vision service for images and video. It provides pre-trained APIs for object and scene detection, facial analysis, text-in-image OCR, content moderation, celebrity recognition, and PPE detection — plus Custom Labels for training bespoke image classifiers with a small labeled dataset, and Face Liveness for anti-spoofing identity flows.
Key Features:
- Label Detection: Identifies thousands of objects, scenes, and activities in images or video frames with confidence scores.
- Face Analysis & Comparison: Extracts facial attributes (emotion, age range, eyes open, pose) and compares faces across images; face collections support 1-to-many search.
- Face Liveness: Detects spoofing (printed photos, replays, deepfakes) during identity verification flows.
- Content Moderation: Flags explicit, suggestive, violent, or otherwise unsafe content with a hierarchical taxonomy — useful for UGC platforms.
- Text in Image (OCR): Detects and reads text within photos (signs, license plates, product packaging) — distinct from Textract's document focus.
- Custom Labels: Train a domain-specific image classifier or object detector with as few as ~10 images per label.
- Video Analysis: Asynchronous jobs over S3-hosted video detect labels, faces, persons, celebrities, and moderation events along a timeline.
- Streaming Video: Real-time face detection on Kinesis Video Streams for live security and engagement use cases.
- PPE Detection: Identifies whether persons in images are wearing hardhats, masks, gloves, and other PPE.
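As a concrete example of working with the text-in-image feature, here is a minimal sketch of filtering a DetectText-style response down to readable lines. The helper and threshold are illustrative, and the sample dict below mimics the response shape rather than real API output:

```python
def readable_lines(response, min_confidence=80.0):
    """Keep LINE-level detections above a confidence threshold.

    DetectText returns both LINE and WORD entries; for most tagging or
    search use cases the LINE entries are what you want.
    """
    return [
        d["DetectedText"]
        for d in response.get("TextDetections", [])
        if d["Type"] == "LINE" and d["Confidence"] >= min_confidence
    ]

# Illustrative dict in the shape DetectText returns (not real output)
ocr_resp = {
    "TextDetections": [
        {"DetectedText": "ROW 4", "Type": "LINE", "Confidence": 99.2},
        {"DetectedText": "ROW", "Type": "WORD", "Confidence": 99.5},
        {"DetectedText": "smudged", "Type": "LINE", "Confidence": 41.0},
    ]
}
print(readable_lines(ocr_resp))  # ['ROW 4']
```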
Common Use Cases:
- Media Asset Tagging: Auto-tag large image/video libraries with labels, scenes, and celebrities for search.
- Content Moderation: Gate user-generated uploads before publish to reduce policy violations.
- Identity Verification: Compare a selfie to a government-ID photo for onboarding or access control (paired with liveness detection).
- Retail & Inventory: Detect products on shelves or count items in warehouse photos with Custom Labels.
- Workplace Safety: Detect PPE compliance (hardhats, masks, gloves) in site photos.
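For the workplace-safety case, a compliance check boils down to walking the DetectProtectiveEquipment response per person. A minimal sketch, assuming a response shaped like that API's output (the sample dicts are illustrative, and checking only head covers for brevity):

```python
def head_cover_compliant(person, min_confidence=80.0):
    """True if this person has a confident head-cover detection
    that actually covers the head."""
    for part in person.get("BodyParts", []):
        if part["Name"] != "HEAD":
            continue
        for eq in part.get("EquipmentDetections", []):
            if (eq["Type"] == "HEAD_COVER"
                    and eq["Confidence"] >= min_confidence
                    and eq["CoversBodyPart"]["Value"]):
                return True
    return False

# Illustrative person entries in the DetectProtectiveEquipment shape
with_hat = {"BodyParts": [{"Name": "HEAD", "Confidence": 99.0,
            "EquipmentDetections": [{"Type": "HEAD_COVER", "Confidence": 97.0,
                                     "CoversBodyPart": {"Value": True, "Confidence": 95.0}}]}]}
bare_head = {"BodyParts": [{"Name": "HEAD", "Confidence": 99.0,
             "EquipmentDetections": []}]}
print(head_cover_compliant(with_hat), head_cover_compliant(bare_head))  # True False
```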
Service Limits & Quotas:
- Image size: up to 5 MB when passed as raw bytes in the request; up to 15 MB when referenced as an S3 object.
- Image dimensions: minimum 80 x 80 pixels.
- Video format: MP4 or MOV (H.264) stored in S3, up to 10 GB.
- Face collection size: up to 20 million faces per collection.
- Custom Labels training: 10–250,000 images per project; minimum 10 per label.
- Concurrent video jobs: default soft limit 20 per API per account.
- Streaming Video processors: default soft limit 5 per region.
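The raw-bytes vs S3 limits above suggest choosing how to pass an image based on its size. A small sketch (the function name and limits-as-constants layout are my own; verify current quotas for your region):

```python
import os

INLINE_BYTES_LIMIT = 5 * 1024 * 1024   # raw bytes passed in the request
S3_IMAGE_LIMIT = 15 * 1024 * 1024      # image referenced as an S3 object

def image_param(path, bucket=None, key=None):
    """Build the Image argument for Rekognition image APIs,
    respecting the per-method size limits."""
    size = os.path.getsize(path)
    if size <= INLINE_BYTES_LIMIT:
        with open(path, "rb") as f:
            return {"Bytes": f.read()}
    if bucket and key and size <= S3_IMAGE_LIMIT:
        return {"S3Object": {"Bucket": bucket, "Name": key}}
    raise ValueError(f"{path} is {size} bytes; exceeds Rekognition image limits")
```

Small images go inline (one less S3 round trip); anything larger must already be in S3 and passed by reference.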
Pricing Model:
- Image APIs: billed per image processed, with tiered discounts above 1M images/month.
- Video APIs: billed per minute of video analyzed.
- Face storage: billed per 1,000 faces stored per month.
- Custom Labels: training billed per hour; inference billed per inference-hour while the model is hosted (charges accrue even when idle).
- Streaming Video: billed per stream processor hour.
- Common cost surprises: Custom Labels models left hosted in dev environments (billed per hour even with zero traffic); calling several image APIs (labels, moderation, faces) on every image when one would suffice; analyzing streaming video at higher frame rates than the use case needs.
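The idle-hosting trap is easy to quantify. A back-of-envelope sketch, assuming a hypothetical per-inference-hour rate (check current Rekognition pricing for your region):

```python
# Assumed rate for illustration only; real pricing varies by region.
HOURLY_INFERENCE_RATE = 4.00  # USD per inference-hour, per inference unit

def idle_month_cost(inference_units=1, hours=730, rate=HOURLY_INFERENCE_RATE):
    """Cost of leaving a Custom Labels model hosted for a month
    with zero traffic: hosting bills per hour regardless of requests."""
    return inference_units * hours * rate

print(f"${idle_month_cost():,.2f}")  # $2,920.00 at the assumed rate
```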
Code Example:
import boto3

rek = boto3.client("rekognition", region_name="us-west-2")

# Detect labels in an image
labels = rek.detect_labels(
    Image={"S3Object": {"Bucket": "my-images", "Name": "warehouse/row-4.jpg"}},
    MaxLabels=10,
    MinConfidence=75,
)
for label in labels["Labels"]:
    print(f"{label['Name']:20s} {label['Confidence']:.1f}%")

# Compare a selfie to an ID photo for verification (1-to-1)
match = rek.compare_faces(
    SourceImage={"S3Object": {"Bucket": "kyc", "Name": "selfie.jpg"}},
    TargetImage={"S3Object": {"Bucket": "kyc", "Name": "id-photo.jpg"}},
    SimilarityThreshold=90,
)
print("Match similarity:", match["FaceMatches"][0]["Similarity"] if match["FaceMatches"] else "no match")

# Search a face collection (1-to-many)
search = rek.search_faces_by_image(
    CollectionId="employees",
    Image={"S3Object": {"Bucket": "kyc", "Name": "selfie.jpg"}},
    FaceMatchThreshold=90,
    MaxFaces=1,
)
for m in search["FaceMatches"]:
    print(m["Face"]["ExternalImageId"], m["Similarity"])
Common Interview Questions:
When should you use Custom Labels vs SageMaker?
Use Custom Labels when you have a relatively small dataset (10–10,000 images), need a managed training experience, and standard image classification or object detection is enough. Move to SageMaker for full control over architecture, larger datasets, custom losses, on-device export, or non-image vision tasks.
How does face comparison differ from face search?
CompareFaces matches one source image against one target image (1-to-1, no storage). SearchFacesByImage matches a face against a stored Face Collection (1-to-many) — required for "find this person across our user base" workflows.
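The 1-to-many search only works after faces are enrolled with IndexFaces. A minimal sketch of the enrollment side; the call is shown commented since it needs credentials and an existing collection, and the sample dict below mimics the IndexFaces response shape rather than real output:

```python
def enrolled_face_ids(response):
    """Collect the FaceIds Rekognition stored from an IndexFaces response."""
    return [rec["Face"]["FaceId"] for rec in response.get("FaceRecords", [])]

# The real call looks roughly like:
# resp = rek.index_faces(
#     CollectionId="employees",
#     Image={"S3Object": {"Bucket": "kyc", "Name": "badge.jpg"}},
#     ExternalImageId="emp-1042",   # your own key, returned on search hits
#     MaxFaces=1,
#     QualityFilter="AUTO",
# )

# Illustrative response in the IndexFaces shape
resp = {"FaceRecords": [{"Face": {"FaceId": "a1b2c3d4-e5f6-4a7b-8c9d-000000000001",
                                  "ExternalImageId": "emp-1042"}}],
        "UnindexedFaces": []}
print(enrolled_face_ids(resp))
```

ExternalImageId is the piece that makes search results actionable: it is your own identifier, echoed back by SearchFacesByImage.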
What is Face Liveness and why does it matter?
Liveness detects whether a face in a video stream belongs to a real, present person versus a printed photo, screen replay, or deepfake. Mandatory for any production identity verification flow to prevent spoofing attacks.
How does Rekognition handle moderation taxonomies?
Returns hierarchical labels (parent + child) like "Suggestive → Female Swimwear" with confidence scores. You set a threshold per category and route flagged content to manual review (often via Amazon A2I).
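That threshold-and-route logic can be sketched as a small pure function. The thresholds and the publish/review/reject tiers are illustrative policy choices, not anything the API prescribes, and the sample labels mimic the ModerationLabels response shape:

```python
def route_upload(moderation_labels, block_at=90.0, review_at=60.0):
    """Map moderation labels to a publish / review / reject decision
    based on the highest-confidence flag."""
    top = max((l["Confidence"] for l in moderation_labels), default=0.0)
    if top >= block_at:
        return "reject"
    if top >= review_at:
        return "review"   # e.g. send to Amazon A2I for human review
    return "publish"

# Illustrative labels in the DetectModerationLabels shape (parent + child)
flagged = [{"Name": "Graphic Violence", "ParentName": "Violence", "Confidence": 95.2}]
borderline = [{"Name": "Revealing Clothes", "ParentName": "Suggestive", "Confidence": 72.0}]
print(route_upload(flagged), route_upload(borderline), route_upload([]))
# reject review publish
```

A real moderation pipeline would typically use per-category thresholds rather than a single global one.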
How do you process video efficiently?
Use the async video APIs (StartLabelDetection, StartContentModeration) on S3-hosted video. Subscribe to the SNS completion topic instead of polling. Output includes timestamped detections so you can build timelines without re-scanning frames.
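Once the SNS notification arrives, GetLabelDetection is paged with NextToken; folding the pages into a per-label timeline is straightforward. A sketch, assuming pages shaped like that API's response (the sample pages below are illustrative):

```python
from collections import defaultdict

def label_timeline(pages):
    """Fold paginated GetLabelDetection results into
    {label name: [timestamps in ms]} for timeline building."""
    timeline = defaultdict(list)
    for page in pages:
        for item in page.get("Labels", []):
            timeline[item["Label"]["Name"]].append(item["Timestamp"])
    return dict(timeline)

# Illustrative pages in the GetLabelDetection shape
pages = [
    {"Labels": [{"Timestamp": 0,   "Label": {"Name": "Car", "Confidence": 98.1}},
                {"Timestamp": 500, "Label": {"Name": "Car", "Confidence": 97.4}}]},
    {"Labels": [{"Timestamp": 500, "Label": {"Name": "Person", "Confidence": 91.0}}]},
]
print(label_timeline(pages))  # {'Car': [0, 500], 'Person': [500]}
```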
What's the cost trap with Custom Labels?
Hosted models bill per inference-hour while running, regardless of traffic. Stop the model when not in use, and consolidate labels into a single project where possible to avoid hosting many small models.
Rekognition covers the common vision tasks end-to-end; reach for SageMaker when you need fully custom models, non-standard outputs, or on-premises deployment.