Amazon Rekognition
Amazon Rekognition is a managed computer-vision service for images and video. It provides pre-trained APIs for object and scene detection, facial analysis, text-in-image OCR, content moderation, celebrity recognition, and PPE detection — plus Custom Labels for training bespoke image classifiers with a small labeled dataset, and Face Liveness for anti-spoofing identity flows.
Key Features:
- Label Detection: Identifies thousands of objects, scenes, and activities in images or video frames with confidence scores.
- Face Analysis & Comparison: Extracts facial attributes (emotion, age range, eyes open, pose) and compares faces across images; face collections support 1-to-many search.
- Face Liveness: Detects spoofing (printed photos, replays, deepfakes) during identity verification flows.
- Content Moderation: Flags explicit, suggestive, violent, or otherwise unsafe content with a hierarchical taxonomy — useful for UGC platforms.
- Text in Image (OCR): Detects and reads text within photos (signs, license plates, product packaging) — distinct from Textract's document focus.
- Custom Labels: Train a domain-specific image classifier or object detector with as few as ~10 images per label.
- Video Analysis: Asynchronous jobs over S3-hosted video detect labels, faces, persons, celebrities, and moderation events along a timeline.
- Streaming Video: Real-time face detection on Kinesis Video Streams for live security and engagement use cases.
- PPE Detection: Identifies whether persons in images are wearing hardhats, masks, gloves, and other PPE.
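As a concrete example of working with the text-in-image feature, here is a minimal sketch of filtering a DetectText-style response down to readable lines. The helper and threshold are illustrative, and the sample dict below mimics the response shape rather than real API output:

```python
def readable_lines(response, min_confidence=80.0):
    """Keep LINE-level detections above a confidence threshold.

    DetectText returns both LINE and WORD entries; for most tagging or
    search use cases the LINE entries are what you want.
    """
    return [
        d["DetectedText"]
        for d in response.get("TextDetections", [])
        if d["Type"] == "LINE" and d["Confidence"] >= min_confidence
    ]

# Illustrative dict in the shape DetectText returns (not real output)
ocr_resp = {
    "TextDetections": [
        {"DetectedText": "ROW 4", "Type": "LINE", "Confidence": 99.2},
        {"DetectedText": "ROW", "Type": "WORD", "Confidence": 99.5},
        {"DetectedText": "smudged", "Type": "LINE", "Confidence": 41.0},
    ]
}
print(readable_lines(ocr_resp))  # ['ROW 4']
```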
Common Use Cases:
- Media Asset Tagging: Auto-tag large image/video libraries with labels, scenes, and celebrities for search.
- Content Moderation: Gate user-generated uploads before publish to reduce policy violations.
- Identity Verification: Compare a selfie to a government-ID photo for onboarding or access control (paired with liveness detection).
- Retail & Inventory: Detect products on shelves or count items in warehouse photos with Custom Labels.
- Workplace Safety: Detect PPE compliance (hardhats, masks, gloves) in site photos.
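For the workplace-safety case, a compliance check boils down to walking the DetectProtectiveEquipment response per person. A minimal sketch, assuming a response shaped like that API's output (the sample dicts are illustrative, and checking only head covers for brevity):

```python
def head_cover_compliant(person, min_confidence=80.0):
    """True if this person has a confident head-cover detection
    that actually covers the head."""
    for part in person.get("BodyParts", []):
        if part["Name"] != "HEAD":
            continue
        for eq in part.get("EquipmentDetections", []):
            if (eq["Type"] == "HEAD_COVER"
                    and eq["Confidence"] >= min_confidence
                    and eq["CoversBodyPart"]["Value"]):
                return True
    return False

# Illustrative person entries in the DetectProtectiveEquipment shape
with_hat = {"BodyParts": [{"Name": "HEAD", "Confidence": 99.0,
            "EquipmentDetections": [{"Type": "HEAD_COVER", "Confidence": 97.0,
                                     "CoversBodyPart": {"Value": True, "Confidence": 95.0}}]}]}
bare_head = {"BodyParts": [{"Name": "HEAD", "Confidence": 99.0,
             "EquipmentDetections": []}]}
print(head_cover_compliant(with_hat), head_cover_compliant(bare_head))  # True False
```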
Service Limits & Quotas:
- Image size: up to 5 MB when passed as raw bytes in the request; up to 15 MB when referenced as an S3 object.
- Image dimensions: minimum 80 x 80 pixels.
- Video format: MP4 or MOV (H.264) stored in S3, up to 10 GB.
- Face collection size: up to 20 million faces per collection.
- Custom Labels training: 10–250,000 images per project; minimum 10 per label.
- Concurrent video jobs: default soft limit 20 per API per account.
- Streaming Video processors: default soft limit 5 per region.
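The raw-bytes vs S3 limits above suggest choosing how to pass an image based on its size. A small sketch (the function name and limits-as-constants layout are my own; verify current quotas for your region):

```python
import os

INLINE_BYTES_LIMIT = 5 * 1024 * 1024   # raw bytes passed in the request
S3_IMAGE_LIMIT = 15 * 1024 * 1024      # image referenced as an S3 object

def image_param(path, bucket=None, key=None):
    """Build the Image argument for Rekognition image APIs,
    respecting the per-method size limits."""
    size = os.path.getsize(path)
    if size <= INLINE_BYTES_LIMIT:
        with open(path, "rb") as f:
            return {"Bytes": f.read()}
    if bucket and key and size <= S3_IMAGE_LIMIT:
        return {"S3Object": {"Bucket": bucket, "Name": key}}
    raise ValueError(f"{path} is {size} bytes; exceeds Rekognition image limits")
```

Small images go inline (one less S3 round trip); anything larger must already be in S3 and passed by reference.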
Pricing Model:
- Image APIs: billed per image processed, with tiered discounts above 1M images/month.
- Video APIs: billed per minute of video analyzed.
- Face storage: billed per 1,000 faces stored per month.
- Custom Labels: training billed per hour; inference billed per inference-hour while the model is hosted (charges accrue even when idle).
- Streaming Video: billed per stream processor hour.
- Common cost surprises: Custom Labels models left hosted in dev environments (billed per hour even with zero traffic); calling several image APIs (labels, moderation, faces) on every image when one would suffice; analyzing streaming video at higher frame rates than the use case needs.
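The idle-hosting trap is easy to quantify. A back-of-envelope sketch, assuming a hypothetical per-inference-hour rate (check current Rekognition pricing for your region):

```python
# Assumed rate for illustration only; real pricing varies by region.
HOURLY_INFERENCE_RATE = 4.00  # USD per inference-hour, per inference unit

def idle_month_cost(inference_units=1, hours=730, rate=HOURLY_INFERENCE_RATE):
    """Cost of leaving a Custom Labels model hosted for a month
    with zero traffic: hosting bills per hour regardless of requests."""
    return inference_units * hours * rate

print(f"${idle_month_cost():,.2f}")  # $2,920.00 at the assumed rate
```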
Code Example:
import boto3

rek = boto3.client("rekognition", region_name="us-west-2")

# Detect labels in an image
labels = rek.detect_labels(
    Image={"S3Object": {"Bucket": "my-images", "Name": "warehouse/row-4.jpg"}},
    MaxLabels=10,
    MinConfidence=75,
)
for label in labels["Labels"]:
    print(f"{label['Name']:20s} {label['Confidence']:.1f}%")

# Compare a selfie to an ID photo for verification (1-to-1)
match = rek.compare_faces(
    SourceImage={"S3Object": {"Bucket": "kyc", "Name": "selfie.jpg"}},
    TargetImage={"S3Object": {"Bucket": "kyc", "Name": "id-photo.jpg"}},
    SimilarityThreshold=90,
)
print("Match similarity:", match["FaceMatches"][0]["Similarity"] if match["FaceMatches"] else "no match")

# Search a face collection (1-to-many)
search = rek.search_faces_by_image(
    CollectionId="employees",
    Image={"S3Object": {"Bucket": "kyc", "Name": "selfie.jpg"}},
    FaceMatchThreshold=90,
    MaxFaces=1,
)
for m in search["FaceMatches"]:
    print(m["Face"]["ExternalImageId"], m["Similarity"])
Common Interview Questions:
When should you use Custom Labels vs SageMaker?
Use Custom Labels when you have a relatively small dataset (10–10,000 images), need a managed training experience, and standard image classification or object detection is enough. Move to SageMaker for full control over architecture, larger datasets, custom losses, on-device export, or non-image vision tasks.
How does face comparison differ from face search?
CompareFaces matches one source image against one target image (1-to-1, no storage). SearchFacesByImage matches a face against a stored Face Collection (1-to-many) — required for "find this person across our user base" workflows.
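The 1-to-many search only works after faces are enrolled with IndexFaces. A minimal sketch of the enrollment side; the call is shown commented since it needs credentials and an existing collection, and the sample dict below mimics the IndexFaces response shape rather than real output:

```python
def enrolled_face_ids(response):
    """Collect the FaceIds Rekognition stored from an IndexFaces response."""
    return [rec["Face"]["FaceId"] for rec in response.get("FaceRecords", [])]

# The real call looks roughly like:
# resp = rek.index_faces(
#     CollectionId="employees",
#     Image={"S3Object": {"Bucket": "kyc", "Name": "badge.jpg"}},
#     ExternalImageId="emp-1042",   # your own key, returned on search hits
#     MaxFaces=1,
#     QualityFilter="AUTO",
# )

# Illustrative response in the IndexFaces shape
resp = {"FaceRecords": [{"Face": {"FaceId": "a1b2c3d4-e5f6-4a7b-8c9d-000000000001",
                                  "ExternalImageId": "emp-1042"}}],
        "UnindexedFaces": []}
print(enrolled_face_ids(resp))
```

ExternalImageId is the piece that makes search results actionable: it is your own identifier, echoed back by SearchFacesByImage.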
What is Face Liveness and why does it matter?
Liveness detects whether a face in a video stream belongs to a real, present person versus a printed photo, screen replay, or deepfake. Mandatory for any production identity verification flow to prevent spoofing attacks.
How does Rekognition handle moderation taxonomies?
Returns hierarchical labels (parent + child) like "Suggestive → Female Swimwear" with confidence scores. You set a threshold per category and route flagged content to manual review (often via Amazon A2I).
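That threshold-and-route logic can be sketched as a small pure function. The thresholds and the publish/review/reject tiers are illustrative policy choices, not anything the API prescribes, and the sample labels mimic the ModerationLabels response shape:

```python
def route_upload(moderation_labels, block_at=90.0, review_at=60.0):
    """Map moderation labels to a publish / review / reject decision
    based on the highest-confidence flag."""
    top = max((l["Confidence"] for l in moderation_labels), default=0.0)
    if top >= block_at:
        return "reject"
    if top >= review_at:
        return "review"   # e.g. send to Amazon A2I for human review
    return "publish"

# Illustrative labels in the DetectModerationLabels shape (parent + child)
flagged = [{"Name": "Graphic Violence", "ParentName": "Violence", "Confidence": 95.2}]
borderline = [{"Name": "Revealing Clothes", "ParentName": "Suggestive", "Confidence": 72.0}]
print(route_upload(flagged), route_upload(borderline), route_upload([]))
# reject review publish
```

A real moderation pipeline would typically use per-category thresholds rather than a single global one.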
How do you process video efficiently?
Use the async video APIs (StartLabelDetection, StartContentModeration) on S3-hosted video. Subscribe to the SNS completion topic instead of polling. Output includes timestamped detections so you can build timelines without re-scanning frames.
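Once the SNS notification arrives, GetLabelDetection is paged with NextToken; folding the pages into a per-label timeline is straightforward. A sketch, assuming pages shaped like that API's response (the sample pages below are illustrative):

```python
from collections import defaultdict

def label_timeline(pages):
    """Fold paginated GetLabelDetection results into
    {label name: [timestamps in ms]} for timeline building."""
    timeline = defaultdict(list)
    for page in pages:
        for item in page.get("Labels", []):
            timeline[item["Label"]["Name"]].append(item["Timestamp"])
    return dict(timeline)

# Illustrative pages in the GetLabelDetection shape
pages = [
    {"Labels": [{"Timestamp": 0,   "Label": {"Name": "Car", "Confidence": 98.1}},
                {"Timestamp": 500, "Label": {"Name": "Car", "Confidence": 97.4}}]},
    {"Labels": [{"Timestamp": 500, "Label": {"Name": "Person", "Confidence": 91.0}}]},
]
print(label_timeline(pages))  # {'Car': [0, 500], 'Person': [500]}
```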
What's the cost trap with Custom Labels?
Hosted models bill per inference-hour while running, regardless of traffic. Stop the model when not in use, and consolidate labels into a single project where possible to avoid hosting many small models.
Rekognition covers the common vision tasks end-to-end; reach for SageMaker when you need fully custom models, non-standard outputs, or on-premises deployment.