Amazon EC2 (Elastic Compute Cloud)

Amazon EC2 provides resizable virtual machines (instances) in the AWS cloud. It remains the foundational AWS compute primitive — even managed services like RDS, EMR, and SageMaker run on EC2 underneath. EC2 is the right choice when you need full OS-level control, specialized hardware (GPU, high-memory, HPC), or software that doesn't fit containers or Lambda.


Key Concepts:


Pricing Models:


When to Use EC2 (vs. Alternatives):


Service Limits & Quotas:


Pricing Model:


Code Example:

Launching a tagged EC2 instance with an IAM instance profile and user-data using boto3:

import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

user_data = """#!/bin/bash
yum update -y
yum install -y python3
echo "ready" > /var/log/bootstrap.log
"""

response = ec2.run_instances(
    ImageId="ami-0abcdef1234567890",       # placeholder ID; substitute a current Amazon Linux 2023 AMI for your region
    InstanceType="t3.small",
    MinCount=1, MaxCount=1,
    KeyName="my-keypair",
    SubnetId="subnet-0123456789abcdef0",
    SecurityGroupIds=["sg-0123456789abcdef0"],
    IamInstanceProfile={"Name": "ec2-app-role"},
    UserData=user_data,
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [
            {"Key": "Name", "Value": "app-worker-01"},
            {"Key": "Environment", "Value": "prod"},
            {"Key": "Owner", "Value": "kevin"},
        ],
    }],
    MetadataOptions={"HttpTokens": "required"},  # IMDSv2 only
)
print(response["Instances"][0]["InstanceId"])


Common Interview Questions:

What is the difference between an EBS volume and an instance store?

EBS volumes are network-attached block storage that persist independently of the instance lifecycle and can be snapshotted, encrypted, and re-attached. Instance store is disk physically attached to the host (NVMe SSD on current instance types): extremely fast but ephemeral. Its data survives an OS reboot but is lost when the instance stops, hibernates, terminates, or the underlying hardware fails. Use EBS for anything that must survive a stop/start or termination, and instance store for scratch space (shuffle data, caches, temp files for ML training).
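
As a rough illustration of the EBS side, the sketch below builds a BlockDeviceMappings list that could be passed to run_instances. The helper name ebs_mapping, the device names, and the sizes are assumptions made for this example. Instance store volumes are not listed because, on NVMe-based instance types, they are attached automatically.

```python
def ebs_mapping(device_name, size_gib, volume_type="gp3", delete_on_termination=True):
    """Build one EBS entry for the BlockDeviceMappings parameter of run_instances."""
    return {
        "DeviceName": device_name,
        "Ebs": {
            "VolumeSize": size_gib,            # GiB
            "VolumeType": volume_type,
            "Encrypted": True,                 # encrypt at rest with the default KMS key
            "DeleteOnTermination": delete_on_termination,
        },
    }


# Hypothetical layout: a disposable root volume plus a data volume that
# outlives the instance and can be re-attached elsewhere.
block_device_mappings = [
    ebs_mapping("/dev/xvda", 30),
    ebs_mapping("/dev/sdf", 200, delete_on_termination=False),
]
```

Setting DeleteOnTermination to False on the data volume is what lets it persist after the instance is terminated.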

How do Spot Instances work and when would you use them?

Spot uses spare EC2 capacity at up to ~90% off On-Demand pricing. AWS can reclaim the instance with a 2-minute warning when capacity is needed elsewhere. Use Spot for fault-tolerant workloads — Spark/EMR jobs, CI runners, batch processing, ML training with checkpointing, stateless web tiers behind a load balancer. Use Spot Fleet or mixed-instance ASGs to spread across many instance types and AZs to reduce interruption risk.
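
The 2-minute warning is published in instance metadata at spot/instance-action as a small JSON document with an action and a time. The sketch below only parses such a document and computes how much time is left; the function name is hypothetical, and fetching the document from the metadata service is left out.

```python
import json
from datetime import datetime, timezone


def seconds_until_interruption(notice_body, now=None):
    """Parse a Spot instance-action notice; return (action, seconds remaining)."""
    notice = json.loads(notice_body)
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)   # the notice time is in UTC
    now = now or datetime.now(timezone.utc)
    return notice["action"], (deadline - now).total_seconds()


# Example notice body, with a fixed clock so the result is reproducible.
sample = '{"action": "terminate", "time": "2024-01-01T12:02:00Z"}'
action, remaining = seconds_until_interruption(
    sample, now=datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
)
```

A worker would typically poll for the notice every few seconds and, once it appears, checkpoint and drain within the remaining time.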

What is the difference between a Security Group and a Network ACL?

Security Groups are stateful instance-level firewalls — return traffic for an allowed inbound flow is automatically permitted. They support allow rules only. Network ACLs are stateless subnet-level firewalls supporting both allow and deny rules; you must explicitly allow return traffic. SGs are the everyday tool; NACLs are coarse-grained and used for broad deny patterns (e.g., blocking known-bad IP ranges at the subnet edge).
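
The stateless, ordered nature of NACLs can be shown with a toy evaluator. This is a simplified model, not AWS code: it ignores protocol, the NaclRule class and the sample rules are invented for illustration, and it captures only that rules are checked in ascending rule-number order, the first match wins, and unmatched traffic is denied.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int
    action: str      # "allow" or "deny"
    cidr: str
    port_from: int
    port_to: int


def evaluate_nacl(rules, ip, port):
    """Return the action of the lowest-numbered matching rule, else the implicit deny."""
    addr = ipaddress.ip_address(ip)
    for rule in sorted(rules, key=lambda r: r.number):
        in_cidr = addr in ipaddress.ip_network(rule.cidr)
        in_ports = rule.port_from <= port <= rule.port_to
        if in_cidr and in_ports:
            return rule.action
    return "deny"


# Inbound: block a known-bad range first, then allow HTTPS from anywhere.
inbound = [
    NaclRule(90, "deny", "203.0.113.0/24", 0, 65535),
    NaclRule(100, "allow", "0.0.0.0/0", 443, 443),
]
# Outbound: because NACLs are stateless, replies to clients' ephemeral
# ports need their own rule. A Security Group would not need this.
outbound = [NaclRule(100, "allow", "0.0.0.0/0", 1024, 65535)]
```

Removing the outbound rule makes evaluate_nacl(outbound_rules, client_ip, 50000) return "deny", which is the usual cause of "inbound is allowed but connections hang" with NACLs.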

What is IMDSv2 and why does it matter?

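IMDSv2 (Instance Metadata Service version 2) requires a session token, obtained with a PUT request, on every GET. This blocks most SSRF attacks, in which a vulnerable web app is tricked into fetching credentials from 169.254.169.254, because a typical SSRF flaw cannot issue a PUT or set custom headers. Always set HttpTokens=required on new launches. In the 2019 Capital One breach, the attacker used SSRF to read IAM role credentials from IMDSv1.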
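
The two-step IMDSv2 exchange can be sketched with the standard library. The endpoint paths and header names are the documented ones; the function names are assumptions for this example, and fetch_metadata only works when run on an EC2 instance.

```python
import urllib.request

IMDS = "http://169.254.169.254/latest"


def token_request(ttl_seconds=21600):
    """Step 1: PUT to the token endpoint, asking for a session token."""
    return urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def metadata_request(path, token):
    """Step 2: GET a metadata path, presenting the session token."""
    return urllib.request.Request(
        f"{IMDS}/meta-data/{path}",
        method="GET",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_metadata(path, timeout=2):
    """Run both steps against the real service (only reachable from an instance)."""
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(metadata_request(path, token), timeout=timeout) as resp:
        return resp.read().decode()
```

For example, fetch_metadata("instance-id") returns the instance ID when called on an instance. With HttpTokens=required, a GET without the token header is rejected.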

Reserved Instance vs. Savings Plan — which would you pick?

Savings Plans are usually the better default. Compute Savings Plans apply the commitment automatically across instance families, sizes, and regions, so you don't need to predict the exact instance type. RIs are still useful when you need a capacity reservation in a specific AZ (zonal RIs only; regional RIs do not reserve capacity), or for RDS/Redshift/ElastiCache, which Compute Savings Plans do not cover.
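
The trade-off behind a Savings Plan commitment can be shown with simple arithmetic. The sketch below is a deliberately simplified model: it assumes a single flat discount rate, and the rates and dollar figures are hypothetical, not AWS prices. The key point is that the hourly commitment is paid whether or not it is used, and usage beyond it is billed at On-Demand rates.

```python
def hourly_cost(on_demand_usage, commitment, discount):
    """Hourly bill under a simplified Savings Plan model.

    on_demand_usage: usage for the hour, priced at On-Demand rates ($)
    commitment:      committed spend per hour at Savings Plan rates ($)
    discount:        flat Savings Plan discount as a fraction (hypothetical)
    """
    usage_at_plan_rate = on_demand_usage * (1 - discount)
    if usage_at_plan_rate <= commitment:
        # Under-utilized: the full commitment is still charged.
        return commitment
    # Fully utilized: the remainder is billed at On-Demand rates.
    covered_on_demand = commitment / (1 - discount)
    return commitment + (on_demand_usage - covered_on_demand)


busy_hour = hourly_cost(on_demand_usage=10.0, commitment=3.0, discount=0.5)
quiet_hour = hourly_cost(on_demand_usage=4.0, commitment=3.0, discount=0.5)
```

In this model the busy hour costs 7.0 instead of 10.0, while the quiet hour costs 3.0 even though only 2.0 of the commitment was used. That is why commitments are usually sized to the steady baseline rather than the peak.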

How do you achieve high availability with EC2?

Distribute instances across multiple Availability Zones inside an Auto Scaling Group, register them behind an Application or Network Load Balancer, use ELB health checks to replace failures automatically, and store state in shared services (RDS Multi-AZ, ElastiCache, S3) rather than on the instance. For region-level HA, replicate to a second region and front with Route 53 failover or Global Accelerator.
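
A minimal sketch of the single-region part of this setup: a helper that assembles parameters for the Auto Scaling create_auto_scaling_group call. The helper name, group name, subnet IDs, launch template name, and target group ARN are all placeholders invented for this example; the parameter keys are those of the Auto Scaling API.

```python
def multi_az_asg_params(name, subnet_ids, launch_template, target_group_arns,
                        min_size=2, max_size=6):
    """Build kwargs for autoscaling.create_auto_scaling_group spanning several AZs."""
    return {
        "AutoScalingGroupName": name,
        "MinSize": min_size,
        "MaxSize": max_size,
        "DesiredCapacity": min_size,
        # One subnet per Availability Zone, comma-separated.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        "LaunchTemplate": {"LaunchTemplateName": launch_template, "Version": "$Latest"},
        "TargetGroupARNs": list(target_group_arns),
        # Replace instances that fail load balancer health checks,
        # not only EC2 status checks.
        "HealthCheckType": "ELB",
        "HealthCheckGracePeriod": 120,
    }


params = multi_az_asg_params(
    name="app-asg",
    subnet_ids=["subnet-aaa111", "subnet-bbb222", "subnet-ccc333"],
    launch_template="app-template",
    target_group_arns=["arn:aws:elasticloadbalancing:region:account:targetgroup/app/placeholder"],
)
```

With boto3 installed and credentials configured, this would be used as boto3.client("autoscaling").create_auto_scaling_group(**params).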


Best Practices: