AWS Glue Workflow

AWS Glue Workflow is the native orchestration layer inside AWS Glue. It lets you compose multiple Glue crawlers, ETL jobs, and triggers into a single, dependency-aware DAG with shared run state, run-level monitoring, and conditional branching. It is purpose-built for Glue-only pipelines; for cross-service orchestration, Step Functions or MWAA are better fits.


Key Features:


Common Use Cases:


Service Limits & Quotas:


Pricing Model:


Code Example:


import boto3

glue = boto3.client("glue", region_name="us-west-2")

# Create the workflow shell
glue.create_workflow(
    Name="daily-events-pipeline",
    Description="Crawl raw S3, transform, then crawl curated.",
    DefaultRunProperties={"target_date": ""},
)

# Schedule trigger fires the crawler at 02:00 UTC every day
glue.create_trigger(
    Name="start-raw-crawl",
    Type="SCHEDULED",
    Schedule="cron(0 2 * * ? *)",
    WorkflowName="daily-events-pipeline",
    Actions=[{"CrawlerName": "raw-events-crawler"}],
    StartOnCreation=True,
)

# Conditional trigger fires the ETL job only when the crawler succeeds
glue.create_trigger(
    Name="run-transform-on-crawl-success",
    Type="CONDITIONAL",
    WorkflowName="daily-events-pipeline",
    Predicate={
        "Logical": "AND",
        "Conditions": [{
            "LogicalOperator": "EQUALS",
            "CrawlerName": "raw-events-crawler",
            "CrawlState": "SUCCEEDED",
        }],
    },
    Actions=[{"JobName": "events-transform"}],
    StartOnCreation=True,
)
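
The workflow description promises a final crawl of the curated zone, but the triggers above stop after the transform job. A third conditional trigger could close the loop. This sketch only builds the create_trigger arguments, so they can be reviewed before anything is sent to AWS. The crawler name curated-events-crawler and the trigger name are hypothetical.

```python
def curated_crawl_trigger(workflow_name="daily-events-pipeline",
                          job_name="events-transform",
                          crawler_name="curated-events-crawler"):  # hypothetical crawler name
    """Build create_trigger kwargs that crawl the curated zone after the transform job succeeds."""
    return {
        "Name": "crawl-curated-on-transform-success",  # hypothetical trigger name
        "Type": "CONDITIONAL",
        "WorkflowName": workflow_name,
        "Predicate": {
            "Logical": "AND",
            "Conditions": [{
                "LogicalOperator": "EQUALS",
                "JobName": job_name,
                # Job conditions use State; crawler conditions use CrawlState
                "State": "SUCCEEDED",
            }],
        },
        "Actions": [{"CrawlerName": crawler_name}],
        "StartOnCreation": True,
    }
```

With the client from the example above, the trigger would be created with glue.create_trigger(**curated_crawl_trigger()).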
  


Common Interview Questions:

When should you use Glue Workflow versus Step Functions or MWAA?

Glue Workflow is the right choice when every node is a Glue job or crawler — it shares run state and integrates with the catalog without extra plumbing. Step Functions is better when you need to mix Lambda, ECS, EMR, SageMaker, or external services. MWAA (managed Airflow) wins when you need a rich operator ecosystem, complex dependency logic, or shared scheduling across many teams.

How do you pass values between nodes in a workflow?

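Use workflow run properties, which are string key-value pairs. They are seeded when a run starts, from the workflow's DefaultRunProperties or from the RunProperties passed to start_workflow_run, and jobs read and update them with get_workflow_run_properties / put_workflow_run_properties. A job started by a workflow receives WORKFLOW_NAME and WORKFLOW_RUN_ID as job arguments, which identify the run to query. Nodes that start after an update see the new values.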
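
A minimal job-side sketch, assuming the workflow from the code example, where DefaultRunProperties seeds target_date as an empty string. Inside a Glue job the workflow name and run ID would come from getResolvedOptions(sys.argv, ["WORKFLOW_NAME", "WORKFLOW_RUN_ID"]) in awsglue.utils; here they are plain parameters, and glue is any boto3 Glue client. Defaulting an empty target_date to yesterday is an illustrative assumption, not Glue behavior.

```python
from datetime import date, timedelta


def resolve_target_date(glue, workflow_name, run_id, today=None):
    """Return the run's target_date, filling it in as yesterday if it is empty.

    Assumption for illustration: an empty target_date means "process yesterday".
    """
    props = glue.get_workflow_run_properties(
        Name=workflow_name, RunId=run_id
    )["RunProperties"]
    target = props.get("target_date", "")
    if not target:
        today = today or date.today()
        target = (today - timedelta(days=1)).isoformat()
        # put_workflow_run_properties overrides only the keys supplied, so
        # nodes that start later in this run see the resolved date.
        glue.put_workflow_run_properties(
            Name=workflow_name, RunId=run_id, RunProperties={"target_date": target}
        )
    return target
```

If a caller already supplied target_date when starting the run, the helper returns it unchanged and writes nothing.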

What types of triggers can start a workflow?

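On-demand (console, CLI, or the StartWorkflowRun API), scheduled (cron expression), and event-based via Amazon EventBridge, which suits S3 object-created events or upstream pipeline completions. Event triggers can batch events, starting a run once a set number of events arrive or a time window elapses. Conditional triggers cannot start a workflow; they only fire within a run that is already in progress.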
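
A sketch of the event-based path, assuming the S3 bucket has EventBridge notifications enabled. It builds three payloads without calling AWS: an EVENT start trigger for glue.create_trigger, an event pattern for events.put_rule, and the workflow target for events.put_targets. The bucket, role, account ID, rule, and trigger names are placeholders, and the batching values are illustrative.

```python
import json


def s3_event_start_requests(workflow_name, crawler_name, bucket, region,
                            account_id, role_arn, batch_size=5, batch_window_s=300):
    """Build create_trigger, put_rule, and put_targets payloads for an S3-driven start."""
    trigger = {
        "Name": f"{workflow_name}-on-s3-event",  # placeholder trigger name
        "Type": "EVENT",  # EventBridge-driven start trigger
        "WorkflowName": workflow_name,
        "Actions": [{"CrawlerName": crawler_name}],
        # Start a run after batch_size events or batch_window_s seconds
        "EventBatchingCondition": {"BatchSize": batch_size, "BatchWindow": batch_window_s},
    }
    rule = {
        "Name": f"{workflow_name}-s3-object-created",  # placeholder rule name
        "EventPattern": json.dumps({
            "source": ["aws.s3"],
            "detail-type": ["Object Created"],
            "detail": {"bucket": {"name": [bucket]}},
        }),
    }
    targets = {
        "Rule": rule["Name"],
        "Targets": [{
            "Id": "glue-workflow",
            "Arn": f"arn:aws:glue:{region}:{account_id}:workflow/{workflow_name}",
            "RoleArn": role_arn,  # role EventBridge assumes; it needs glue:notifyEvent
        }],
    }
    return trigger, rule, targets
```

Each payload maps onto one call: glue.create_trigger(**trigger), events.put_rule(**rule), and events.put_targets(**targets), where events is a boto3 EventBridge client.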

How does conditional branching work?

A conditional trigger evaluates a predicate that combines conditions with AND or ANY (the API has no OR keyword) over watched job states (SUCCEEDED, FAILED, STOPPED, TIMEOUT) or crawler states (SUCCEEDED, FAILED, CANCELLED). If the predicate matches, its actions fire, which lets you route success and failure paths to different downstream jobs.
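
Alongside the success path in the code example, a failure path might look like the sketch below: an ANY predicate that fires when events-transform fails or times out and routes to a notification job. The job name events-failure-alert and the generated trigger name are hypothetical.

```python
def failure_branch_trigger(workflow_name="daily-events-pipeline",
                           watched_job="events-transform",
                           alert_job="events-failure-alert"):  # hypothetical alert job
    """Build create_trigger kwargs that run alert_job if watched_job fails or times out."""
    bad_states = ("FAILED", "TIMEOUT")
    return {
        "Name": f"alert-on-{watched_job}-failure",  # hypothetical trigger name
        "Type": "CONDITIONAL",
        "WorkflowName": workflow_name,
        "Predicate": {
            "Logical": "ANY",  # fire when any single condition matches
            "Conditions": [
                {"LogicalOperator": "EQUALS", "JobName": watched_job, "State": state}
                for state in bad_states
            ],
        },
        "Actions": [{"JobName": alert_job}],
        "StartOnCreation": True,
    }
```

Using ANY rather than AND matters here: a single job run ends in exactly one state, so an AND over FAILED and TIMEOUT could never match.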

How do you monitor and alert on workflow failures?

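CloudWatch metrics expose per-job success/failure counts and durations. EventBridge emits job and crawler state-change events that can fan out to SNS, Lambda, or PagerDuty. The Glue console shows run-level status and per-node history, and the GetWorkflowRun API returns the same status plus per-run action statistics. A run reported as COMPLETED can still contain failed actions, so alerting should check the failure counters rather than the status alone.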
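
A minimal polling sketch built on get_workflow_run, assuming a boto3 Glue client is passed in. It treats COMPLETED, STOPPED, and ERROR as terminal and then inspects the run's action statistics, since a COMPLETED run may still hold failed jobs. The poll interval and timeout are arbitrary defaults; for alerting, EventBridge rules avoid polling altogether.

```python
import time

TERMINAL_STATUSES = {"COMPLETED", "STOPPED", "ERROR"}
PROBLEM_COUNTERS = ("FailedActions", "TimeoutActions", "StoppedActions", "ErroredActions")


def wait_for_workflow_run(glue, name, run_id, poll_s=30, timeout_s=3600, sleep=time.sleep):
    """Poll get_workflow_run until the run reaches a terminal status; return the Run dict."""
    deadline = time.monotonic() + timeout_s
    while True:
        run = glue.get_workflow_run(Name=name, RunId=run_id)["Run"]
        if run["Status"] in TERMINAL_STATUSES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"workflow run {run_id} still {run['Status']} after {timeout_s}s")
        sleep(poll_s)


def run_succeeded(run):
    """True only if the run completed and no action failed, timed out, stopped, or errored."""
    stats = run.get("Statistics", {})
    problems = sum(stats.get(key, 0) for key in PROBLEM_COUNTERS)
    return run["Status"] == "COMPLETED" and problems == 0
```

With a real client, a run could be started with run_id = glue.start_workflow_run(Name="daily-events-pipeline")["RunId"] and checked with run_succeeded(wait_for_workflow_run(glue, "daily-events-pipeline", run_id)).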

What is a Glue Blueprint?

A reusable workflow template parameterized for common patterns (e.g., partitioned ingest from S3). Operators instantiate a blueprint with specific parameters and Glue generates the underlying jobs, crawlers, and triggers.
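
Once a blueprint is registered (create_blueprint, pointing at its ZIP archive in S3), a workflow is generated from it with start_blueprint_run. A small sketch of that request, assuming a hypothetical blueprint named partitioned-s3-ingest; the parameter names must match whatever the blueprint's own configuration declares, so the ones used here are placeholders.

```python
import json


def blueprint_run_request(blueprint_name, role_arn, params):
    """Build start_blueprint_run kwargs; Parameters is sent as a JSON string, not a dict."""
    return {
        "BlueprintName": blueprint_name,
        "Parameters": json.dumps(params, sort_keys=True),
        # Role Glue assumes to create the generated jobs, crawlers, and triggers
        "RoleArn": role_arn,
    }
```

For example, blueprint_run_request("partitioned-s3-ingest", role_arn, {"WorkflowName": "orders-ingest", "InputDataLocation": "s3://example-bucket/orders/"}) could be passed to glue.start_blueprint_run(**req), whose returned RunId can then be tracked with get_blueprint_run.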


AWS Glue Workflow is the simplest way to orchestrate Glue-native pipelines end to end. For Glue-only workloads it removes the need for an external orchestrator; for hybrid pipelines, reach for Step Functions or MWAA instead.