Module Dependency Diagram — Python's Most Useful Architecture Diagram
For Python codebases, the most informative single diagram you can draw is the module dependency graph: boxes for each .py file or package, arrows for import statements, grouped into layers (api / domain / infrastructure, or whatever convention your project uses). Imports are the only coupling Python forces to be explicit, so this diagram reflects actual structure — not idealized structure.
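Because the arrows are just import statements, the raw edges can be read straight from source with the standard-library ast module. The following is a minimal sketch, not how any particular tool does it; the function name imported_modules and the sample source are made up for illustration, and relative imports are deliberately skipped.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Return the dotted names of modules that `source` imports (absolute imports only)."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import a.b, c` contributes "a.b" and "c"
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # `from a.b import x` contributes "a.b"; relative imports are skipped here
            found.add(node.module)
    return found


example = """
import os
from domain.order import Order
from infrastructure import db

def handler():
    import json  # function-level imports are found too
"""

print(sorted(imported_modules(example)))
```

Running this over every .py file and pairing each file with its result gives the edge list that the rest of the diagram is built from.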
1. Anatomy — Layered Architecture View
The clearest layout is three or four labeled layers stacked top-to-bottom. Each layer contains the modules that belong to it. Arrows between layers show import direction — and they should always point downward. An upward arrow is a layer violation.
┌──────────────────────────────────────────────────────────────────────────────┐
│  Module Dependency — Layered Web Application                                 │
│                                                                              │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │  API LAYER (HTTP entry points, request/response schemas)               │  │
│  │                                                                        │  │
│  │   ┌──────────────────┐   ┌──────────────────┐   ┌──────────────────┐   │  │
│  │   │    routes.py     │   │    schemas.py    │   │     auth.py      │   │  │
│  │   └──────────────────┘   └──────────────────┘   └──────────────────┘   │  │
│  │                                                                        │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
│                                      │                                       │
│                                      ▼ imports (downward only)               │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │  DOMAIN LAYER (business logic, no I/O)                                 │  │
│  │                                                                        │  │
│  │   ┌──────────────────┐   ┌──────────────────┐   ┌──────────────────┐   │  │
│  │   │     user.py      │   │     order.py     │   │    payment.py    │   │  │
│  │   └──────────────────┘   └──────────────────┘   └──────────────────┘   │  │
│  │                                                                        │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
│                                      │                                       │
│                                      ▼ imports (downward only)               │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │  INFRASTRUCTURE LAYER (databases, external APIs, caches)               │  │
│  │                                                                        │  │
│  │   ┌──────────────────┐   ┌──────────────────┐   ┌──────────────────┐   │  │
│  │   │      db.py       │   │     cache.py     │   │     http.py      │   │  │
│  │   └──────────────────┘   └──────────────────┘   └──────────────────┘   │  │
│  │                                                                        │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
│  Rule: arrows always point DOWNWARD. An upward import is a layer violation.  │
└──────────────────────────────────────────────────────────────────────────────┘
How to Read This Diagram
- Top layer (API): Anything that touches HTTP. Routes, request/response models, auth middleware. These import from domain.
- Middle layer (Domain): Business logic. Pure functions and dataclasses. No import requests, no import sqlalchemy, no open(). The domain reaches infrastructure only through protocols that it defines itself (see anti-pattern C below).
- Bottom layer (Infrastructure): Database adapters, HTTP clients, caches, message queue producers. Implements protocols defined in domain.
- Arrows always go down. Domain importing from API is a layer violation. Infrastructure importing from API is a layer violation. A well-layered app is acyclic and stratified.
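The "arrows always go down" rule is easy to check mechanically once the edges are known. Here is a small sketch under two assumptions that are not in the original: the layers are top-level packages named api, domain, and infrastructure, and edges are (importer, imported) pairs of dotted module names. The helper names are hypothetical.

```python
# Layer ranks for the example app: a lower number sits higher in the diagram.
LAYER_RANK = {"api": 0, "domain": 1, "infrastructure": 2}


def layer_of(module: str) -> str:
    """Treat the first dotted component as the layer, e.g. 'api.routes' -> 'api'."""
    return module.split(".", 1)[0]


def upward_imports(edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return every (importer, imported) edge that points from a lower layer to a higher one."""
    violations = []
    for importer, imported in edges:
        src, dst = layer_of(importer), layer_of(imported)
        if src in LAYER_RANK and dst in LAYER_RANK and LAYER_RANK[dst] < LAYER_RANK[src]:
            violations.append((importer, imported))
    return violations


edges = [
    ("api.routes", "domain.order"),      # downward: fine
    ("api.routes", "api.schemas"),       # same layer: fine
    ("domain.order", "api.schemas"),     # upward: violation
    ("infrastructure.db", "api.auth"),   # upward: violation
]

for importer, imported in upward_imports(edges):
    print(f"layer violation: {importer} imports {imported}")
```

Modules outside the three known layers (third-party packages, the standard library) are ignored rather than reported, which keeps the check quiet about imports the layering rule does not cover.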
2. Three Anti-Patterns to Spot at a Glance
Once you can see the dependency graph, three problems jump out immediately. All three are silent killers — code keeps working until you try to refactor or test in isolation.
┌──────────────────────────────────────────────────────────────────────────────┐
│  Module Dependency — Three Anti-Patterns to Spot                             │
│                                                                              │
│  (A) CIRCULAR IMPORT — refactoring becomes impossible                        │
│                                                                              │
│      ┌──────────────┐                          ┌──────────────┐              │
│      │   user.py    │ ──── imports ──────────▶ │   order.py   │              │
│      │              │ ◀────────── imports ──── │              │              │
│      └──────────────┘                          └──────────────┘              │
│                                                                              │
│  Fix: extract shared types to a third module both can import.                │
│                                                                              │
│  (B) GOD MODULE — utils.py imported by EVERYTHING                            │
│                                                                              │
│   ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐   │
│   │  routes  │   │ services │   │  models  │   │  tasks   │   │   cli    │   │
│   └─────┬────┘   └─────┬────┘   └─────┬────┘   └─────┬────┘   └─────┬────┘   │
│         │              │              │              │              │        │
│         └──────────────┴──────────────┬──────────────┴──────────────┘        │
│                                       ▼                                      │
│                             ┌──────────────────┐                             │
│                             │     utils.py     │                             │
│                             └──────────────────┘                             │
│                                                                              │
│  Fix: split into focused modules — string_utils, date_utils, etc.            │
│                                                                              │
│  (C) LAYER VIOLATION — domain secretly imports from infrastructure           │
│                                                                              │
│   ┌────────────────────┐                       ┌────────────────────────┐    │
│   │ domain/            │                       │ infrastructure/        │    │
│   │   payment.py       │ ──── imports ───────▶ │   stripe_client.py     │    │
│   │                    │    (never do this)    │                        │    │
│   └────────────────────┘                       └────────────────────────┘    │
│                                                                              │
│  Fix: define a Protocol in domain; infrastructure implements it.             │
└──────────────────────────────────────────────────────────────────────────────┘
Anti-Pattern Details & Fixes
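(A) Circular import. user.py imports order.py, which imports user.py back. Python does not reject the cycle outright: the second import receives a partially initialized module from sys.modules. If it is written as from user import User and User is not defined yet, you get an ImportError ("cannot import name ... from partially initialized module"). If it is a plain import user, the import succeeds and the code fails later with AttributeError when the missing name is first used, which is harder to trace. The fix is almost always to extract the shared types (the things both modules reference) into a third module that both import from.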
# shared/types.py (new module)
from dataclasses import dataclass

@dataclass
class UserId:
    value: int

@dataclass
class OrderId:
    value: int

# user.py and order.py both import from shared.types, so there is no cycle.
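To confirm that a refactor really removed the cycle, a depth-first search over the import graph is enough. The sketch below assumes the graph is a plain dict mapping each module to the modules it imports; find_cycle is a hypothetical helper, not part of any tool named in this article.

```python
from typing import Optional


def find_cycle(graph: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one import cycle as a list of modules (first == last), or None if acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    path: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        color[node] = GRAY
        path.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path, so we have walked in a circle
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


before = {"user": ["order"], "order": ["user"]}
after = {"user": ["shared.types"], "order": ["shared.types"], "shared.types": []}

print(find_cycle(before))  # the user/order cycle
print(find_cycle(after))   # None once the shared types are extracted
```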
(B) God module. utils.py starts as a place for two helper functions. Six months later it has 40 functions and every other module imports from it. You can't refactor any function in utils.py without auditing the whole codebase. Fix: split by responsibility — string_utils.py, date_utils.py, logging_utils.py. Then most modules only need one or two of those.
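A god module shows up in the graph as a node with unusually high fan-in. One rough way to flag candidates is to count distinct importers per module; the 50% threshold and both function names below are arbitrary choices for illustration, not an established metric.

```python
from collections import Counter


def fan_in(edges: list[tuple[str, str]]) -> Counter:
    """Count how many distinct modules import each module."""
    return Counter(imported for _importer, imported in set(edges))


def god_module_candidates(edges: list[tuple[str, str]], threshold: float = 0.5) -> list[str]:
    """Modules imported by more than `threshold` of all modules that import anything."""
    importers = {importer for importer, _imported in edges}
    counts = fan_in(edges)
    return sorted(m for m, n in counts.items() if n / len(importers) > threshold)


edges = [
    ("routes", "utils"), ("services", "utils"), ("models", "utils"),
    ("tasks", "utils"), ("cli", "utils"),
    ("routes", "services"), ("services", "models"),
]

print(fan_in(edges).most_common(3))
print(god_module_candidates(edges))
```

High fan-in alone is not proof of a problem (a small, stable types module is often imported everywhere), so treat the output as a list of modules to review rather than a verdict.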
(C) Layer violation. Your domain module payment.py imports stripe_client.py directly. Now domain logic depends on Stripe. You can't unit-test payment logic without mocking Stripe; you can't swap to PayPal without rewriting domain. Fix: define a PaymentGateway Protocol in domain, have infrastructure implement it, inject the implementation at the application boundary.
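# domain/payment.py
from dataclasses import dataclass
from decimal import Decimal
from typing import Protocol

from domain.order import Order  # assumed location; needs .total and .customer_id

@dataclass
class ChargeResult:  # minimal stand-in; the real fields are not shown here
    success: bool

class PaymentGateway(Protocol):
    def charge(self, amount: Decimal, customer_id: str) -> ChargeResult: ...

def process_order(order: Order, gateway: PaymentGateway) -> None:
    result = gateway.charge(order.total, order.customer_id)
    # ... pure domain logic, no Stripe knowledge

# infrastructure/stripe_client.py
from decimal import Decimal
from domain.payment import ChargeResult

class StripeGateway:  # satisfies PaymentGateway structurally; no inheritance needed
    def charge(self, amount: Decimal, customer_id: str) -> ChargeResult: ...

# Wiring (at the edge, e.g. main.py or a DI container):
process_order(order, gateway=StripeGateway())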
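The payoff of the protocol is that domain logic can be tested with an in-memory double instead of a mocked Stripe client. The sketch below is self-contained, so it redefines small stand-ins: the fields on Order and ChargeResult, the paid flag, and FakeGateway are all assumptions made for the example, not the shapes used by any real codebase.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Protocol


@dataclass
class ChargeResult:            # hypothetical shape
    success: bool


@dataclass
class Order:                   # hypothetical shape
    total: Decimal
    customer_id: str
    paid: bool = False


class PaymentGateway(Protocol):
    def charge(self, amount: Decimal, customer_id: str) -> ChargeResult: ...


def process_order(order: Order, gateway: PaymentGateway) -> None:
    result = gateway.charge(order.total, order.customer_id)
    order.paid = result.success    # stand-in for the real domain logic


@dataclass
class FakeGateway:
    """In-memory test double: records calls instead of contacting a payment provider."""
    succeed: bool = True
    calls: list = field(default_factory=list)

    def charge(self, amount: Decimal, customer_id: str) -> ChargeResult:
        self.calls.append((amount, customer_id))
        return ChargeResult(success=self.succeed)


def test_successful_charge_marks_order_paid() -> None:
    gateway = FakeGateway()
    order = Order(total=Decimal("19.99"), customer_id="cus_123")
    process_order(order, gateway)
    assert order.paid
    assert gateway.calls == [(Decimal("19.99"), "cus_123")]


def test_declined_charge_leaves_order_unpaid() -> None:
    order = Order(total=Decimal("5.00"), customer_id="cus_456")
    process_order(order, FakeGateway(succeed=False))
    assert not order.paid


test_successful_charge_marks_order_paid()
test_declined_charge_leaves_order_unpaid()
```

FakeGateway never mentions PaymentGateway; it is accepted because its charge method has the right shape, which is the same structural typing that lets StripeGateway live in infrastructure.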
3. Generating It Automatically (pydeps, tach, snakefood)
Hand-drawn module dependency diagrams drift the moment someone adds an import. Auto-generate from source so the diagram is always current.
pydeps — quick visualizations
# Install
pip install pydeps
# Generate an SVG, excluding json, os, and sys by exact module name
pydeps myproject/ --exclude-exact json os sys --max-bacon=2 -o deps.svg
# Limit depth and cluster by package
pydeps myproject/ --max-bacon=3 --cluster --max-cluster-size=10
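--max-bacon=N drops modules that are more than N import hops away from your package; --cluster draws external dependencies as separate clusters, and --max-cluster-size collapses a large cluster into a single node. pydeps produces Graphviz dot source and calls Graphviz to render it (SVG by default), so Graphviz must be installed. With --show-dot you can capture the dot text and render it to PNG or PDF yourself.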
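If it helps to see what Graphviz input looks like, an edge list can be turned into dot text by hand. This is an illustrative sketch only and makes no claim about how pydeps builds its output; to_dot is a hypothetical helper that draws one cluster per top-level package.

```python
from collections import defaultdict


def to_dot(edges: list[tuple[str, str]]) -> str:
    """Render (importer, imported) pairs as Graphviz DOT, one cluster per top-level package."""
    by_package: dict[str, set[str]] = defaultdict(set)
    for importer, imported in edges:
        for module in (importer, imported):
            by_package[module.split(".", 1)[0]].add(module)

    lines = ["digraph deps {", "  rankdir=TB;"]   # TB = top-to-bottom layout
    for package in sorted(by_package):
        # Graphviz draws a box around any subgraph whose name starts with "cluster"
        lines.append(f'  subgraph "cluster_{package}" {{')
        lines.append(f'    label="{package}";')
        lines.extend(f'    "{module}";' for module in sorted(by_package[package]))
        lines.append("  }")
    lines.extend(f'  "{src}" -> "{dst}";' for src, dst in sorted(set(edges)))
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot([
    ("api.routes", "domain.order"),
    ("api.routes", "domain.user"),
    ("domain.order", "domain.user"),
])
print(dot_text)
```

Saving the output as deps.dot and running dot -Tsvg deps.dot -o deps.svg renders it with Graphviz.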
tach — enforce layering in CI
tach goes one step further: you declare your layers in a config file (tach.yml here; recent tach releases read tach.toml instead), and tach check fails CI when a forbidden import is added.
# tach.yml
modules:
  - path: api
    depends_on: [domain]
  - path: domain
    depends_on: [shared]
  - path: infrastructure
    depends_on: [domain, shared]
  - path: shared
    depends_on: []
exclude:
  - tests
  - docs
# Check
tach check
# Output: ✗ infrastructure/stripe_client.py imports api/routes.py (forbidden)
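The rule a depends_on list expresses is an allow-list: a module may import only from the modules it names. The sketch below mirrors that idea in plain Python using the same four modules as the config above; it is not tach's implementation, and the function name and message format are invented for the example.

```python
# Allow-list mirroring the depends_on entries in the config above.
ALLOWED = {
    "api": {"domain"},
    "domain": {"shared"},
    "infrastructure": {"domain", "shared"},
    "shared": set(),
}


def forbidden_imports(edges: list[tuple[str, str]], allowed: dict[str, set[str]] = ALLOWED) -> list[str]:
    """Return one message per edge whose target is not in the importer's allow-list."""
    errors = []
    for importer, imported in edges:
        src = importer.split(".", 1)[0]
        dst = imported.split(".", 1)[0]
        if src == dst or src not in allowed or dst not in allowed:
            continue  # same module, or outside the declared modules
        if dst not in allowed[src]:
            errors.append(f"{importer} imports {imported} ({src} may not depend on {dst})")
    return errors


edges = [
    ("api.routes", "domain.order"),                     # allowed
    ("infrastructure.stripe_client", "domain.payment"), # allowed
    ("infrastructure.stripe_client", "api.routes"),     # forbidden
]

for message in forbidden_imports(edges):
    print(message)
```

An allow-list is stricter than a rank ordering: api sits above infrastructure here, yet api importing infrastructure directly would still be reported because it is not listed.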
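snakefood, import-linter — alternatives
snakefood is the oldest of these tools; the original release targets Python 2 (snakefood3 is a Python 3 port), so check that it runs on your interpreter before adopting it. import-linter is similar to tach but expresses the rules as declarative contracts (layers, forbidden, independence) in its config file. Pick one and run it in CI.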
4. Layering Conventions for Common Python Stacks
| Stack | Typical layers (top → bottom) |
|---|---|
| FastAPI / Flask web app | routes → services → repositories → models / db |
| Django | views → forms / serializers → models → managers / db |
| scikit-learn ML pipeline | notebooks → pipelines → transformers → loaders |
| Airflow / Prefect | dags → tasks → operators / hooks → external clients |
| LLM / RAG application | api → agents → tools → retrieval / embeddings → vector store / LLM client |
| CLI tool (typer / click) | cli → commands → core logic → adapters |
The names matter less than the discipline of having layers and not crossing them. Even a 2-layer split (domain / infrastructure) catches most architectural mistakes.
5. When NOT to Draw One
- Single-file scripts. If your project is one Python file, the dependency graph is a single node. Use a pipeline diagram instead.
- Notebook-driven exploration. Notebooks don't have a stable module structure; the diagram changes every cell. Diagram what the notebook does, not how it's organized.
- Library code with no internal structure. A pure-function utility library with 3 modules and no imports between them doesn't benefit from a diagram.
Common Interview Questions:
Why is a module dependency graph more useful in Python than in Java?
In Java, access modifiers and, since Java 9, module declarations (module-info.java) let the compiler enforce layering, and IDEs surface dependencies clearly. Python has no equivalent: any module can write from infrastructure.db import session, so layer violations stay invisible. Without tooling like tach, a dependency graph is the only way to catch them.
How do I refactor out a circular import without breaking everything?
Three options. (1) Extract shared types to a third module both can import. (2) Move the import inside the function body where it's used (delays the import until call time). (3) Use typing.TYPE_CHECKING + string annotations if the cycle is only for type hints. Option 1 is usually the right architectural fix; options 2 and 3 are stopgaps.
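Options 2 and 3 can be seen working together in a small experiment. The sketch below writes two throwaway modules to a temporary directory (demo_user and demo_order are made-up names) and imports them; demo_user uses TYPE_CHECKING plus from __future__ import annotations for the type hints, and a function-level import for the one place it needs Order at runtime.

```python
import sys
import tempfile
from pathlib import Path

USER_SRC = '''
from __future__ import annotations      # annotations are kept as unevaluated strings
from typing import TYPE_CHECKING

if TYPE_CHECKING:                       # False at runtime, so this import never runs
    from demo_order import Order

class User:
    def __init__(self, name: str) -> None:
        self.name = name
        self.orders = []

    def place_order(self, total: int) -> Order:
        from demo_order import Order    # option 2: import deferred until call time
        order = Order(self, total)
        self.orders.append(order)
        return order
'''

ORDER_SRC = '''
from demo_user import User              # ordinary top-level import

class Order:
    def __init__(self, user: User, total: int) -> None:
        self.user = user
        self.total = total
'''

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "demo_user.py").write_text(USER_SRC)
    Path(tmp, "demo_order.py").write_text(ORDER_SRC)
    sys.path.insert(0, tmp)
    try:
        import demo_user                # loads fully: no top-level import of demo_order
        order = demo_user.User("ada").place_order(42)
        print(type(order).__name__, order.total)
    finally:
        sys.path.remove(tmp)
```

By the time place_order runs, demo_user is fully initialized, so demo_order's top-level import of it succeeds. The graph still contains the cycle, though, which is why these two options remain stopgaps.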
Should tests be a separate layer in the dependency graph?
Yes, but tests sit OUTSIDE the layered stack — they import from any layer (api, domain, infrastructure) and use fixtures to provide implementations. The rule is one-directional: production code never imports from tests/. Most diagram tools have an exclude: tests option.
What's the difference between pydeps and tach?
pydeps visualizes — it generates Graphviz output for human inspection. tach enforces — it reads a config file declaring allowed dependencies and fails CI when violated. Use both: pydeps when onboarding or refactoring, tach in CI to prevent regressions.
How do I diagram a monorepo with multiple Python packages?
Two levels of zoom. Level 1: package-to-package dependencies (each box is one installable package). Level 2: module-to-module within each package. pydeps can help with the first level: --cluster draws external dependencies as clusters, and --max-cluster-size collapses large clusters into single nodes. Don't try to fit hundreds of modules into one diagram; readers can't track more than ~15 boxes.
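Moving between the two zoom levels amounts to collapsing module names to a prefix. A minimal sketch, assuming edges are (importer, imported) pairs of dotted names; package_graph and the billing/shipping/accounts packages are hypothetical.

```python
def package_graph(module_edges: list[tuple[str, str]], depth: int = 1) -> set[tuple[str, str]]:
    """Collapse module-level edges to package-level edges.

    `depth` is how many dotted components identify a package
    (depth=1 turns 'billing.api.routes' into 'billing').
    """
    def package_of(module: str) -> str:
        return ".".join(module.split(".")[:depth])

    collapsed = set()
    for importer, imported in module_edges:
        src, dst = package_of(importer), package_of(imported)
        if src != dst:                 # drop edges that stay inside one package
            collapsed.add((src, dst))
    return collapsed


module_edges = [
    ("billing.api.routes", "billing.domain.invoice"),
    ("billing.domain.invoice", "accounts.domain.user"),
    ("shipping.api.routes", "accounts.domain.user"),
    ("shipping.api.routes", "billing.domain.invoice"),
]

print(sorted(package_graph(module_edges)))            # level 1: package to package
print(sorted(package_graph(module_edges, depth=2)))   # one step closer in
```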
What does a "good" dependency graph look like?
Acyclic (a DAG, no cycles), stratified (organized into 3–5 layers), narrow (each layer has fewer modules than the one above), and boring — most arrows go from one layer down to the next, not skipping layers. If your graph looks like a hairball, you have an architecture problem the diagram is just exposing.
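Two of these properties, acyclic and "no skipped layers", can be checked with the standard-library graphlib module (Python 3.9+). The sketch below derives a layer index for each module from the graph itself and then lists edges that jump over a layer; stratify and layer_skips are hypothetical helpers, and the module names come from the FastAPI / Flask row of the table in section 4.

```python
from graphlib import CycleError, TopologicalSorter


def stratify(graph: dict[str, set[str]]) -> dict[str, int]:
    """Give each module a layer index: 0 if it imports nothing,
    otherwise 1 + the highest index among the modules it imports.
    Raises CycleError if the graph contains a cycle."""
    depth: dict[str, int] = {}
    # static_order() yields every dependency before the modules that import it
    for module in TopologicalSorter(graph).static_order():
        deps = graph.get(module, ())
        depth[module] = 1 + max((depth[d] for d in deps), default=-1)
    return depth


def layer_skips(graph: dict[str, set[str]], depth: dict[str, int]) -> list[tuple[str, str]]:
    """Edges that jump over at least one layer."""
    return sorted((m, d) for m, deps in graph.items() for d in deps if depth[m] - depth[d] > 1)


graph = {
    "routes": {"services", "db"},      # routes -> db skips two layers
    "services": {"repositories"},
    "repositories": {"db"},
    "db": set(),
}

depth = stratify(graph)
print(depth)
print(layer_skips(graph, depth))

try:
    stratify({"user": {"order"}, "order": {"user"}})
except CycleError as exc:
    print("cycle:", exc.args[1])
```

Here index 0 is the bottom of the diagram, so the printed indices read bottom-up. A skipped layer is not always wrong, but a long list of them is a sign the graph is closer to a hairball than to strata.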