Flask remains one of the most widely deployed Python web frameworks in 2026, despite the rise of FastAPI, Starlette, Litestar, and the continued dominance of Django for large monoliths. Its staying power is not accidental: Flask fills a specific niche — a small, well-understood, sync-by-default WSGI framework that gets out of the way and lets you wire your own stack. For AI/ML engineers in particular, it is still the default for model-serving microservices, internal tools, and anything that needs to be stood up quickly and run reliably for years without rewrite churn.
This page is an honest assessment: where Flask earns its place, where it does not, and the concrete patterns that show up in production ML and data systems.
Flask's installed footprint is roughly a few hundred KB for the core package (plus its transitive dependencies — Werkzeug, Jinja2, Click, itsdangerous, MarkupSafe). There is no ORM, no form library, no auth, no admin panel, no migrations, no background task runner bundled in. That is the point.
What Flask does not include is load-bearing:
- Database access: SQLAlchemy, raw psycopg, or nothing at all. For an ML service that talks only to a feature store and a model file, no ORM is frequently the correct answer.
- Validation: marshmallow, attrs, or hand-written validators. Payloads for a `/predict` endpoint rarely benefit from a full form framework.

The absence of these means upgrades are smaller, the dependency tree is smaller, the attack surface is smaller, and the time-to-understand-the-codebase for a new engineer is shorter. For a service whose job is "load a model, accept JSON, return JSON", this matters.
Flask imposes almost no structure. There is no recommended project layout, no convention for where models live, no preferred test runner, no blessed database. This is alternately described as "flexibility" and "a footgun", and both are true.
Contrast with Django:
| Concern | Flask | Django |
|---|---|---|
| ORM | Bring your own | Built-in Django ORM |
| Admin UI | Optional (Flask-Admin) | Built-in, generated from models |
| Auth | Extension or roll-your-own | Built-in auth app |
| Migrations | Flask-Migrate (Alembic) | Built-in makemigrations |
| Project layout | Anything goes | Prescriptive (apps, settings, URLs) |
| Templating | Jinja2 (swappable) | Django templates (swappable) |
Django's choices pay off when you are building a content-heavy app with users, roles, forms, and a CMS-ish surface. They are overhead when you are building a stateless inference microservice. Pick the framework that matches the shape of the problem, not the one that matches the framework you used last time.
Flask's extension ecosystem is mature. The following are the extensions that come up most often in production. Maintenance status reflects observable activity on GitHub and PyPI as of early 2026; verify before adopting.
| Extension | Purpose | Maintenance |
|---|---|---|
| Flask-SQLAlchemy | SQLAlchemy integration, session scoping, declarative base | Active (Pallets) |
| Flask-Migrate | Alembic migrations wired into the Flask CLI | Active |
| Flask-Login | Session-based user auth, `login_required` decorator | Active |
| Flask-JWT-Extended | JWT issuance/verification, refresh tokens, cookie or header mode | Active |
| Flask-Smorest | OpenAPI/Swagger generation with marshmallow schemas | Active |
| Flask-CORS | Cross-origin headers with per-route rules | Active |
| Flask-Limiter | Rate limiting with Redis/Memcached backends | Active |
| Flask-Caching | Response and function-level caching (Redis, Memcached, filesystem) | Active |
| Flask-SocketIO | WebSocket support via python-socketio; needs gevent/eventlet | Active, but see Section 6 |
| Flask-Admin | Auto-generated admin UI over SQLAlchemy/MongoEngine models | Maintenance-mode; evaluate carefully |
The pattern to watch for: an extension that hasn't seen a release in 18+ months against a framework that ships on a ~12-month cadence. Pin versions, check the Werkzeug compatibility matrix, and don't adopt a dormant extension for new work if a thin hand-written alternative is plausible.
The most common production use. A typical service loads a pickled scikit-learn
pipeline, an XGBoost booster, or a TorchScript model at startup, exposes a
POST /predict endpoint, and is deployed behind gunicorn on Kubernetes. For
tree models and small neural nets where the forward pass dominates, Flask's
per-request overhead is in the noise.
```python
from flask import Flask, request, jsonify
import joblib
import numpy as np

app = Flask(__name__)

# Load once at startup; shared across requests within a worker.
MODEL = joblib.load("/srv/models/churn_xgb_v7.joblib")
FEATURES = [
    "tenure_months", "monthly_charges", "total_charges",
    "contract_month_to_month", "has_fiber",
]


@app.post("/predict")
def predict():
    payload = request.get_json(force=True)
    if not isinstance(payload, dict):
        return jsonify(error="expected a JSON object"), 400
    try:
        x = np.array([[payload[f] for f in FEATURES]], dtype=np.float32)
    except KeyError as e:
        return jsonify(error=f"missing feature: {e.args[0]}"), 400
    except (TypeError, ValueError):
        return jsonify(error="features must be numeric"), 400
    proba = float(MODEL.predict_proba(x)[0, 1])
    return jsonify(
        churn_probability=proba,
        model_version="churn_xgb_v7",
    )


@app.get("/healthz")
def health():
    return "ok", 200
```
Flask + Jinja2 + a reverse proxy is still a reasonable way to ship a small internal dashboard — a feature-store inspector, a label-queue UI, a data-quality report viewer. No SPA build pipeline, no frontend framework, server-rendered HTML. For a tool used by five engineers on the data team, this is fast to build and cheap to maintain.
Flask handles the "receive a POST from Stripe / GitHub / a third-party SaaS, validate the signature, enqueue a job, return 200" pattern cleanly. Combine with RQ or Celery for the async side. Total code is usually under 100 lines.
A thin Flask service that authenticates the frontend, fans out to 3–5 internal services, stitches and reshapes the responses, and returns a frontend-friendly JSON. The sync model is fine here when the downstream calls can be batched or parallelized with a thread pool.
Up to roughly 50 endpoints over a handful of domain models, Flask with Flask-SQLAlchemy, Flask-Migrate, and Flask-Smorest for OpenAPI is a perfectly good choice. Beyond that scale, the un-opinionatedness starts to cost more than it saves.
A notebook becomes a module, the module gets wrapped in a Flask route, the route gets deployed. This is the archetypal path from experiment to production for a data scientist without a platform engineer nearby. It is not always the final form of the service — but it is often a correct intermediate form.
Honest comparisons:
- Streaming: Flask can stream responses with `stream_with_context`, but ASGI frameworks handle SSE and chunked responses more ergonomically.

None of these make Flask "bad." They mean the problem shape has moved — pick accordingly.
Flask is synchronous by default, WSGI, and GIL-bound. Each request occupies a worker for its full duration. Flask 2.0 added support for `async def` view functions, but they run on top of the sync framework: each coroutine is driven to completion on its worker thread (via asgiref), so you get async syntax inside a view, not event-loop concurrency across requests the way a true ASGI framework provides.
In production, the deployment choice (worker class, worker count, timeouts) matters more than Flask itself.
Typical p99 latency for a pure-Python prediction endpoint (no ML, just JSON parse/validate/respond) on commodity hardware is in the 5–30 ms range. For an ML endpoint, latency is dominated by the model's forward pass: a gradient-boosted tree (~100 trees, ~10 features) runs in ~1–3 ms; a small sklearn pipeline in ~3–10 ms; a transformer on CPU in tens to hundreds of ms. Flask's contribution is usually under 5 ms of that. If your p99 is 200 ms, look at the model, not the framework.
- `flask run --debug` reloads on source changes. Fine for dev; do not ship it.
- `app.test_client()` runs the WSGI app in-process, no network. Fast, deterministic, and the test code reads like the real thing.

```python
import pytest

from myapp import app


@pytest.fixture
def client():
    app.config["TESTING"] = True
    with app.test_client() as c:
        yield c


def test_predict_happy_path(client):
    payload = {
        "tenure_months": 24, "monthly_charges": 79.5,
        "total_charges": 1908.0, "contract_month_to_month": 0,
        "has_fiber": 1,
    }
    resp = client.post("/predict", json=payload)
    assert resp.status_code == 200
    body = resp.get_json()
    assert 0.0 <= body["churn_probability"] <= 1.0
    assert body["model_version"] == "churn_xgb_v7"


def test_predict_missing_feature(client):
    resp = client.post("/predict", json={"tenure_months": 24})
    assert resp.status_code == 400
    assert "missing feature" in resp.get_json()["error"]


def test_healthz(client):
    assert client.get("/healthz").status_code == 200
```
A minimal requirements.txt for the service above:
```text
flask==3.0.*
gunicorn==22.*
joblib==1.4.*
numpy==2.*
xgboost==2.*
scikit-learn==1.5.*

# dev only
pytest==8.*
```
And a typical production start command:
```shell
gunicorn \
    --workers 4 \
    --worker-class sync \
    --bind 0.0.0.0:8080 \
    --timeout 30 \
    --access-logfile - \
    --error-logfile - \
    myapp:app
```
Flask is governed by the Pallets Projects, a small collective that also maintains Werkzeug, Jinja2, Click, and itsdangerous. Governance is informal but consistent; releases are predictable; breaking changes are telegraphed well in advance.
For a team choosing a framework today, the stability argument cuts both ways: Flask will not surprise you, but it will also not give you the latest ASGI/async features without effort. For ML-serving, internal tools, and webhook receivers, that trade is almost always the right one. For a greenfield high-concurrency API with WebSockets and streaming responses, pick FastAPI or Litestar instead, and don't feel bad about it.