Azure AI Document Intelligence

Azure AI Document Intelligence (formerly Form Recognizer) is Microsoft's document-AI service. It extracts printed and handwritten text, layout (lines, paragraphs, tables, figures), key-value pairs, and semantic fields from forms, invoices, receipts, IDs, and contracts. Unlike a pure OCR engine, it preserves document structure and provides prebuilt models for common document types, which makes it the Azure counterpart to AWS Textract.

Model Types:

Read: OCR + text lines, handwriting, language detection. The foundation for every other model.
Layout: Read + paragraphs, reading order, tables, selection marks, and figures. Can optionally return Markdown output suitable for LLM ingestion.
Prebuilt: Invoices, Receipts, ID Documents, Business Cards, Credit Cards, W-2, 1098/1099, Health Insurance Cards, Contracts, Mortgage (1003/1004/1005/1008), Marriage Certificates, Bank Statements, Paystubs, Checks. Each returns semantically typed fields.
Custom Extraction: Train a custom model on 5+ labeled samples for your own form layouts.
Custom Classification: Route mixed document batches to the right extraction model.
General Documents (Generative): Schema-guided extraction from arbitrary document structures using LLM-backed generative field extraction.
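
The routing step described under Custom Classification can be sketched in plain Python. This is a minimal illustration, not the service's API: the `ROUTES` table, the `pick_model` helper, the 0.8 threshold, and the custom model ID `myco-claims-v1` are all hypothetical, and it assumes you have already read a document type and confidence off the classifier result.

```python
# Hypothetical mapping from classifier document types to extraction model IDs.
# "myco-claims-v1" stands in for a custom extraction model you would train yourself.
ROUTES = {
    "invoice": "prebuilt-invoice",
    "receipt": "prebuilt-receipt",
    "claim_form": "myco-claims-v1",
}
FALLBACK_MODEL = "prebuilt-layout"


def pick_model(doc_type, confidence, min_confidence=0.8):
    """Return the model ID to run for a classified document.

    Unknown types and low-confidence classifications fall back to Layout,
    so the document still yields text and tables for manual handling.
    """
    if doc_type is None or confidence is None or confidence < min_confidence:
        return FALLBACK_MODEL
    return ROUTES.get(doc_type, FALLBACK_MODEL)
```

Falling back to a general model rather than raising keeps a mixed batch moving; whether that suits your workload depends on how costly a misrouted document is.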

Key Features:

Markdown Output: The Layout model can return Markdown, with tables rendered as GitHub-flavored tables, ready to feed into LLM prompts.
Bounding Polygons & Confidence: Every word, field, and cell carries its polygon and confidence score.
Query Fields: Request extra fields by name or short natural-language label (such as "InvoiceTotal") without training a model, and get the value and its span back. Similar to Textract Queries.
Add-On Capabilities: High-resolution OCR, formula detection, barcode extraction, language detection, key-value pairs for any document.
Batch API: Point the service at an Azure Blob container and have results written to an output container, for bulk processing.
Disconnected Containers: Run Document Intelligence in your own infrastructure for data-residency or air-gapped workloads.
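
Downstream code often wants an axis-aligned box rather than the raw polygon, for example to crop a region or draw a highlight. The sketch below assumes the polygon arrives as a flat list of alternating x and y coordinates; `polygon_to_bbox` is a hypothetical helper name, and the units (inches or pixels) are whatever the page reports.

```python
def polygon_to_bbox(polygon):
    """Convert a flat [x1, y1, x2, y2, ...] polygon to (left, top, right, bottom)."""
    if len(polygon) < 2 or len(polygon) % 2 != 0:
        raise ValueError("polygon must hold an even, non-zero number of coordinates")
    xs = polygon[0::2]  # every even index is an x coordinate
    ys = polygon[1::2]  # every odd index is a y coordinate
    return (min(xs), min(ys), max(xs), max(ys))


# A slightly skewed quadrilateral, as a scanned page might produce.
box = polygon_to_bbox([1.0, 2.0, 4.0, 2.1, 4.0, 3.0, 1.0, 2.9])
print(box)
```

Taking the min and max over all points keeps the box valid even when the page is rotated or skewed, at the cost of including a little extra margin.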

Examples

1. Layout Extraction with Markdown Output (for LLM Ingestion)


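import os

from azure.ai.documentintelligence import DocumentIntelligenceClient
# Newer SDK releases rename ContentFormat to DocumentContentFormat; adjust the import if needed.
from azure.ai.documentintelligence.models import AnalyzeDocumentRequest, ContentFormat
from azure.core.credentials import AzureKeyCredential

client = DocumentIntelligenceClient(
    endpoint="https://myco-docintel.cognitiveservices.azure.com/",
    # Read the key from the environment instead of hard-coding it.
    # DOCUMENTINTELLIGENCE_API_KEY is an assumed variable name.
    credential=AzureKeyCredential(os.environ["DOCUMENTINTELLIGENCE_API_KEY"]),
)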

with open("contract.pdf", "rb") as f:
    poller = client.begin_analyze_document(
        "prebuilt-layout",
        AnalyzeDocumentRequest(bytes_source=f.read()),
        output_content_format=ContentFormat.MARKDOWN,
    )

result = poller.result()
print(result.content[:2000])   # Markdown with headings, tables, etc.

2. Extract Invoice Fields with the Prebuilt Model


poller = client.begin_analyze_document(
    "prebuilt-invoice",
    AnalyzeDocumentRequest(url_source="https://myco.blob.core.windows.net/inv/inv-0421.pdf"),
)
invoice = poller.result().documents[0]

print("Vendor:", invoice.fields["VendorName"].value_string)
print("Total :", invoice.fields["InvoiceTotal"].value_currency.amount)

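# "Items" is absent when the model finds no line items, so guard the lookup.
items_field = invoice.fields.get("Items")
line_items = items_field.value_array if items_field is not None else []
for item in line_items or []: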
    desc = item.value_object["Description"].value_string
    amt  = item.value_object["Amount"].value_currency.amount
    print(f"  - {desc}: {amt}")
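
Direct indexing such as `invoice.fields["VendorName"]` raises `KeyError` when a field is missing from the result, which is worth planning for on real-world scans. One way to guard the lookups is a small accessor like the one below. `field_attr` is a hypothetical helper, not part of the SDK, and the `SimpleNamespace` objects only stand in for the SDK's field objects so the sketch runs without a service call.

```python
from types import SimpleNamespace


def field_attr(fields, name, attr, default=None):
    """Return getattr(fields[name], attr), or default if either is missing."""
    field = (fields or {}).get(name)
    if field is None:
        return default
    value = getattr(field, attr, None)
    return default if value is None else value


# Stand-in for invoice.fields: the vendor was found, the total was not.
fields = {"VendorName": SimpleNamespace(value_string="Contoso")}

print("Vendor:", field_attr(fields, "VendorName", "value_string"))
print("Total :", field_attr(fields, "InvoiceTotal", "value_currency", default="n/a"))
```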

3. Query Fields (Natural-Language Field Extraction)


poller = client.begin_analyze_document(
    "prebuilt-layout",
    AnalyzeDocumentRequest(url_source="https://myco.blob.core.windows.net/claims/claim-88.pdf"),
    features=["queryFields"],
    query_fields=["PolicyNumber", "DateOfIncident", "ClaimedAmount"],
)
result = poller.result()
for doc in result.documents:
    for name, field in doc.fields.items():
        print(name, "->", field.content)

4. Pair with Azure OpenAI for RAG Ingestion

The typical pattern for document-heavy RAG: Document Intelligence → chunks of Markdown → Azure OpenAI embeddings → Azure AI Search index.


# 1) Parse PDF to Markdown via Document Intelligence Layout
md = client.begin_analyze_document(
    "prebuilt-layout",
    AnalyzeDocumentRequest(bytes_source=pdf_bytes),
    output_content_format=ContentFormat.MARKDOWN,
).result().content

# 2) Chunk by headings / token count. chunk_markdown is a placeholder for your own
#    splitter (for example, LangChain's MarkdownHeaderTextSplitter).
chunks = chunk_markdown(md, max_tokens=500)

# 3) Embed chunks with Azure OpenAI
from openai import AzureOpenAI
aoai = AzureOpenAI(azure_endpoint="...", api_key="...", api_version="2024-10-21")
vectors = aoai.embeddings.create(model="embedding-3-large-prod", input=chunks).data

# 4) Upload to Azure AI Search (see the AI Search page for index schema)
documents = [{"id": f"doc-{i}", "content": c, "content_vector": v.embedding}
             for i, (c, v) in enumerate(zip(chunks, vectors))]
search_client.upload_documents(documents)
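
The pipeline above calls `chunk_markdown` without defining it. One possible shape for that helper, as a minimal sketch: it starts a new chunk at every Markdown heading and whenever a rough size budget would be exceeded. It counts whitespace-separated words as a stand-in for tokens, which is an assumption; a real pipeline would more likely count with the embedding model's tokenizer.

```python
import re

HEADING = re.compile(r"^#{1,6}\s")  # ATX headings: "# Title" through "###### Title"


def chunk_markdown(md, max_tokens=500):
    """Split Markdown into chunks at headings, capping each at about max_tokens words."""
    chunks, current, count = [], [], 0

    def flush():
        nonlocal current, count
        text = "\n".join(current).strip()
        if text:  # skip chunks that hold only blank lines
            chunks.append(text)
        current, count = [], 0

    for line in md.splitlines():
        words = len(line.split())
        # Close the running chunk at a heading or when the budget would overflow.
        if current and (HEADING.match(line) or count + words > max_tokens):
            flush()
        current.append(line)
        count += words
    flush()
    return chunks


sample = "# Terms\nPayment is due in 30 days.\n\n## Penalties\nLate fees apply."
for chunk in chunk_markdown(sample):
    print(repr(chunk))
```

Two limits of this sketch are worth knowing: a single line longer than the budget stays in one oversized chunk, and a long table can be split between rows, so table-heavy documents may need a splitter that treats each table as a unit.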

When to Choose Document Intelligence:

You need structured fields from invoices, receipts, IDs, tax forms, contracts, or custom forms.
You're building a RAG pipeline over PDFs and want high-quality Markdown with preserved tables.
You need bounding boxes and confidence scores for downstream validation or human-in-the-loop review.
Compared to Azure AI Vision OCR: Document Intelligence understands document structure (forms, tables, reading order); use AI Vision OCR only for unstructured text in images.
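
For the human-in-the-loop case, confidence scores are typically used to split extracted fields into an auto-accepted set and a review queue. The sketch below assumes you have already flattened the result into a `{name: (value, confidence)}` dictionary; `split_by_confidence` and the 0.85 threshold are hypothetical and should be tuned against your own labeled data.

```python
def split_by_confidence(fields, threshold=0.85):
    """Partition {name: (value, confidence)} into (accepted, needs_review) dicts."""
    accepted, needs_review = {}, {}
    for name, (value, confidence) in fields.items():
        # A missing confidence is treated as low so it is never silently accepted.
        if confidence is not None and confidence >= threshold:
            accepted[name] = value
        else:
            needs_review[name] = value
    return accepted, needs_review


extracted = {
    "VendorName": ("Contoso", 0.97),
    "InvoiceTotal": (110.0, 0.62),
    "DueDate": ("2024-05-01", None),
}
accepted, needs_review = split_by_confidence(extracted)
print("accepted:", accepted)
print("review  :", needs_review)
```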