Baseline Microsoft Foundry Chat on Azure — Architectures — startupengineering.io

THE SOURCE

Source write-up

Purpose

Microsoft's enterprise baseline (last reviewed April 2026) for a single, prompt-based agent chat on Microsoft Foundry: Azure App Service web app behind Application Gateway + WAF, agent runtime in Foundry Agent Service, RAG via Azure AI Search, conversation state in Cosmos DB, full VNet isolation with Private Link and Azure Firewall egress. Independent evaluation for startup fit.

Components

Azure App Service
Application Gateway
Web Application Firewall
Microsoft Foundry
Foundry Agent Service
Microsoft Agent Framework
Azure OpenAI
Azure AI Search
Azure Cosmos DB for NoSQL
Azure Storage
Azure Virtual Network
Azure Private Link
Azure Firewall
Azure Key Vault
Microsoft Entra ID
Application Insights
Azure Monitor
Azure Bastion

Vendor's stated assumptions

Single-region, single prompt-based agent — multi-region DR and multi-agent topologies are explicitly out of scope.
Network perimeter is the primary control: every PaaS dependency reachable only via Private Link, single egress via Azure Firewall, single ingress via App Gateway + WAF.
Agent runtime in Foundry Agent Service; conversation state is a managed Cosmos DB enterprise_memory primitive, not a build-it-yourself concern.


What this artefact evaluates

Microsoft published the Baseline Microsoft Foundry Chat Reference Architecture (last reviewed April 7, 2026, on the 180-day update cycle). It is the network-hardened production cousin of the Basic Foundry chat reference — a single, prompt-based agent built on Microsoft Foundry, hosted alongside an App Service chat front end, with the entire data-plane locked inside a private VNet behind Application Gateway + WAF on the way in and Azure Firewall on the way out. This artefact evaluates the architectural choices and trade-offs, not the maturity of Foundry Agent Service or the behaviour of any specific Azure OpenAI model. Pricing tiers and service quotas are out of scope.

The reading is from the perspective of a 5–25-engineer team picking a chat-on-LLM foundation today. That perspective matters: this reference is calibrated for an enterprise architect with a compliance checklist, and the gap between what it prescribes and what a startup actually needs in month one is wider here than it is for either the AWS or GCP equivalents.

Reference at a glance

The architecture is structured around six concerns layered behind a single internet-facing edge. Reading the diagram in two passes — once for what runs where (compute), once for how traffic travels (network) — makes the rest of the reference far easier to follow.

Concern	Primary responsibility	Microsoft's pick	Replaceable with
Edge	TLS, WAF, DDoS, path-based routing, AZ distribution	Application Gateway + WAF + DDoS Protection	Front Door (multi-region) — explicitly missing from this baseline
Frontend	Chat UI, session, identity	Azure App Service (Web Apps), zone-redundant; Entra ID	Container Apps, AKS, or any HTTPS frontend
Orchestration	Agent runtime, tool calling, content safety	Foundry Agent Service (Standard agent setup)	Hosted agents on Foundry, custom Foundry-protocol code, Semantic Kernel, LangChain
Inference	Foundation-model serving	Azure OpenAI via Foundry; data-zone provisioned + standard with spillover	Other Foundry-published models, optionally fronted by API Management gateway
Data	Conversation state, file storage, RAG index	Cosmos DB (enterprise_memory) + Storage + AI Search	Cosmos as grounding store, Azure SQL DB, external vector stores
Network & egress	Private endpoints for every PaaS, single egress through firewall	VNet + Private Link + Private DNS + Azure Firewall	Hub-spoke or shared-services networks adopted at the org level

The named pattern is single-agent, prompt-based, RAG-capable chat with persisted conversation state. Multi-agent topologies — the default in the GCP equivalent — are listed only as alternatives. Hosted (containerised, deterministic) agents are also an alternative to the prompt-based agent that the baseline picks.

What Microsoft actually proposes

The reference is built around six layers wrapped in a private network. Each layer is opinionated about how it integrates, even when it is silent on what is inside it.

Edge. A single internet-exposed surface — Application Gateway with the integrated Web Application Firewall, DDoS Protection enabled, distributed across availability zones. This is the only resource in the architecture with a public IP that user traffic hits. Path-based routing, TLS termination, and certificate management (out of Key Vault) all live here.
Frontend. Azure App Service hosts the chat UI under the Baseline highly available zone-redundant App Service web app pattern. The web app authenticates users with Microsoft Entra ID, invokes the agent through the Microsoft Agent Framework SDK, and reaches Foundry over a private endpoint using its own managed identity. Importantly, the reference treats the chat front end as a separate concern — it links out to a different baseline rather than re-prescribing the App Service hardening here.
Orchestration. Foundry Agent Service in Standard setup (bring your own network) runs the prompt-based agent. The agent has a system prompt, a configured language model, and a tool list: the AI Search tool for grounding, the Web Search tool for live data, optional MCP servers, optional Agent2Agent endpoints, and custom OpenAPI tool connections. Foundry Agent Service enforces content safety inline; it does not provide load balancing, failover, or circuit breaking — the architecture compensates through configuration discipline rather than platform features.
Inference. Azure OpenAI models are served through Foundry. The recommended deployment shape is a data-zone provisioned primary with a data-zone standard spillover; global deployments are positioned as the maximum-resilience option. Optional Azure API Management can front multiple model deployments as a custom gateway. The choice between GPT-class SKUs is intentionally left to the workload.
Data. Three managed services carry distinct data:
- Cosmos DB for NoSQL holds the enterprise_memory database — agent definitions, conversations, and per-conversation items (messages, tool calls, tool outputs). This is the most consequential default in the entire reference: conversation state is not a build-it-yourself concern, it is a Foundry primitive backed by a Microsoft-managed schema in a customer-owned Cosmos account.
- Azure Storage holds files uploaded during sessions, plus a separate account for the App Service deployment ZIP.
- Azure AI Search indexes uploaded files and any static knowledge for the File Search tool, with vector embeddings and semantic ranking.
Networking. This is where the reference earns its enterprise positioning. Every PaaS dependency — Cosmos, Storage, AI Search, Foundry Agent Service, App Service, Key Vault — is reachable only via Private Link. Private DNS zones linked to the VNet resolve those endpoints. The VNet is sliced into purpose-specific subnets (App Service integration, Private Endpoint, Foundry integration, AI Agent integration, Bastion, jump box, build agents, Azure Firewall). Egress for everything inside the VNet — including agent tool calls (web search, custom APIs, MCP, A2A) — funnels through Azure Firewall, which enforces FQDN-based egress rules, runs across all AZs, and uses multiple public IPs to avoid SNAT exhaustion.
Identity. Managed identities are the default everywhere. The reference is unusually careful: distinct identities per Foundry account, per project, per web app, per orchestrator; workload identities for apps and Microsoft Entra agent IDs for AI agents; portal access in production restricted to read-or-troubleshoot only because portal actions can run as the service identity and inadvertently expose agent or chat data.
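
The FQDN-based egress rules described under Networking amount to a default-deny allowlist for outbound tool calls. The sketch below illustrates that idea only; it is not Azure Firewall's rule engine, and the hostnames and the `is_egress_allowed` helper are hypothetical placeholders rather than values from the reference.

```python
from fnmatch import fnmatchcase

# Hypothetical allowlist. These FQDNs are illustrative placeholders,
# not destinations named in the Microsoft reference.
ALLOWED_FQDNS = [
    "api.example-tools.com",
    "*.search.example.net",
]


def is_egress_allowed(fqdn: str, allowlist: list[str] = ALLOWED_FQDNS) -> bool:
    """Return True only if the destination matches an allowlist pattern.

    Anything not listed is denied, which mirrors a default-deny egress posture.
    """
    # Normalise case and drop a trailing root dot before matching.
    host = fqdn.strip().rstrip(".").lower()
    return any(fnmatchcase(host, pattern.lower()) for pattern in allowlist)
```

A tool call to `api.example-tools.com` would pass, while any destination that is not listed is rejected. Note that a pattern such as `*.search.example.net` does not match the bare apex `search.example.net` in this sketch.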

The data flow itself is short and linear, following the seven numbered steps in the published diagram: user → App Gateway/WAF → App Service → agent (over private endpoint, authenticated via managed identity) → tools (AI Search inside the VNet; web/custom outside via Firewall egress) → language model → conversation state persisted to Cosmos DB.

How this differs from the GCP and AWS equivalents

The instinct to read this reference as if it were the same shape as the GCP multi-agent reference is wrong, and the differences are worth naming up front.

Pattern. The GCP reference defaults to a coordinator + subagents multi-agent topology with named sequential and iterative-refinement patterns. This Azure reference defaults to one prompt-based agent, with multi-agent listed as an alternative shape. If your problem genuinely requires decomposition across specialised agents, the GCP shape is a closer fit; if your problem is a single-agent chat with RAG and a handful of tools, this one is.
Network posture. The Azure reference is the most network-opinionated of the three: a private VNet, private endpoints for every PaaS, single egress through a stateful firewall. The GCP reference uses VPC Service Controls + IAM but does not require a private network for the agent layer. The AWS agentic foundations reference is closer in spirit to GCP — VPC posture is recommended, not load-bearing.
Conversation state. Microsoft hands you a managed conversation primitive (Cosmos DB enterprise_memory). GCP and AWS leave conversation persistence to the workload. This is a rare instance where Azure is the simpler default.
Identity. Microsoft's identity story (managed identities per Foundry account, project, web app, orchestrator; agent IDs in Entra ID; portal access gated separately) is the most disciplined of the three published references. The trade-off is more primitives to reason about on day one.
Single region. The Azure reference is explicit that multi-region, global ingress, DNS failover, and cross-region replication are out of scope. The GCP reference is silent on multi-region at the architecture level. Read the gap honestly: an explicit out-of-scope statement is not the same as coverage, and silence is not the same as being implicitly global.

Findings

1. The network perimeter is the load-bearing architectural choice. The most consequential thing this reference prescribes is not the agent shape or the model SKU; it is that the entire data-plane sits inside a private VNet and the agent's outbound tool calls go through a stateful firewall. That single choice cascades — private endpoints for every dependency, private DNS zones, purpose-specific subnets, the App Service VNet integration. Adopt this and you inherit a defensible production posture; skip it and you have built the Basic Foundry chat, which is the explicitly-named lighter baseline.

2. Conversation state is a managed primitive, not a workload concern. Foundry Agent Service writes conversations, messages, tool calls, and tool outputs into a Cosmos DB schema it owns. Most chat references treat conversation persistence as a workload-specific design decision; this one removes the decision. Pragmatically, that is a good default: the schema is Microsoft-owned and read-only at the data-plane layer, so teams do not end up inventing a half-baked message store of their own. The trade-off is lock-in: the conversation model is no longer portable across cloud providers.

Implication: if portability across clouds matters, treat the conversation persistence as a Foundry-specific feature you would otherwise have to rebuild on migration.

3. The Foundry Agent Service has no native HA or DR. Microsoft is unusually candid here. The service does not provide load balancing, failover, or circuit breaking; recovery from a regional incident is via reconstruction. The compensating controls are deliberate: Cosmos DB continuous backup with PITR, GZRS storage with customer-managed failover, agents-as-code in source control, role-preserving user-assigned managed identities, delete locks on dependency services, and an external source-of-truth for AI Search content because the service has no built-in restore. This list reads like a disaster-recovery starter kit: useful, but it tells you exactly how much operational work the platform is not doing.

4. The conversation isolation footnote is the most important sentence in the reference. Foundry does not enforce per-user authorization on conversation IDs at the data plane. The web app's project-level credentials can read any conversation in the project. The reference correctly cites OWASP API1: Broken Object Level Authorization and tells the developer to validate the calling user against the conversation ID server-side. Most published chat architectures hand-wave this away. Reading and implementing this section is the difference between a working app and a class-action breach.
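
The server-side check described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the reference's implementation: the `ConversationOwnership` store is a hypothetical in-memory stand-in for an application-owned table, and `user_id` is assumed to come from an already validated Entra ID token.

```python
from dataclasses import dataclass, field


class ConversationAccessError(Exception):
    """Raised when the caller does not own the requested conversation."""


@dataclass
class ConversationOwnership:
    # Hypothetical app-owned map of conversation ID -> owning user ID.
    # In a real service this would live in a database, not in memory.
    _owners: dict = field(default_factory=dict)

    def register(self, conversation_id: str, user_id: str) -> None:
        """Record the owner at the moment the conversation is created."""
        self._owners[conversation_id] = user_id

    def authorize(self, conversation_id: str, user_id: str) -> None:
        """Raise unless user_id owns conversation_id."""
        owner = self._owners.get(conversation_id)
        # Use the same error for "unknown" and "not yours" so that
        # conversation IDs cannot be probed for existence.
        if owner is None or owner != user_id:
            raise ConversationAccessError("conversation not found")
```

The web app would call `authorize` on every request that carries a conversation ID, before forwarding anything to the agent with its project-level credentials.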

5. The reference is a baseline, not a recipe. The chat UI hardening is delegated to the Baseline highly available zone-redundant App Service web app reference. Cost is delegated to the Well-Architected Framework. Several pillars (cost optimisation, performance, operational excellence) are present in the published reference but are short on specifics relative to security and reliability. The reader is expected to compose this with at least two adjacent references; if they do not, the security and reliability prescriptions will land on a flimsy foundation.

6. The infrastructure floor is high. A literal implementation of the baseline brings up — at minimum — Application Gateway, WAF, DDoS, Azure Firewall, Bastion, a jump box, two or more Key Vaults (one for App Gateway TLS, a dedicated one for Foundry connection secrets), an isolated Cosmos account, an isolated AI Search instance, an isolated Storage account, and the Foundry account and project. None of this is optional in the published architecture. Realistic monthly baseline cost before any traffic is measured in low single-digit thousands of US dollars; the reference does not quantify it.

Condition: this is sized for the workloads the reference targets — enterprise chat applications. Startups should choose the Basic Foundry chat first and graduate to this one when the compliance and reliability requirements actually exist.

7. The portal is treated as a privileged surface. Most published references say nothing about web-portal access. This one makes four points: portal actions can run as the service identity; agents and chats can leak through the portal to underprivileged users; project creation through the portal bypasses your network controls; and projects and agents should therefore be managed through IaC pipelines instead. This is a small section in the reference but a load-bearing one for production operations.

8. Multi-agent is one notebook entry away from being a redesign. The published alternatives section lists multi-agent patterns — sequential, concurrent, group chat, magentic, handoff — and presents them as substitutable. They are not. Switching from the prompt-based single-agent shape to a coordinator + subagents topology rewires the data-flow (multiple model calls per request, tool routing decisions, evaluator/refiner subagents) and the operational story (per-subagent cost, per-hop latency, trajectory evaluation). Adopting Foundry's single-agent baseline today and moving to multi-agent later is feasible, but it is not free.

Design considerations through a startup lens

The published reference includes its own Design considerations section organised by Well-Architected pillars. The intended reader is an enterprise architect; the notes below recast each pillar for a 5–25-engineer team that wants to know which prescriptions hold up at small scale and which over-spend.

Security

The security posture is the strongest part of the reference and the part a startup should copy with the least modification. The network perimeter, managed-identity discipline, project-scoped connections, and explicit mention of OWASP API1 are exactly what a production chat application needs and exactly what a startup typically forgets to build.

Two practical caveats:

Server-side conversation authorisation is not optional. Implement the OWASP API1 check on the very first commit. The cost of bolting it on later is non-trivial, and the incident cost of not having it is unbounded.
Two API-key connections sneak in. Application Insights and the Web Search tool are configured with API keys, not Entra identities. Rotate them on a schedule. Store them in a Foundry-scoped Key Vault, not the App Gateway Key Vault.

Reliability

The published prescriptions — AZ-redundant App Service, ZRS+ storage, ≥3 AI Search replicas, multi-AZ Azure Firewall, dual model deployments with spillover — are concrete and correct. The honest gap: this is a single-region baseline, and the reference says so. A startup at 5 engineers should not pretend they will operate multi-region until they have a real customer who needs it. When that day arrives, the work is non-trivial — global ingress, DNS failover, cross-region replication, active/active or active/passive designation, and regional failover/failback are all unsolved here.

For DR specifically, the reference's compensating controls (continuous backup, agents-as-code, locked dependency services, preserved managed identities) are the right shape. Implement them. They are also more operational discipline than platform feature, which is fair to call out.

Operations and observability

The reference uses Azure Monitor + Application Insights. Both work and both are sufficient for compliance, but neither is a great surface for agent trajectory inspection. The practitioner pattern that holds up at startup scale is the same as on AWS or GCP:

Azure Monitor for the structural trace (request IDs, latencies, WAF/Firewall verdicts, IAM decisions).
A dedicated agent observability tool (Langfuse, Phoenix, Braintrust, LangSmith) for the prompt-level surface — system prompt, tool calls, tool outputs, model completions. Run it side-by-side; do not try to make Application Insights carry this job.
A small offline eval set replayed on every model upgrade and every prompt change.
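
The offline eval set in the last item can start very small. The sketch below shows one possible shape, assuming the agent can be wrapped in a plain `prompt -> answer` callable; `EvalCase`, `replay`, and the substring check are illustrative choices, not something the reference prescribes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    must_contain: List[str]  # substrings the answer is expected to include


def replay(agent: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Run every case through the agent and return the pass rate (0.0 to 1.0)."""
    if not cases:
        raise ValueError("eval set is empty")
    passed = 0
    for case in cases:
        answer = agent(case.prompt).lower()
        # A case passes only if every expected substring appears.
        if all(expected.lower() in answer for expected in case.must_contain):
            passed += 1
    return passed / len(cases)
```

Running `replay` in CI against the previous and the candidate model or prompt gives a crude regression signal; substring matching is deliberately simplistic and would usually be replaced by a stronger grader as the set grows.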

Cost

The reference is light on cost specifics, which is the gap a startup adopter should care about most.

Per-conversation cost is the only cost number that matters in production. Token spend, AI Search QU-hours, Cosmos RU/s, and Firewall data-processing all need to roll up to a cost per resolved chat. Wire that into the same dashboard as latency.
The infrastructure baseline is the largest fixed cost in month one. Several of the components above scale with traffic (Cosmos RU/s, AI Search replicas, App Service instances) but several do not — Azure Firewall, App Gateway, Bastion, jump box, Private Endpoints, DDoS Protection (Standard) all bill whether or not anyone is talking to the chat. Right-size the baseline before optimising the variable.
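
To make the roll-up concrete, the sketch below divides one month of spend by the number of resolved chats and keeps the fixed baseline separate from the variable part. The field names and the `cost_per_resolved_chat` helper are hypothetical, and every dollar figure passed in is whatever your own billing export reports; nothing here is a published Azure price.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class MonthlySpend:
    # All values are USD for the same billing month, supplied by the caller.
    tokens: float
    ai_search: float
    cosmos: float
    firewall_data: float
    fixed_baseline: float  # App Gateway, Firewall, Bastion, jump box, etc.


def cost_per_resolved_chat(spend: MonthlySpend, resolved_chats: int) -> Dict[str, float]:
    """Split cost per resolved chat into variable, fixed, and total."""
    if resolved_chats <= 0:
        raise ValueError("resolved_chats must be positive")
    variable = spend.tokens + spend.ai_search + spend.cosmos + spend.firewall_data
    return {
        "variable": variable / resolved_chats,
        "fixed": spend.fixed_baseline / resolved_chats,
        "total": (variable + spend.fixed_baseline) / resolved_chats,
    }
```

Keeping `fixed` as its own number shows how much of each chat's cost is the always-on perimeter, which is the part that shrinks only with volume or with right-sizing.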

Performance

Latency in this architecture is dominated by two terms: model inference, and the network hops through App Gateway/WAF on ingress and Azure Firewall on egress. A working budget per layer at p95 looks like:

Layer	Realistic p95 budget	Notes
App Gateway + WAF	30–80 ms	TLS + WAF inspection.
App Service	50–150 ms	Authn + framework overhead.
Foundry → tools (AI Search)	80–250 ms	Private endpoint, in-VNet.
Foundry → tools (egress)	200–600 ms	App Service → Foundry → Firewall → external API.
Model call (GPT-class)	800 ms – 4 s	Dominant. Provisioned throughput is the lever.
Cosmos persistence	20–80 ms	Single-region, point write.
Total p95	1.5–5 s	For RAG chat without web-search hops.

These are working numbers, not Microsoft SLOs. Treat them as the default expectation for a "fast chat" UX in this architecture.
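
One way to put the table to work is to encode it and compare measured p95 values against it. The sketch below copies the per-layer ranges from the table (excluding the egress tool hop, as the Total row does) and adds them naively. Summing per-layer p95 values is only a rough approximation of end-to-end p95, and the naive sum of roughly 1.0–4.6 s is tighter than the table's 1.5–5 s total, which leaves some headroom. The layer keys and helper names are hypothetical.

```python
from typing import Dict, List, Tuple

# (low_ms, high_ms) per layer, copied from the budget table above.
BUDGET_MS: Dict[str, Tuple[int, int]] = {
    "app_gateway_waf": (30, 80),
    "app_service": (50, 150),
    "ai_search_tool": (80, 250),
    "model_call": (800, 4000),
    "cosmos_write": (20, 80),
}


def total_range_ms(budget: Dict[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Naively add the lower and upper bounds of every layer."""
    low = sum(lo for lo, _ in budget.values())
    high = sum(hi for _, hi in budget.values())
    return low, high


def over_budget(measured_ms: Dict[str, float],
                budget: Dict[str, Tuple[int, int]] = BUDGET_MS) -> List[str]:
    """Return the layers whose measured p95 exceeds the upper bound."""
    return [name for name, ms in measured_ms.items() if ms > budget[name][1]]
```

Feeding `over_budget` with per-layer p95 values from tracing points at the layer to investigate first, rather than at the end-to-end number alone.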

Deployment substrate trade-offs

The baseline uses Azure App Service for the chat front end; the agent runtime is Foundry Agent Service in Standard setup. Both choices are good defaults. Two trade-offs worth being deliberate about:

Choice	Best for	Watch out for
App Service for the chat UI	Iteration speed; managed TLS; AZ redundancy.	Cold-start tax on plans below P1v3.
Container Apps / AKS for the chat UI	Container-native shops; sidecar telemetry.	More operational surface than App Service; the published baseline does not cover the hardening.
Foundry Agent Service (Standard) — prompt-based	First production deployment; managed runtime.	No HA, no failover, no circuit breaking — your gateway has to compensate.
Foundry Agent Service — hosted (containerised)	Deterministic agents; tighter control of code.	More to operate; less of the reference applies as-drawn.
Custom Foundry-protocol code (Semantic Kernel / LangChain)	Multi-cloud aspirations; existing investment in those frameworks.	Loses the inline content-safety and the managed conversation primitive.
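
Because the prompt-based Standard setup offers no circuit breaking, the caller has to supply it. The sketch below is a generic circuit breaker that could wrap agent invocations in the web app or gateway tier; it is an illustration of the pattern, not part of the reference, and the class name, thresholds, and injected clock are assumptions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when calls are short-circuited because the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_s:
                # Still cooling down: fail fast without touching the backend.
                raise CircuitOpenError("agent endpoint marked unhealthy")
            # Cool-down elapsed: half-open, let one trial call through.
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens it.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the breaker and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Wrapping the agent call as `breaker.call(lambda: invoke_agent(...))`, where `invoke_agent` is a placeholder for whatever client the app uses, lets the UI degrade quickly instead of stacking up slow failures.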

Conditions of applicability

Context	Fit	Note
Enterprise chat on Azure, ≥10 eng	High	The reference is built for exactly this case.
Regulated domain (healthcare, finance, public sector)	High	Network perimeter, identity discipline, BOLA call-out match audit needs.
Pre-PMF startup, <5 eng	Low	Use Basic Foundry chat first; this baseline is enterprise-shaped.
Single-agent RAG chat over enterprise data	High	Pattern matches; Foundry's grounding tools are the right primitives.
Multi-agent / coordinator-plus-subagents	Low–Medium	Listed as alternative; redesign required, GCP reference fits better.
Multi-region / global ingress requirement	Low	Out of scope; teams must compose their own.
Latency-critical (<500 ms p95)	Low	Model inference + WAF + Firewall egress is hard to fit inside 500 ms.
Non-Azure team (AWS, GCP, self-hosted)	Low	Foundry primitives do not transfer; only the patterns do.
High-volume batch chat workload	Medium	Provisioned throughput + Cosmos sizing make this feasible; cost discipline is non-negotiable.

What the architecture does not address

A trajectory- or outcome-level evaluation harness for the agent.
Per-conversation cost as a first-class observability signal.
Multi-region active/active or active/passive deployment, global ingress, DNS failover, and cross-region replication.
The chat front end's own hardening (delegated to a different baseline reference).
Cost optimisation specifics (delegated to the Well-Architected Framework).
A canary or shadow-deployment story for system-prompt or tool changes — the most frequent change a chat team ships.
Migration path off Foundry primitives if the workload outgrows them.
Multi-tenant chat isolation beyond per-conversation IDs (you are expected to enforce tenant boundaries at the application layer).

These are scope boundaries of the published reference, not failures of it. They are the work the reader still has to do after adopting it.

Practitioner adoption checklist

A working order of operations for a team adopting this architecture in the first 90 days. Not a rigid sequence, but the checklist that keeps the early decisions reversible.

Start with the Basic Foundry chat reference, not this one. Get a single agent talking to a single Azure OpenAI deployment over a public endpoint with managed identities. Confirm the contract, not the network.
Add the App Service front end second, following the Baseline highly available zone-redundant App Service pattern. Wire Entra ID authentication on day one.
Implement the OWASP API1 conversation-authorisation check before shipping a second user. Server-side validation that the calling user owns the conversation ID is non-negotiable.
Stand up the VNet next. Move every PaaS dependency behind a private endpoint. Verify the agent still reaches AI Search, Cosmos, Storage, and Foundry, all without leaving the VNet.
Add Application Gateway + WAF + DDoS as the only internet-exposed surface. Migrate the App Service ingress behind it.
Add Azure Firewall as the single egress chokepoint. FQDN-based egress rules. Multi-AZ. Multiple public IPs.
Wire managed identities everywhere. Distinct identities per Foundry account, per project, per web app. No API keys except the two the reference acknowledges.
Move project creation to IaC. Disable portal-based project creation in production. Manage agents-as-code.
Add a dedicated agent observability tool (Langfuse, Phoenix, Braintrust). Run it alongside Application Insights.
Plan the DR rehearsal. Continuous backup, agents-as-code, locked dependency services, preserved managed identities, and a documented restore order. Run a tabletop exercise once.

Author's take (Selva, April 2026)

If I were a 10-engineer Azure-native team building an enterprise chat product today, I would adopt this architecture as drawn for the security and reliability layers, and I would push hard against the implication that the network perimeter is optional. The managed-identity discipline, the project-scoped connections, the explicit OWASP API1 reference, and the inline content safety through Foundry are the pieces I would not negotiate. The single-agent default is the right starting point for a chat product even though the GCP reference's multi-agent shape is more exciting.

What I would not adopt as drawn is the day-one infrastructure baseline. App Gateway + WAF + DDoS + Azure Firewall + Bastion + jump box + multiple Key Vaults + per-project Cosmos/Storage/AI Search are correct for a regulated enterprise; they are too much fixed cost for a 10-engineer startup before the first paying customer. I would start from the Basic Foundry chat, ship to production, then graduate every module of this baseline as compliance pressure or traffic justifies it.

I would also be honest about the lock-in. Foundry's conversation primitive (enterprise_memory in Cosmos) is genuinely useful, but it is a Microsoft-owned schema. If portability across clouds is on the roadmap, treat it as a feature you would have to rebuild on migration, not as portable infrastructure.

This is one practitioner's reading. It is not a universal recommendation.

Open questions for re-evaluation

How does the Basic → Baseline graduation actually unfold in practice? Are teams composing their way through, or rebuilding?
What is the realistic p95 latency of an Application Gateway → App Service → Foundry → external tool → model round trip?
How does Foundry's single-agent shape hold up against the same workload built on the GCP coordinator + subagents reference at 10× volume?
When Foundry Agent Service ships native HA/DR primitives, which of the compensating controls become unnecessary?
How does the cost floor of this baseline compare to the AWS agentic foundations and the GCP multi-agent reference at a realistic 25K conversations/month?
What does it take to run this architecture multi-region without re-architecting it?

Re-evaluation cadence: 6 months, or sooner on a major Microsoft Foundry revision.


MY EVALUATION

Verdict

Poor fit. The network and identity discipline is genuinely excellent — if you need it. For a pre-PMF startup the infrastructure baseline (App Gateway + WAF + DDoS + Firewall + Bastion + multiple Key Vaults) is $1–3K/month before any traffic, and single-region + single-agent caps the ceiling.

Rubric scores

Conceptual fit (chat/RAG): 4/5

Operational complexity: 1/5

Cost transparency: 2/5

Lock-in / portability: 2/5

Conditions for adoption

Adopt fully when: regulated/enterprise Azure shop, ≥20 engineers, security-first product, and single-region is acceptable.
Adopt selectively when: Azure-aligned but pre-PMF — keep Foundry + AI Search + managed identities; defer Bastion, jump box, and per-project Key Vault isolation until they pay for themselves.
Substitute when: not on Azure, or a sub-10 engineer team without a regulatory mandate. The conceptual layers transfer; the perimeter does not.
Skip when: multi-agent or multi-region is a launch requirement — both are out of scope here.

What to keep

Network perimeter is the headline — every PaaS dependency reachable only via private endpoints, single egress through Azure Firewall, single internet-exposed surface (Application Gateway + WAF).
Identity story is unusually disciplined — managed identities everywhere, distinct identities per Foundry account / project / web app, agents-as-code, project-creation gated to IaC.
Reliability prescriptions are concrete: AZ-redundant App Service, ZRS+ storage, ≥3 AI Search replicas, multi-AZ Azure Firewall, dual model deployments with spillover.
Calls out the Broken Object Level Authorization risk on conversations explicitly — most published references skip this.
Conversation persistence is a managed primitive (Cosmos DB enterprise_memory) rather than a build-it-yourself concern.

Where it costs more than expected

Single-region by design; multi-region is explicitly out of scope, so global-ingress and DR strategies are left to the reader.
Pattern is single-agent — multi-agent (sequential, concurrent, group chat, magentic, handoff) is mentioned only as an alternative; teams that need it will redesign substantially.
The infrastructure baseline (App Gateway + WAF + DDoS + Firewall + Bastion + jump box + multiple Key Vaults + per-project AI Search/Storage/Cosmos isolation) is enterprise-shaped — easily $1–3K/month before any traffic.
Foundry Agent Service has no built-in load-balancing, failover, or DR — recovery is via reconstruction; the reference compensates with operational discipline rather than platform features.
Cost optimisation, performance, and operational excellence pillars in the published reference are short on specifics relative to security and reliability.

Conflict of interest: none.