The problem the industry finally stopped ignoring
For the first two years of the production LLM cycle, every foundation-model provider shipped its own incompatible way to wire models into tools. Each vendor had its own function-calling format, its own tool-use schema, its own SDK conventions, and its own subtle assumptions about how state, retries, and streaming worked. The cloud platforms layered those primitives in slightly different ways. The result was predictable: enterprise teams ended up with agent frameworks glued together by bespoke adapters, and every model migration meant rewriting a meaningful share of application code.
That era is closing. A small set of open protocols and emerging runtime conventions have converged on a common mental model: a model talks to tools through a well-defined, transport-agnostic protocol; tools can live anywhere; and runtimes are increasingly interchangeable. The specifications are not yet fully identical across the ecosystem, but they are interoperable in ways that matter, and the community tooling is treating them as complementary layers in a stack rather than competing alternatives.
The framework that follows walks through what the Model Context Protocol actually is and why it became the convergence point, what modern agent-runtime APIs add on top, where agent-to-agent protocols fit, the architectural moves enterprise teams should commit to, the security model that distinguishes a defensible deployment from a vulnerable one, the regulatory dimensions that 2026 has added to agent architecture, and the 90-day pilot plan that builds the muscle memory before the customer-facing deployment.
What the Model Context Protocol actually is
The Model Context Protocol (MCP) was published as an open specification in late 2024 and has since become the de-facto convergence point for how LLMs consume tools and data. The protocol defines three primitives an LLM can interact with: tools (actions the model can invoke), resources (structured data the model can read), and prompts (reusable templates). Communication runs over a JSON-RPC transport layer that supports stdio, HTTP, and server-sent events.
The structurally important design choice is that MCP servers are independent from the model. A server that exposes a Jira tool, a PostgreSQL resource, a Slack workflow, or a custom internal API can be connected to any LLM client that speaks the protocol. By Q1 2026 the public MCP registry lists hundreds of servers covering databases, filesystems, SaaS APIs, observability tools, version-control systems, ticketing platforms, and internal enterprise systems. The upshot for architecture: the tool surface becomes a portable asset rather than a model-specific integration.
The protocol is open and community-maintained. Multiple commercial AI vendors and open-source projects have shipped first-class MCP support; the specification itself lives at modelcontextprotocol.io and is governed independently of any single vendor. For enterprise teams, this is the key property: writing tools against MCP rather than against any single vendor's function-calling format means the tools survive the next model migration without rewriting.
What modern agent-runtime APIs add
On top of the tool-protocol layer, modern agent-runtime APIs handle the higher-level concerns of multi-turn tool loops, built-in capabilities (web search, file search, computer use, code execution), reasoning-state continuity across model calls, streaming, and observability. These runtime APIs have become materially thinner than the application code teams used to write to handle the same concerns in 2023–2024.
The agent SDKs that wrap these runtime APIs layer additional structure: guardrails, hand-off logic between specialised agents, tracing, evaluation primitives, and the framework conventions for production deployment. The competitive landscape has multiple vendors and open-source projects competing on this layer, but the underlying APIs are converging on a recognisable shape – tool-loop orchestration, structured outputs, streaming, observability hooks – that is increasingly portable across implementations.
For enterprise teams, the practical guidance is to treat the runtime as the replaceable layer. Pick a runtime that fits the current stack and team familiarity; concentrate business logic in MCP servers below it; expect to migrate the runtime at least once over a 24-month deployment window, and architect for that migration.
Agent-to-agent protocols and the two-tier picture
Where MCP standardises how a single model reaches tools, the emerging agent-to-agent (A2A) protocol family standardises how agents reach other agents across organisational boundaries: capability discovery, task delegation, result streaming, authentication, and trust establishment. The design explicitly assumes a future where enterprises publish specialised agents (procurement, scheduling, research, data extraction) that other agents can invoke without bespoke integration.
Treating these specifications as a stack rather than competitors clarifies the architectural picture. MCP is the tool-access layer (single agent to many tools). A2A is the inter-agent layer (one agent to other agents). The runtime APIs sit on top and orchestrate both. Most enterprise conversations in 2026 are no longer "which agent framework should we pick?". They are "which parts of this stack are we committing to as the durable architecture, and which do we keep pluggable?".
For most enterprise programmes in early 2026, the answer is: commit to MCP at the tool-access layer (the standard is mature and the migration cost is low), keep the runtime pluggable (the layer is still churning), pilot A2A for cross-team workflows where the organisational scaling actually applies, and treat the whole stack as a refactoring opportunity to escape vendor-lock-in rather than as a way to commit harder to a specific vendor.
What enterprise teams should build around
The architectural guidance for enterprise AI teams in 2026 assumes the protocol layer is stable and the runtime layer is not. That asymmetry translates into a handful of concrete architectural moves:
- Build tools as MCP servers, not as framework-specific plugins. Even if the team is committed to a single foundation-model provider today, writing tools against MCP buys portability for the next migration – and in our experience the migration happens within 18 months on most production deployments.
- Keep the agent runtime thin and explicitly replaceable. Whether the team uses a commercial agent SDK or an internal wrapper, expect to replace the runtime within 24 months. Concentrate business logic in MCP servers and shared libraries; treat the runtime as scaffolding rather than load-bearing architecture.
- Treat the tool catalogue as an API surface. Version it, document it, test it, rate-limit it, audit it. The tool catalogue is the durable asset of the agent architecture – the underlying model is not, and the runtime is not.
- Separate the prompt and instruction layer from the tool layer. When the team swaps models or runtimes, prompts almost always need tuning; tool definitions should not. The separation makes model swaps incremental rather than monolithic.
- For multi-team deployments, pilot A2A or equivalent inter-agent protocols for cross-team workflows even if the team keeps MCP for same-team tool use. The cross-team shape (capability discovery, authentication, delegation) is the architecturally hard part, and the A2A abstraction has become a reasonable default to anchor against.
- Version every tool, every prompt, every model selection, and every agent topology decision in source control. The agent architecture is moving fast enough that the audit trail is the artefact that distinguishes a defensible production system from one that nobody can reason about a year in.
Security is where most agent deployments go wrong
An MCP server is code. Specifically, it is code that an LLM can invoke with arguments derived from (possibly adversarial) user input. The security model has to account for three failure modes that naive deployments routinely miss:
- Prompt injection reaching a privileged tool. A user prompt (or a document the user supplies) crafted to coerce the model into invoking a tool it should not invoke, or invoking it with attacker-controlled arguments. Privileged tools that perform write or delete operations need explicit confirmation, not just confidence-threshold routing.
- Over-scoped tool permissions. An "update" tool that can modify any record is a single-prompt-injection away from a data-integrity incident. Decompose write capabilities into narrowly-scoped, separately-confirmed tools rather than rolling them into generic mutation operations.
- Supply-chain risk from community MCP servers. Pulling a community MCP server into production introduces all the supply-chain risks of any other open-source dependency, plus the additional risk that the server is an active LLM-invoked attack surface. Allowlist approved servers, sign versions, audit code, and pin to specific versions.
Baseline security controls for production agent deployments
The practical controls that should be operational baseline rather than optional add-ons:
- Run MCP servers in sandboxed environments with least-privilege credentials. The server can only access what its specific function requires, not the broader enterprise environment.
- Treat every write, delete, or external-effect operation as a separate, explicitly-confirmed tool. Generic "update" or "execute" tools are excessive-agency design patterns that the OWASP LLM Top 10 specifically flags.
- Maintain an allowlist of approved MCP servers with signed and pinned versions. Audit changes to the allowlist on a release cadence.
- Log every tool invocation with model input, resolved arguments, result, and the user identity that triggered the agent call. The audit trail is regulatory evidence and the basis of post-incident investigation.
- Rate-limit per-tool per-user. Loop detection and runaway-cost protection both depend on the rate-limit layer being in place.
- Validate tool outputs before they re-enter the model context. Adversarial tool output (a poisoned database row, a manipulated search result) is the second-order injection vector that bypasses prompt-input validation.
- For regulated workloads, ensure the agent action log is part of the regulatory evidence pipeline alongside the model evaluation and dataset documentation.
The regulatory dimensions added in 2024-2026
Agent architectures have entered the regulatory perimeter through several frameworks that have come into force or matured through 2024–2026.
- The EU AI Act treats agentic systems used in high-risk applications as subject to the Article 9–15 requirements on risk management, data governance, technical documentation, transparency, evaluation, and human oversight. The per-tool audit trail and the evaluation suite are both regulatory evidence.
- The NIST AI Risk Management Framework explicitly addresses agentic systems, including the additional risks of tool use, multi-step action, and autonomous decision-making. The framework is the de-facto reference for US-facing programmes.
- OWASP LLM Top 10 (2025 edition) codifies the agent-specific vulnerabilities (excessive agency, prompt injection chained through tools, supply-chain risk in tool plugins) that production deployments need to design against.
- APAC personal-data-protection regimes (PDPA Singapore, PDPA Thailand, Vietnam Decree 13, PIPA Korea) all have specific provisions for automated decision-making that apply when agents take actions on personal data, particularly write operations.
What to pilot in the next 90 days
For enterprise teams that have not yet committed to a standardised agent architecture, the 90-day plan we are recommending in 2026 is small, concrete, and designed to build team muscle memory before the customer-facing pressure arrives:
- Days 1–30: stand up an internal MCP server exposing one high-value but low-risk tool – typically a read-only wrapper over an internal knowledge source, a ticketing system, or a structured-data query. Drive it from a single agent runtime end-to-end, including basic evaluation and tracing.
- Days 31–60: port the same tool to a second runtime to prove portability is real for the stack. The point is not the portability per se; it is the team's muscle memory around tool-versioning, trace analysis, and evaluation against a regression set across runtimes.
- Days 61–90: add a second tool with write capability under the explicit-confirmation security pattern, plus an evaluation harness that scores the agent's tool selection and trajectory quality. The write-capability work is where the security model becomes concrete.
Frequently asked questions
Common questions raised by enterprise architects committing to the MCP + agent-runtime + A2A stack:
- Is MCP genuinely vendor-neutral, or is it tied to a specific provider? Open specification with multi-vendor implementation. Multiple commercial AI providers and open-source projects support it as first-class; the specification is governed independently of any single vendor. Treating it as vendor-neutral in architecture decisions is defensible.
- Should I use the official agent SDK from my AI provider, or an open-source framework? Either is workable. The decision is less about the framework choice and more about treating the framework as the replaceable layer in the architecture. Pick what fits the team's familiarity; do not over-commit business logic to it.
- How do I handle authentication for MCP servers in production? OAuth-style flows for human-initiated agents; service-account credentials for automated agents; explicit per-tool scoping so a compromised server cannot escalate. The auth model is part of the architecture, not an afterthought.
- When should I adopt A2A or similar inter-agent protocols? When the organisation has more than one team owning agents and the agents need to interact across team boundaries. For single-team deployments, MCP at the tool layer is usually sufficient and A2A is premature complexity.
- How fast is this space still moving in 2026? The protocol layer (MCP) is stabilising. The runtime layer is still churning quarterly. The A2A layer is earlier-stage and worth watching but not yet load-bearing for most deployments. Architecting around the asymmetry (commit to the protocol, keep the runtime pluggable) is the durable strategy.


