Agent Layer Architecture¶
The agent layer (molexp.agent) is a clean, user-facing wrapper around
pydantic-ai and
pydantic-graph. Both
libraries are implementation details hidden inside private subpackages
and never appear in the public surface.
Public API¶
The agent layer exposes exactly four user-visible names plus the three concrete modes:
from molexp.agent import AgentRunner, AgentMode, AgentRunResult, AgentSession
from molexp.agent.modes import (
    PlanMode, PlanModeConfig,
    ChatMode, ChatModeConfig,
    ReviewMode, ReviewModeConfig,
)
Construction is plain Python — no factory functions
(create_agent(...) / build_agent(...) / Agent(provider=...)).
AgentRunner lazily builds the underlying pydantic-ai router on first
.run(), so import molexp.agent is cheap.
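The lazy-construction behaviour can be illustrated with a minimal sketch. This is not the molexp implementation: LazyRunner and _EchoRouter are hypothetical stand-ins for AgentRunner and its pydantic-ai router, and the model string is a placeholder.

```python
from typing import Optional


class _EchoRouter:
    """Toy router (hypothetical); the real one would wrap a pydantic-ai Agent."""

    def __init__(self, model: str) -> None:
        self.model = model

    def route(self, prompt: str) -> str:
        return f"[{self.model}] {prompt}"


class LazyRunner:
    """Hypothetical stand-in for AgentRunner that defers router construction."""

    def __init__(self, model: str) -> None:
        # Plain construction: store settings only, build nothing heavy.
        self.model = model
        self._router: Optional[_EchoRouter] = None
        self.router_builds = 0

    def _get_router(self) -> _EchoRouter:
        if self._router is None:
            # Assumption: the real layer would perform its function-local
            # pydantic-ai import here, so importing the package stays cheap.
            self._router = _EchoRouter(self.model)
            self.router_builds += 1
        return self._router

    def run(self, prompt: str) -> str:
        # The first call builds the router; later calls reuse it.
        return self._get_router().route(prompt)
```

Constructing LazyRunner("test-model") builds nothing; the router appears on the first run() and is reused afterwards.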
Layer Boundaries¶
molexp.agent may import from molexp.workspace.* and
molexp.workflow.* (downstream layers). It must not import from
sibling application layers (molexp.plugins, molexp.server,
molexp.cli, molexp.sweep).
pydantic-ai firewall¶
The only places under src/molexp/agent/ allowed to import pydantic_ai
are files inside agent/_pydanticai/:
| File | Purpose |
|---|---|
| _pydanticai/router.py | PydanticAIRouter: concrete Router implementation; one Agent instance per (tier, schema \| None) |
| _pydanticai/capability_probe.py | PydanticAICapabilityProbe: two-agent probe (needs drafter + MCP-attached evidence gatherer) for the capability discovery gate |
| _pydanticai/mcp.py | build_mcp_server helper used by ChatMode tool injection |
The firewall is enforced by tests/test_agent/test_import_guard.py.
import molexp.agent does not eagerly load pydantic_ai — every
construction site is hidden behind a lazy import that fires only on the
first AgentRunner.run() call.
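The contents of tests/test_agent/test_import_guard.py are not shown here. As a rough sketch of how such a guard could work, the check below parses each file with the standard ast module and flags any pydantic_ai import outside agent/_pydanticai/. The helper names and the relative-path convention are assumptions made for illustration.

```python
import ast


def imported_packages(source: str) -> set:
    """Return the top-level packages imported anywhere in `source`.

    ast.walk visits nested scopes too, so function-local (lazy) imports count.
    """
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 keeps absolute imports only; relative ones are skipped.
            found.add(node.module.split(".")[0])
    return found


def violates_firewall(relative_path: str, source: str) -> bool:
    """True if a file outside agent/_pydanticai/ imports pydantic_ai.

    `relative_path` is assumed to be relative to src/molexp/.
    """
    allowed = relative_path.startswith("agent/_pydanticai/")
    return (not allowed) and "pydantic_ai" in imported_packages(source)
```

A test in this style would walk every .py file under src/molexp/agent/ and assert that violates_firewall returns False for each.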
pydantic-graph confinement¶
pydantic_graph is exclusively imported under
src/molexp/workflow/_pydantic_graph/. Nothing under
src/molexp/agent/ may import it. PlanMode drives multi-step
workflows through the public molexp.workflow API, whose private
_pydantic_graph subpackage is the sole sanctioned pydantic_graph import site.
Don't Reinvent pydantic-ai¶
Anything pydantic-ai provides natively in model-side execution (tool dispatch, MCP, retries, message history, structured output) MUST use pydantic-ai; do not build parallel implementations under
molexp.agent.
Concrete consequences:
- Tools: pass pydantic_ai.tools.Tool instances or bare callables to AgentRunner(tools=...); the router forwards them verbatim into Agent(tools=...). No molexp middle layer.
- MCP servers: Agent(toolsets=[MCPServerStdio(...)]). molexp does not iterate over MCP needs by hand.
- Retries: Agent(retries=N). The router's outer retry budget on top of pydantic-ai is a structured-path safety net, not a re-implementation.
- Message history: pydantic-ai's RunResult.all_messages() and Agent.run(message_history=...).
- Structured output: Agent(output_type=Schema).
molexp retains ownership of the workflow / session / provenance
layer (PlanMode pipeline, SessionCatalog, on-disk evidence + assets)
because pydantic-ai does not cover those.
Capability Discovery Gate¶
PlanMode's 13-node pipeline inserts a two-node capability-discovery pair between IR compilation and codegen:
... → CompileTaskIR
→ DraftCapabilityNeeds → DiscoverCapabilities
→ GenerateWorkflowSkeleton
→ GenerateTaskTests / GenerateTaskImplementations
→ ValidateWorkspace → HumanReview → FinalHandoffCheck
The contract:
- LLM decides what needs discovery.
- Agent workflow performs discovery through molcrafts-molmcp.
- LLM uses discovered evidence.
- Compiler rejects unevidenced API usage.

Anything pydantic-ai provides natively in model-side execution (tool dispatch, MCP, retries, message history, structured output) MUST use pydantic-ai; do not build parallel implementations under molexp.agent. molexp retains ownership of the workflow / session / provenance layer (PlanMode pipeline, SessionCatalog, on-disk evidence + assets); pydantic-ai does not cover these.
Mechanism¶
- DraftCapabilityNeeds: a pydantic-ai Agent[None, CapabilityNeedReport] (no tools) ingests the plan brief + workflow contract + per-task briefs and decides which Molcrafts APIs the experiment needs. Persisted to capability/needs.yaml.
- DiscoverCapabilities: a second pydantic-ai Agent[None, CapabilityEvidenceBatch] mounts the molmcp MCP server through Agent(toolsets=[MCPServerStdio(...)]). pydantic-ai drives the tool-call loop end-to-end (listing, dispatch, retries, output parsing). Persisted to capability/evidence.yaml + capability/missing.md.
- Codegen contract: GenerateTaskTests and GenerateTaskImplementations consume the evidence batch and:
  - augment the user prompt with the evidence appendix;
  - require the LLM to populate evidence_refs on the schema and emit a module-level __capability_evidence__: tuple[str, ...] literal whose set equals evidence_refs;
  - after writing, run validate_codegen_evidence(source, batch) to diff ast_refs ∪ declared_refs against the evidence batch's api_ref set;
  - raise UnevidencedApiReference on any miss.
- Repair-loop integration: drive_with_repair catches CapabilityDiscoveryRequired and UnevidencedApiReference from spec.execute(). The first maps to a re-run of both discovery nodes; the second maps to DiscoverCapabilities only on the first occurrence and escalates to both nodes from the second. Both exceptions subclass molexp.workflow.WorkflowError so the workflow runtime propagates them to the loop.
- Static post-check: ValidateWorkspace.capability_evidence_check re-runs the dual-signal diff over every generated file at handoff time so a hand-edited workspace cannot smuggle unevidenced refs past the gate.
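The dual-signal diff in the codegen contract can be sketched with the standard ast module. This is an illustration under stated assumptions, not the molexp implementation: EvidenceBatch is a reduced stand-in, the tracked parameter and the rule that ast_refs come from absolute from-imports are guesses, and molpy.io.read_pdb is a made-up reference used only as sample data.

```python
import ast
from dataclasses import dataclass


class UnevidencedApiReference(Exception):
    """Stand-in; per the text the real class subclasses WorkflowError."""


@dataclass(frozen=True)
class EvidenceBatch:
    """Hypothetical stand-in exposing only the evidenced api_ref set."""
    api_refs: frozenset


def declared_refs(tree: ast.Module) -> set:
    """Read the module-level __capability_evidence__ tuple literal, if any."""
    for node in tree.body:
        if isinstance(node, ast.AnnAssign):
            targets, value = [node.target], node.value
        elif isinstance(node, ast.Assign):
            targets, value = node.targets, node.value
        else:
            continue
        names = [t.id for t in targets if isinstance(t, ast.Name)]
        if "__capability_evidence__" in names and value is not None:
            # literal_eval accepts an AST node and rejects non-literals.
            return set(ast.literal_eval(value))
    return set()


def ast_refs(tree: ast.Module, tracked: tuple) -> set:
    """Collect pkg.mod.name refs from absolute from-imports of tracked packages."""
    refs = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            if node.module.split(".")[0] in tracked:
                refs.update(f"{node.module}.{a.name}" for a in node.names)
    return refs


def validate_codegen_evidence(source: str, batch: EvidenceBatch,
                              tracked: tuple = ("molpy",)) -> None:
    """Raise if any used or declared ref is absent from the evidence batch."""
    tree = ast.parse(source)
    used = ast_refs(tree, tracked) | declared_refs(tree)
    missing = sorted(used - batch.api_refs)
    if missing:
        raise UnevidencedApiReference(", ".join(missing))
```

Checking both signals means a file fails whether it imports an unevidenced API without declaring it or declares a reference the discovery step never produced.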
NullCapabilityProbe blocks codegen¶
When no molmcp MCP server is configured, AgentRunner injects a
NullCapabilityProbe whose discover() raises
CapabilityDiscoveryRequired whenever discovery_required=True. This
is the load-bearing safety invariant: codegen MUST stop when discovery
cannot proceed; silently passing an empty evidence batch downstream
would let the LLM hallucinate Molcrafts API calls.
Pure-stdlib paths (discovery_required=False) are exempt — the probe
returns CapabilityEvidenceBatch(discovery_skipped=True) and codegen
skips the __capability_evidence__ block requirement.
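The invariant above can be sketched in a few lines. The class and exception names come from the text; the discover() signature and the fields on the batch are simplifying assumptions, and the real exception is described as subclassing molexp.workflow.WorkflowError rather than Exception.

```python
from dataclasses import dataclass


class CapabilityDiscoveryRequired(Exception):
    """Stand-in for the real exception (a WorkflowError subclass)."""


@dataclass(frozen=True)
class CapabilityEvidenceBatch:
    """Reduced stand-in; only the fields needed for this sketch."""
    api_refs: frozenset = frozenset()
    discovery_skipped: bool = False


class NullCapabilityProbe:
    """Probe injected when no molmcp MCP server is configured."""

    def discover(self, discovery_required: bool) -> CapabilityEvidenceBatch:
        # Assumed signature: the real method likely takes the needs report.
        if discovery_required:
            # Fail loudly: returning an empty batch here would let codegen
            # proceed with no evidence at all.
            raise CapabilityDiscoveryRequired("no molmcp MCP server configured")
        # Pure-stdlib path: nothing to discover, codegen may continue.
        return CapabilityEvidenceBatch(discovery_skipped=True)
```

Raising instead of returning an empty batch keeps "no evidence gathered" distinguishable from "no evidence needed".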
Sessions¶
Session metadata and persistence live in molexp.agent.sessions:
- SessionMetadata: pydantic data type;
- SessionStore: on-disk vendor under <workspace>/.subsystems/agent.sessions/<session_id>/;
- SessionCatalog: the in-memory catalog the runner queries.
These are molexp-native (pydantic-ai does not provide a session layer) and are the canonical way to attach LLM transcripts to a workspace run.
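To make the on-disk layout concrete, here is a small sketch that resolves a session directory and round-trips a metadata record through it. Only the directory pattern comes from the text; the metadata.json file name, the dictionary fields, and the helper functions are hypothetical, and the real SessionMetadata is a pydantic model, not a plain dict.

```python
import json
from pathlib import Path


def session_dir(workspace: Path, session_id: str) -> Path:
    """Resolve <workspace>/.subsystems/agent.sessions/<session_id>/."""
    return workspace / ".subsystems" / "agent.sessions" / session_id


def save_metadata(workspace: Path, metadata: dict) -> Path:
    """Write metadata as JSON; the file name is an assumption."""
    target = session_dir(workspace, metadata["session_id"])
    target.mkdir(parents=True, exist_ok=True)
    path = target / "metadata.json"
    path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return path


def load_metadata(workspace: Path, session_id: str) -> dict:
    """Read back the JSON written by save_metadata."""
    path = session_dir(workspace, session_id) / "metadata.json"
    return json.loads(path.read_text(encoding="utf-8"))
```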