Building the Damodaran Financial Bot with OpenClaw — From Architecture to a Tested Mock POC
In the previous article, we moved from the idea of a Damodaran-style financial bot toward a practical multi-agent architecture: a system where different agents are responsible for data retrieval, assumptions, valuation, supervision, and report writing. That architecture gave us the direction of travel. The next question is more concrete: how do we turn that architecture into a project that can be tested, evolved, and eventually connected to real financial data sources?
This article continues from that point by focusing on the proof-of-concept implementation. The goal is not to build a production-grade valuation platform in one jump. The goal is to build a small, deterministic, testable version of the system that proves the shape of the application before we introduce live data, complex assumptions, model drift, flaky APIs, or LLM variability.
That distinction matters. A financial agent system can become difficult to debug very quickly. If an answer is wrong, the source of the problem may be the financial data, the normalization logic, the assumption layer, the valuation formula, the scenario model, the report writer, the agent instruction, or the orchestration path between them. If all of those are built at once, the system becomes impressive but opaque. A mocked POC gives us the opposite: a narrow, controlled, deterministic pipeline where every layer can be tested independently.
In this version of the Damodaran Financial Bot, we start with one company fixture, CROX on NASDAQ. The financial statements are mocked. The assumptions are mocked. The DCF and scenario outputs are deterministic. The report is deterministic. That may sound limited, but it is exactly the point. We are not testing whether we can fetch every company in the market yet. We are testing whether the architecture works, whether each responsibility is isolated, whether the agents call the right tools, whether the MCP server exposes the correct interface, and whether the valuation core can be trusted by tests before it is expanded.
The Project Shape
The POC is structured around three boundaries: agents, the MCP server, and the valuation core package.
At the top level, the project looks like this:
├── agents
│   ├── assumptions-agent
│   ├── data-agent
│   ├── supervisor-agent
│   ├── valuation-agent
│   └── writer-agent
├── mcp_server
│   ├── dbot_mcp
│   └── tests
├── packages
│   └── valuation_core
└── package.json
This separation is intentional. The agents describe behavior. The MCP server exposes capabilities. The valuation core implements deterministic business logic. Those are different concerns and they should not be mixed.
The agents should not contain financial formulas. They should not invent assumptions. They should not normalize statement values themselves. They are instruction-driven workers that know when to call a skill and what shape of input and output to preserve. The MCP server should not become a financial modeling library. It should expose tools over a protocol boundary and convert requests into calls to the underlying package. The valuation core should not know anything about OpenClaw, agents, prompts, or MCP sessions. It should be a normal Python package that can be tested with normal Python tests.
That gives us a layered structure:
Agent instructions
↓
Agent skill contract
↓
MCP tool call
↓
MCP server tool wrapper
↓
valuation_core package
↓
DTO output
This shape is one of the most important architectural decisions in the POC. It means we can test the valuation library without running agents. We can test the MCP server without invoking a full multi-agent workflow. We can test agent contracts by checking their expected input and output shapes. Finally, we can run an integration test over the stdio MCP server to verify that the protocol boundary works.
Why Start With Mocked Data?
In a finance bot, mocked data is not a shortcut. It is an engineering control.
If we immediately connect to live APIs, every test can fail for reasons unrelated to our architecture. An external provider may change its response format. A network call may time out. A ticker may have missing fields. A provider may restate historical values. Currency handling may differ between exchanges. Those are all real problems, but they are not the first problems we should solve.
The first problem is whether the pipeline itself is correct.
For example, the data agent should retrieve structured statements. The assumptions agent should add assumptions without changing the statements. The valuation agent should add DCF and scenario outputs without rewriting the input. The writer agent should render a report without recalculating valuation. The supervisor should pass each specialist output forward without modifying values.
Those invariants can be tested with mocked data. In fact, they are easier to test with mocked data because the expected values do not move.
The mocked CROX fixture gives us a stable baseline:
def parse_statements(ticker: str, exchange: Exchange) -> FinancialStatements:
    """POC: returns hardcoded CROX statements regardless of input."""
    if ticker.upper() != "CROX":
        raise DataNotFoundError(f"No fixture available for ticker: {ticker}")
    return FinancialStatements(
        ticker="CROX",
        exchange=exchange,
        period="FY2024 (TTM)",
        currency=Currency.USD,
        income_statement=IncomeStatement(
            revenue=3_900_000_000,
            ebit=1_050_000_000,
            ebit_margin=0.27,
            effective_tax_rate=0.22,
            interest_expense=120_000_000,
            sbc=80_000_000,
            da=120_000_000,
        ),
        balance_sheet=BalanceSheet(
            total_debt=2_000_000_000,
            cash=200_000_000,
            net_debt=1_800_000_000,
            book_equity=1_600_000_000,
            invested_capital=3_400_000_000,
        ),
        cash_flow=CashFlowStatement(
            operating_cash_flow=900_000_000,
            capex=150_000_000,
            change_in_wc=50_000_000,
            fcff=770_000_000,
        ),
    )
The code is deliberately simple. It returns a fixed DTO for one supported ticker and raises a domain error for unsupported tickers. This is not the final data layer. It is a test fixture wrapped in the same interface that a real parser or API adapter will later use.
That is the key benefit: when live data arrives, the interface can stay stable. The implementation behind parse_statements can change from hardcoded fixture to API adapter, CSV loader, SEC parser, or database query, while the rest of the system continues to depend on the same contract.
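One way to make that contract explicit in Python is a structural protocol. This is a sketch, not code from the POC; the type name StatementsProvider is an assumption, but the call signature matches parse_statements above:

from typing import Protocol


class StatementsProvider(Protocol):
    """Anything that can produce FinancialStatements for a ticker."""

    def __call__(self, ticker: str, exchange: Exchange) -> FinancialStatements: ...


# Today: the hardcoded fixture. Later: an API adapter, CSV loader,
# SEC parser, or database query with the same signature.
provider: StatementsProvider = parse_statements

Because every implementation satisfies the same signature, swapping the fixture for a live adapter becomes a one-line change at the composition point rather than a refactor of every caller.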
Test-Driven Development in This Context
Test-driven development, or TDD, is often summarized as “write the test before the implementation.” That summary is technically true, but incomplete. In this project, TDD is more useful as a design method than as a ritual.
The test forces us to state what the system should do before we implement how it does it. That is especially valuable in multi-agent systems because natural language instructions can hide ambiguity. A test removes some of that ambiguity by turning the desired behavior into executable expectations.
For example, before we care about how financial statements are parsed, we can state what the parser must return for the CROX fixture:
def test_parse_crox_returns_fixture():
    ticker = "CROX"
    exchange = Exchange.NASDAQ

    fs = parse_statements(ticker, exchange)

    assert fs.ticker == ticker
    assert fs.exchange == exchange
    assert fs.income_statement.revenue == 3_900_000_000.0
    assert fs.balance_sheet.net_debt == 1_800_000_000.0
    assert fs.cash_flow.fcff == 770_000_000.0
This test does several things. It confirms that the parser supports the expected fixture. It confirms that exchange is preserved. It confirms that the financial statement DTO exposes income statement, balance sheet, and cash flow fields in the expected structure. It also gives future contributors a warning: if they change one of these values, they are changing the baseline fixture and should do so intentionally.
The same idea applies to assumptions:
import pytest


def test_load_crox_assumptions():
    a = load_assumptions("CROX", Exchange.NASDAQ)

    assert a.wacc == 0.098
    assert a.terminal_growth == 0.025
    assert a.beta == 1.45
    assert a.weight_equity + a.weight_debt == pytest.approx(1.0)
This test is more than a value check. It encodes a financial invariant: the capital structure weights should add up to approximately 100%. A good test suite should include both exact expected outputs and domain sanity checks. Exact values are useful for deterministic fixtures. Domain checks are useful because they protect the model from impossible or inconsistent assumptions.
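The same pattern can guard the statement fixture itself. Here is a sketch of an additional sanity test, hypothetical rather than part of the current suite, that checks the balance sheet identities implied by the CROX fixture values shown earlier:

def test_crox_balance_sheet_is_internally_consistent():
    fs = parse_statements("CROX", Exchange.NASDAQ)
    bs = fs.balance_sheet

    # Net debt must equal total debt minus cash: 2.0B - 0.2B = 1.8B.
    assert bs.net_debt == bs.total_debt - bs.cash
    # Under one common definition, invested capital equals
    # net debt plus book equity: 1.8B + 1.6B = 3.4B.
    assert bs.invested_capital == bs.net_debt + bs.book_equity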
Why Tests Matter More in Agentic Systems
Traditional software can fail because of bugs. Agentic software can fail because of bugs, ambiguous instructions, tool misuse, malformed intermediate state, hidden assumptions, and output drift.
That means tests are not optional glue. They are the main mechanism that keeps the system understandable.
In this POC, tests serve several roles:
They define the contract of each core valuation function.
They protect DTO shapes from accidental changes.
They verify MCP tools expose the expected outputs.
They check the stdio MCP server can be launched and called through the protocol.
They prevent agents from becoming responsible for logic that belongs in the core package.
A valuation model is also a high-trust domain. If the system says that intrinsic value is $154 per share, the user needs a way to understand where that number came from. Tests do not make the model financially correct by themselves, but they make the software path reproducible. Reproducibility is the first requirement before deeper financial validation can happen.
The Core Package: Keeping Finance Logic Outside the Agents
The valuation_core package is where the deterministic valuation logic lives. It is structured as a normal Python package:
packages
└── valuation_core
    ├── pyproject.toml
    └── valuation_core
        ├── assumptions
        ├── common
        ├── statements
        └── valuation
The common module contains DTOs, enums, money helpers, period helpers, and domain errors. This layer is intentionally boring. That is a good thing. DTOs are the shared language of the system. If they are stable, every other layer becomes easier to reason about.
A simplified example is the FinancialStatements DTO:
from dataclasses import asdict, dataclass


@dataclass
class FinancialStatements:
    ticker: str
    exchange: Exchange
    period: str
    currency: Currency
    income_statement: IncomeStatement
    balance_sheet: BalanceSheet
    cash_flow: CashFlowStatement

    def as_dict(self) -> dict:
        return asdict(self)
This DTO is important because agents and MCP tools pass JSON-like structures around, while the valuation package can work with typed Python objects. The as_dict method provides a clean exit point from the typed domain model back into serializable data.
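A minimal usage sketch makes the round-trip visible. The default=str fallback mirrors how the server serializes tool output later in this article; it converts values json cannot handle natively, such as enums, into strings:

import json

fs = parse_statements("CROX", Exchange.NASDAQ)
payload = json.dumps(fs.as_dict(), indent=2, default=str)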
The valuation module follows the same pattern. The POC implementation of DCF is fixed, but the return type is already the shape we expect from a real model:
def run_dcf(ticker: str, exchange: Exchange) -> DCFResult:
    """POC: returns hardcoded CROX DCF result."""
    if ticker.upper() != "CROX":
        raise DataNotFoundError(f"No fixture available for ticker: {ticker}")
    return DCFResult(
        ticker="CROX",
        exchange=exchange,
        pv_explicit_fcff=5_000_000_000.0,
        pv_terminal_value=5_400_000_000.0,
        enterprise_value=10_400_000_000.0,
        net_debt=1_800_000_000.0,
        equity_value=8_600_000_000.0,
        diluted_shares=56_000_000.0,
        intrinsic_value_per_share=154.0,
        wacc=0.098,
        terminal_growth=0.025,
        fcff_projection=[],
    )
In the actual project, the projection list contains a ten-year explicit forecast. The important detail is not whether the current POC computes that forecast dynamically. It does not. The important detail is that the result already looks like a real DCF result. That allows the MCP tool, valuation agent, writer agent, and report renderer to be built against the final interface before the final financial engine exists.
This is one of the most useful POC patterns: mock the internals, not the boundary.
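The scenario side follows the same pattern. The POC implementation is not shown here, but based on the tool test later in this article, a plausible sketch of run_scenarios looks like the following. The Scenario and ScenarioResult names, and the per-share values for Bull, Bear, and Stress, are assumptions; only the scenario names, the base value of 154.0, and iv_range_high of 200.0 are confirmed by the tests:

def run_scenarios(ticker: str, exchange: Exchange) -> ScenarioResult:
    """POC: returns hardcoded CROX scenario results."""
    if ticker.upper() != "CROX":
        raise DataNotFoundError(f"No fixture available for ticker: {ticker}")
    return ScenarioResult(
        ticker="CROX",
        exchange=exchange,
        scenarios=[
            Scenario(name="Bull", intrinsic_value_per_share=200.0),
            Scenario(name="Base", intrinsic_value_per_share=154.0),
            Scenario(name="Bear", intrinsic_value_per_share=110.0),
            Scenario(name="Stress", intrinsic_value_per_share=80.0),
        ],
        iv_range_low=80.0,
        iv_range_high=200.0,
    )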
The MCP Server as a Tool Boundary
The MCP server sits between the agents and the valuation package. Its job is to expose a small set of tools:
financial_statements
assumptions
scenario_valuation
report_writer
Each tool is intentionally narrow. The financial_statements tool returns normalized statements. The assumptions tool returns valuation assumptions. The scenario_valuation tool returns DCF and scenario results. The report_writer tool returns markdown.
This prevents one giant tool from doing everything. It also gives each agent one obvious tool to call.
A typical tool wrapper is small:
from valuation_core.assumptions import load_assumptions
from dbot_mcp.tools.common import TICKER_EXCHANGE_INPUT_SCHEMA

NAME = "assumptions"
DESCRIPTION = "Return WACC, growth, and reinvestment assumptions for a ticker."
INPUT_SCHEMA = TICKER_EXCHANGE_INPUT_SCHEMA


def run(arguments: dict) -> dict:
    ticker = arguments["ticker"]
    exchange = arguments["exchange"]
    return load_assumptions(ticker, exchange).as_dict()
This wrapper does not calculate WACC. It does not validate finance theory. It does not format prose. It simply adapts MCP tool input to a core package function and returns a serializable dictionary.
The shared input schema is also important:
TICKER_EXCHANGE_INPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "ticker": {"type": "string", "description": "Stock ticker (e.g. CROX)."},
        "exchange": {
            "type": "string",
            "description": "Stock exchange (e.g. NASDAQ).",
            "enum": EXCHANGES,
        },
    },
    "required": ["ticker", "exchange"],
    "additionalProperties": False,
}
By making ticker and exchange explicit, every tool receives enough context to resolve the fixture. By using an enum for exchange, the tool boundary rejects unsupported exchange names early. In a future production version, this same schema can be extended with period type, filing source, currency preference, restatement policy, or data provider options.
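As a sketch of that extension, the period field and its allowed values below are assumptions for a future version, not current code:

EXTENDED_INPUT_SCHEMA = {
    **TICKER_EXCHANGE_INPUT_SCHEMA,
    "properties": {
        **TICKER_EXCHANGE_INPUT_SCHEMA["properties"],
        # Hypothetical optional field; omitted requests keep today's behavior.
        "period": {
            "type": "string",
            "description": "Reporting period preference.",
            "enum": ["annual", "ttm", "quarterly"],
        },
    },
}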
The server then registers the tools and exposes them over stdio:
import json

import mcp.types as types

_TOOLS = (
    financial_statements,
    assumptions,
    scenario_valuation,
    report_writer,
)
_TOOLS_BY_NAME = {t.NAME: t for t in _TOOLS}


@server.list_tools()
async def list_tools() -> list[types.Tool]:
    return [
        types.Tool(
            name=t.NAME,
            description=t.DESCRIPTION,
            inputSchema=t.INPUT_SCHEMA,
        )
        for t in _TOOLS
    ]


@server.call_tool()
async def call_tool(name: str, arguments: dict) -> list[types.TextContent]:
    tool = _TOOLS_BY_NAME.get(name)
    if tool is None:
        raise ValueError(f"Unknown tool: {name}")
    result = tool.run(arguments or {})
    text = json.dumps(result, indent=2, default=str)
    return [types.TextContent(type="text", text=text)]
This gives the system a clean protocol layer. An MCP-aware host, such as OpenClaw, can discover available tools, inspect their schemas, and call them through a standard interface. The finance logic remains inside valuation_core. The agent instructions remain inside agents. The MCP server is the bridge.
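For completeness, the integration test later in this article launches this module with python -m dbot_mcp.server. A minimal entrypoint, assuming the standard MCP Python SDK stdio pattern and the server instance created earlier in the module, would look like this:

import asyncio

from mcp.server.stdio import stdio_server


async def main() -> None:
    # Expose the registered tools over stdin/stdout.
    async with stdio_server() as (read_stream, write_stream):
        await server.run(
            read_stream,
            write_stream,
            server.create_initialization_options(),
        )


if __name__ == "__main__":
    asyncio.run(main())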
Testing the MCP Layer
The MCP tools have their own tests because they are not just pass-through functions. They are the public interface exposed to the agent runtime. If a tool changes its output shape, the agents may break even if the core package still passes its own tests.
For example, the scenario_valuation tool is tested like this:
def test_returns_crox_dcf_and_scenarios():
    out = run({"ticker": "CROX", "exchange": "NASDAQ"})

    assert out["dcf"]["intrinsic_value_per_share"] == 154.0
    assert out["scenarios"]["iv_range_high"] == 200.0
    assert {s["name"] for s in out["scenarios"]["scenarios"]} == {
        "Bull",
        "Base",
        "Bear",
        "Stress",
    }
This test confirms that the tool returns both DCF and scenario outputs. It also checks the scenario set. The valuation agent depends on that structure because it places this tool output under the valuation key in its pipeline response.
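The wrapper behind that test is not shown above, but it follows the same adapter pattern as the assumptions tool. This is a sketch; the module path and the as_dict calls are assumptions based on the other DTOs:

from valuation_core.valuation import run_dcf, run_scenarios

NAME = "scenario_valuation"
DESCRIPTION = "Return DCF and scenario valuation results for a ticker."
INPUT_SCHEMA = TICKER_EXCHANGE_INPUT_SCHEMA


def run(arguments: dict) -> dict:
    ticker = arguments["ticker"]
    exchange = arguments["exchange"]
    return {
        "dcf": run_dcf(ticker, exchange).as_dict(),
        "scenarios": run_scenarios(ticker, exchange).as_dict(),
    }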
The report writer tool has a different kind of test:
def test_returns_crox_report_markdown():
    out = run({"ticker": "CROX", "exchange": "NASDAQ"})

    assert out["ticker"] == "CROX"
    assert out["exchange"] == "NASDAQ"
    assert out["report_md"].startswith("# CROX (NASDAQ) Valuation Report")
    assert "WACC" in out["report_md"]
This test is intentionally not too strict about the full markdown body. For a report, we usually want to verify the essential structure rather than make the test brittle against every formatting change. The test confirms that the report is for the expected ticker and exchange, that it begins with the expected title, and that it includes a cost-of-capital section.
There is also an integration test over the stdio server:
import asyncio
import json

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


def test_integration_pipeline_runs_over_stdio_server():
    async def _run() -> None:
        server = StdioServerParameters(
            command="python",
            args=["-m", "dbot_mcp.server"],
        )
        async with stdio_client(server) as (read_stream, write_stream):
            async with ClientSession(read_stream, write_stream) as session:
                await session.initialize()

                tools = await session.list_tools()
                assert {tool.name for tool in tools.tools} == {
                    "financial_statements",
                    "assumptions",
                    "scenario_valuation",
                    "report_writer",
                }

                result = await session.call_tool(
                    "report_writer",
                    {"ticker": "CROX", "exchange": "NASDAQ"},
                )
                payload = json.loads(result.content[0].text)
                assert payload["report_md"].startswith(
                    "# CROX (NASDAQ) Valuation Report"
                )

    asyncio.run(_run())
This test is valuable because it checks the real server path, not just individual Python functions. It launches the server, initializes a client session, lists the tools, calls a tool, parses the returned content, and verifies the report. That gives confidence that the MCP packaging and protocol wiring are correct.
Agents as Contracts, Not Calculation Engines
The agents are deliberately instruction-first. Each agent has an AGENTS.md file that defines its role, responsibilities, restrictions, input, and output. Agents that need tools also have a skill file under skills/SKILL.md.
The distinction between an agent and a skill is useful. The agent file describes the worker’s general behavior. The skill file describes a specific callable capability and how it maps pipeline input to MCP tool input.
For example, the Data Agent is responsible for retrieving deterministic company financial statement data. Its restrictions are just as important as its responsibilities:
# Data Agent
You provide deterministic company financial statement data.
Responsibilities:
- Retrieve income statement, balance sheet, and cash flow statement data.
- Return structured financial statement data.
- Preserve reported metric names, periods, and values.
Restrictions:
- Do not calculate valuation.
- Do not create assumptions.
- Do not infer missing values.
- Do not modify source values.
- Do not browse the web.
- Do not call reporting or valuation tools.
This instruction file prevents scope creep. Without it, the data agent might try to “help” by filling missing values, calculating margins, or adding commentary. That behavior may look useful in a demo, but it is dangerous in a financial system. The data agent should retrieve and preserve data. If normalization is needed, it belongs in the approved tool or the core package, not in free-form agent reasoning.
The corresponding skill makes the tool mapping explicit:
---
name: retrieve-financial-statements
description: Retrieve deterministic company financial statement data.
---
Use this skill when company financial statements are needed.
Call this MCP tool:
- financial_statements
Pipeline input:

{
  "ticker": "<ticker>",
  "exchange": "<exchange>"
}

MCP tool input:

{
  "ticker": "<ticker>",
  "exchange": "<exchange>"
}
Rules:
- Return only the pipeline output.
- Preserve `exchange` from the pipeline input when the MCP tool does not return it.
- Do not summarize.
- Do not normalize values unless the MCP tool already does it.
- Do not fill missing values.
- Do not call other DBOT tools.
This skill is a contract. It tells the agent which tool to call, what arguments to pass, and how to treat the response. In larger systems, this is the difference between an agent that behaves predictably and an agent that improvises.
The Writer Agent is a different kind of specialist. It does not retrieve data or calculate valuation. It receives completed valuation results and calls the report writer tool:
# Writer Agent
You produce deterministic valuation reports from provided valuation results.
Workflow:
1. Receive the Valuation Agent pipeline output.
2. Extract `ticker` and `exchange` from `valuation.dcf`.
3. Call the MCP tool `report_writer`.
4. Return `report_md` exactly as produced by the tool.
Rules:
- Do not calculate valuation.
- Do not change valuation values.
- Do not retrieve financial statements.
- Do not create assumptions.
- Do not add unsupported market commentary.
- Output markdown only.
This design choice may look strict, but it is necessary. If the writer agent is allowed to add unsupported commentary, the final report may contain claims that did not come from the valuation engine. By forcing the writer to return report_md exactly as produced by the tool, we keep the final narrative deterministic.
The Supervisor Agent
The supervisor is the coordinator. It should not do specialist work. It should not fetch financial statements itself. It should not calculate intrinsic value. It should not write the report. It should only move outputs from one specialist to the next.
The workflow is simple:
User request
↓
Supervisor Agent
↓
Data Agent
↓
Assumptions Agent
↓
Valuation Agent
↓
Writer Agent
↓
Final markdown report
The supervisor’s contract defines the pipeline:
Pipeline contracts:
- Data Agent input: `{ "ticker": "<ticker>", "exchange": "<exchange>" }`
- Data Agent output -> Assumptions Agent input: financial statements JSON.
- Assumptions Agent output -> Valuation Agent input:
`{ "financial_statements": {}, "assumptions": {} }`
- Valuation Agent output -> Writer Agent input:
`{ "financial_statements": {}, "assumptions": {}, "valuation": { "dcf": {}, "scenarios": {} } }`
- Writer Agent output: markdown report.
This is the orchestration layer. The supervisor does not need to understand the internal structure of the DCF model. It only needs to know which output goes where. If any specialist returns an error, the supervisor stops and returns a structured error. That behavior is critical because continuing after a missing data error or malformed assumption set would produce a misleading report.
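The exact error format is an implementation detail, but one plausible shape, hypothetical rather than taken from the project, makes the stop-and-report behavior concrete:

{
  "error": {
    "stage": "data-agent",
    "type": "DataNotFoundError",
    "message": "No fixture available for ticker: ACME"
  }
}

The point is that the error carries which stage failed and why, so the user sees a diagnosable failure instead of a report built on missing data.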
Why Use a Subset of Agents First?
The full target architecture may eventually include many more agents: market data agents, filings agents, macro agents, risk agents, peer comparison agents, accounting adjustment agents, citation agents, and compliance agents. Starting with all of them would create unnecessary complexity.
A POC should contain the smallest set of agents needed to prove the architecture.
In this project, that subset is:
Data Agent
Assumptions Agent
Valuation Agent
Writer Agent
Supervisor Agent
That is enough to test the complete valuation path from ticker input to markdown output. Each agent has a single clear responsibility. Each specialist maps to one MCP tool. The supervisor composes them.
This subset also mirrors the natural stages of a valuation workflow. First, collect statements. Second, apply assumptions. Third, run valuation. Fourth, write the report. That is simple enough to test but realistic enough to be meaningful.
Once this path works, new agents can be added without changing the core concept. For example, a future Market Price Agent could provide current market price. A Filings Agent could retrieve 10-K or annual report data. A Peer Agent could supply comparable company metrics. A Risk Agent could check sensitivity to WACC and terminal growth. Each new agent should be added only when there is a clear contract and a testable tool boundary.
The Report Writer
The report writer is implemented as an MCP tool that calls the core package and formats the result into markdown. A shortened version looks like this:
def _format_report_markdown(report: ValuationReport) -> str:
    fs = report.financial_statements
    assumptions = report.assumptions
    dcf = report.dcf
    scenarios = report.scenarios

    exchange = getattr(fs.exchange, "value", fs.exchange)
    return (
        f"# {fs.ticker} ({exchange}) Valuation Report\n\n"
        f"## Inputs\n"
        f"- Period: {fs.period}\n"
        f"- Revenue: ${fs.income_statement.revenue/1e9:.1f}B\n"
        f"- EBIT: ${fs.income_statement.ebit/1e9:.2f}B "
        f"(margin {fs.income_statement.ebit_margin:.0%})\n"
        f"- Net debt: ${fs.balance_sheet.net_debt/1e9:.1f}B\n\n"
        f"## Cost of Capital\n"
        f"- WACC: {assumptions.wacc:.2%}\n"
        f"- Cost of equity: {assumptions.cost_of_equity:.2%}\n"
        f"- Terminal growth: {assumptions.terminal_growth:.2%}\n\n"
        f"## DCF\n"
        f"- Enterprise Value: ${dcf.enterprise_value/1e9:.1f}B\n"
        f"- Equity Value: ${dcf.equity_value/1e9:.1f}B\n"
        f"- Intrinsic Value / Share: ${dcf.intrinsic_value_per_share:.0f}\n"
    )
This code is intentionally deterministic. It does not ask an LLM to write a valuation conclusion. It does not produce stylistic variation. It takes a structured ValuationReport DTO and renders a repeatable report.
That is the correct approach for the POC. Later, a more advanced writer agent could produce richer commentary, but it should still be grounded in structured data and clearly separated from calculation. Narrative flexibility should come after numerical correctness, not before it.
The End-to-End Flow
Putting everything together, the POC flow works like this.
A user asks for a valuation of CROX on NASDAQ. The Supervisor Agent sends that input to the Data Agent. The Data Agent calls the financial_statements MCP tool. That tool calls valuation_core.statements.parse_statements, which currently returns the deterministic CROX fixture.
The Supervisor passes the resulting financial statements to the Assumptions Agent. The Assumptions Agent calls the assumptions MCP tool. That tool calls valuation_core.assumptions.load_assumptions, which returns the deterministic WACC, beta, terminal growth, and reinvestment assumptions.
The Supervisor then passes financial statements and assumptions to the Valuation Agent. The Valuation Agent calls the scenario_valuation MCP tool. That tool calls run_dcf and run_scenarios from valuation_core.
Finally, the Supervisor passes the valuation output to the Writer Agent. The Writer Agent calls report_writer, receives deterministic markdown, and returns it as the final result.
The final report begins with a predictable heading:
# CROX (NASDAQ) Valuation Report

## Inputs
- Period: FY2024 (TTM)
- Revenue: $3.9B
- EBIT: $1.05B (margin 27%)
- Net debt: $1.8B

## Cost of Capital
- WACC: 9.80%
- Cost of equity: 11.45%
- Terminal growth: 2.50%
The exact formatting is less important than the pipeline discipline. Every number in this report comes from a deterministic DTO. The writer does not invent values. The supervisor does not mutate values. The valuation agent does not retrieve data. The assumptions agent does not calculate valuation. The data agent does not write prose.
That separation is what makes the system testable.
What This POC Proves
This mocked POC does not prove that the valuation model is complete. It does not prove that CROX is worth $154 per share in a live market context. It does not prove that the assumptions are correct for every investor. Those are future financial validation questions.
What it does prove is more foundational:
The repository can separate agents, tools, and domain logic.
The valuation core can return typed DTOs.
The MCP server can expose deterministic tools.
Tool wrappers can be unit tested.
The stdio server can be integration tested.
Agents can be constrained by explicit contracts.
The supervisor can coordinate a specialist pipeline.
The report can be generated from structured outputs rather than free-form reasoning.
That is a strong POC. It proves the skeleton before adding muscle.
Where to Go Next
The next phase should expand the system without breaking the contracts. The most obvious improvement is replacing the hardcoded CROX fixture with a real data adapter. That adapter should still return the same FinancialStatements DTO. The tests should still pass for fixture mode. New tests can be added for the live adapter using recorded responses or local fixtures.
The second improvement is making the DCF calculation dynamic. Instead of returning a fixed intrinsic value, run_dcf should consume statements and assumptions directly. That change should be driven by tests. For example, one test can verify the projected FCFF schedule. Another can verify terminal value. Another can verify equity value after subtracting net debt. Another can verify per-share value.
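A hypothetical test for the terminal value step, written against that future dynamic model, might look like the sketch below. It would fail against the current mocked run_dcf, whose fcff_projection is empty; it encodes the Gordon growth formula and the discounting of the terminal value back over the explicit forecast period:

import pytest


def test_terminal_value_follows_gordon_growth():
    a = load_assumptions("CROX", Exchange.NASDAQ)
    dcf = run_dcf("CROX", Exchange.NASDAQ)

    final_fcff = dcf.fcff_projection[-1]  # last explicit forecast year
    # Gordon growth: TV = FCFF_n * (1 + g) / (WACC - g).
    terminal_value = final_fcff * (1 + a.terminal_growth) / (a.wacc - a.terminal_growth)
    # Discount the terminal value back over the explicit forecast horizon.
    pv_terminal = terminal_value / (1 + a.wacc) ** len(dcf.fcff_projection)

    assert dcf.pv_terminal_value == pytest.approx(pv_terminal)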
The third improvement is scenario sensitivity. Today, the scenario outputs are mocked. In the next version, bull, base, bear, and stress scenarios can be generated from explicit changes to growth, margin, reinvestment, WACC, and terminal growth. Again, each scenario should be testable.
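One way to express those scenario definitions is a small dataclass of explicit deltas applied on top of the base assumptions. The names and fields here are assumptions for a future version, not current code:

from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioDelta:
    """Explicit adjustments applied to the base assumptions."""

    name: str
    revenue_growth_delta: float = 0.0
    ebit_margin_delta: float = 0.0
    wacc_delta: float = 0.0
    terminal_growth_delta: float = 0.0


BULL = ScenarioDelta("Bull", revenue_growth_delta=0.02, ebit_margin_delta=0.02)
STRESS = ScenarioDelta("Stress", wacc_delta=0.015, terminal_growth_delta=-0.005)

Because every scenario is an explicit, named delta, each one can be unit tested against the base case rather than hidden inside free-form agent reasoning.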
The fourth improvement is richer report generation. The writer can eventually include sensitivity tables, assumptions summaries, and warnings about missing data. But the report should remain grounded in DTOs. If the report contains a number, that number should come from the valuation output. If the report contains a qualitative statement, the source of that statement should be explicit.
Final Thoughts
The main lesson from this POC is that agentic financial systems should be built from the inside out. Start with deterministic domain logic. Wrap it with narrow tools. Expose those tools through a protocol boundary. Give agents strict contracts. Use a supervisor to compose the workflow. Test every layer.
Mocked data is not a weakness in this process. It is the mechanism that lets us design the system before the real world makes it noisy. A deterministic CROX fixture gives us stable expectations. Stable expectations give us tests. Tests give us confidence to refactor. Refactoring gives us room to replace mocks with real implementations.
That is how a concept becomes an architecture, and how an architecture becomes software.
The result is not just a Damodaran-style financial bot that can produce a valuation report. It is a project structure that can grow without collapsing under its own complexity.