# LLM Result Object — Clean Break Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Replace `LLMResultStr(str)` with a proper `LLMResult` dataclass as the return type of `call_llm()`, making metadata discoverable and the API honest.

**Architecture:** Rename existing `LLMResultData` → `LLMResult`, add convenience properties (`.text`, `.input_tokens`, `.truncated`), change `call_llm()` return type. Keep `LLMResultStr` as deprecated wrapper available via `call_llm_str()`. Mechanical migration of ~100 callers to use `.text` where string operations are needed.

**Tech Stack:** Python dataclasses, no new deps.

---

## Task 1: Evolve `LLMResultData` → `LLMResult` in `result.py`

**Files:**
- Modify: `lib/llm/result.py`

**Step 1: Rename class and add convenience properties**

```python
"""Canonical LLM result data model — zero heavy deps.

Importable from hot.py (httpx-only) and stream.py (litellm) alike.
"""

from __future__ import annotations

import dataclasses


@dataclasses.dataclass
class LLMResult:
    """Canonical LLM response object.

    Returned by call_llm(). Access .text for the response string,
    or use metadata fields directly.

    Usage:
        result = await call_llm("haiku", "hello")
        print(result.text)              # the response string
        print(result.cost)              # 0.0003
        print(result.input_tokens)      # 100
        print(result.truncated)         # False
        f"Answer: {result}"             # works via __str__
    """

    content: str = ""
    model: str | None = None
    usage: dict | None = None
    cost: float | None = None
    finish_reason: str | None = None
    reasoning_content: str | None = None
    reasoning_tokens: int | None = None

    @property
    def text(self) -> str:
        """Response text. Alias for .content matching httpx/requests convention."""
        return self.content

    @property
    def input_tokens(self) -> int:
        return (self.usage or {}).get("prompt_tokens", 0)

    @property
    def output_tokens(self) -> int:
        return (self.usage or {}).get("completion_tokens", 0)

    @property
    def total_tokens(self) -> int:
        return (self.usage or {}).get("total_tokens", 0)

    @property
    def truncated(self) -> bool:
        return self.finish_reason == "length"

    def to_dict(self) -> dict:
        return dataclasses.asdict(self)

    @classmethod
    def from_dict(cls, d: dict) -> LLMResult:
        fields = {f.name for f in dataclasses.fields(cls)}
        return cls(**{k: v for k, v in d.items() if k in fields})

    def __str__(self) -> str:
        return self.content

    def __bool__(self) -> bool:
        return bool(self.content)


# Backward compat alias — will be removed in future
LLMResultData = LLMResult
```

**Step 2: Verify import doesn't break**

Run: `python -c "from lib.llm.result import LLMResult, LLMResultData; r = LLMResult(content='hello'); assert r.text == 'hello'; assert r.input_tokens == 0; assert not r.truncated; assert str(r) == 'hello'; assert bool(r); print('OK')"`

**Step 3: Commit**

```
feat(llm): rename LLMResultData → LLMResult with convenience properties
```

---

## Task 2: Update `stream.py` — `call_llm` returns `LLMResult`

**Files:**
- Modify: `lib/llm/stream.py`

**Step 1: Update `call_llm()` return type and remove `.to_str()` calls**

Change the return type annotation from `LLMResultStr` to `LLMResult`:

```python
async def call_llm(...) -> LLMResult:
```

Import `LLMResult` at the top (it's already imported as `LLMResultData` via `from lib.llm.result import LLMResultData`).

Replace all three return paths in `call_llm()`:

**Cache hit path (line ~1213):**
```python
# Before:
return LLMResultData.from_dict(hit).to_str()
# After:
result = LLMResult.from_dict(hit)
_log_cost(result)
return result
```

**Non-streaming path (line ~1229):**
```python
# Before:
return result.to_str()
# After:
_log_cost(result)
return result
```

**Streaming path (line ~1262):**
```python
# Before:
return result.to_str()
# After:
_log_cost(result)
return result
```

**Step 2: Extract cost logging from `LLMResultStr._from_result` into a helper**

Add near the top of stream.py (after imports):

```python
def _log_cost(result: LLMResult) -> None:
    """Log LLM cost if applicable."""
    if result.cost and result.cost > 0 and result.model:
        try:
            _cost_log_call(model=result.model, cost=result.cost, usage=result.usage)
        except Exception as e:
            import sys
            print(f"WARNING: cost logging failed: {e}", file=sys.stderr)
```

**Step 3: Add `call_llm_str()` deprecated wrapper**

```python
async def call_llm_str(
    model: str,
    prompt: str,
    **kwargs: Any,
) -> LLMResultStr:
    """Deprecated: use call_llm() which returns LLMResult.

    This wrapper returns LLMResultStr (str subclass) for callers that
    haven't migrated yet.
    """
    import warnings
    warnings.warn("call_llm_str() is deprecated, use call_llm() instead", DeprecationWarning, stacklevel=2)
    result = await call_llm(model, prompt, **kwargs)
    return LLMResultStr(
        result.content, model=result.model, usage=result.usage,
        cost=result.cost, finish_reason=result.finish_reason,
        reasoning_content=result.reasoning_content,
        reasoning_tokens=result.reasoning_tokens,
    )
```

**Step 4: Keep `LLMResultStr` class but don't use it in new code**

Don't delete `LLMResultStr` — it's still used by `call_llm_str()` and potentially by external code. Just stop returning it from `call_llm()`.
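
For context on why the class stays: `LLMResultStr` subclasses `str`, so unmigrated callers that pass it to str-typed APIs or do `isinstance` checks keep working. A minimal stand-in (the real class lives in `lib/llm/stream.py` and may differ in detail):

```python
class LLMResultStr(str):
    """Minimal stand-in for the real LLMResultStr in lib/llm/stream.py."""

    def __new__(cls, content: str, **meta):
        obj = super().__new__(cls, content)
        obj.__dict__.update(meta)  # metadata bolted onto the str instance
        return obj


s = LLMResultStr("hello", cost=0.001)
assert isinstance(s, str)      # unmigrated str-typed call sites still accept it
assert s.upper() == "HELLO"    # every str method works
assert s.cost == 0.001         # metadata rides along as attributes
```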

**Step 5: Update `_call_llm_no_stream()` to use `LLMResult` instead of `LLMResultData`**

Just rename references since `LLMResultData` is now an alias. Update the import:
```python
from lib.llm.result import LLMResult
```

And change the return type annotation + construction:
```python
return LLMResult(
    content=content, model=model, usage=usage_dict, cost=cost,
    finish_reason=finish_reason, reasoning_content=reasoning_content,
    reasoning_tokens=reasoning_tokens,
)
```

**Step 6: Run basic sanity check**

Run: `python -c "import asyncio; from lib.llm.stream import call_llm; from lib.llm.result import LLMResult; print('import OK')"`

**Step 7: Commit**

```
feat(llm): call_llm returns LLMResult dataclass instead of LLMResultStr
```

---

## Task 3: Update `__init__.py` exports

**Files:**
- Modify: `lib/llm/__init__.py`

**Step 1: Update imports and exports**

```python
# Change:
from lib.llm.result import LLMResultData
# To:
from lib.llm.result import LLMResult, LLMResultData  # LLMResultData is alias

# Change:
from lib.llm.stream import (
    LLMResultStr,
    SubscriptionError,
    call_llm,
    stream_llm,
)
# To:
from lib.llm.stream import (
    LLMResultStr,
    SubscriptionError,
    call_llm,
    call_llm_str,
    stream_llm,
)

# In __all__, add:
"LLMResult",
"call_llm_str",
```

**Step 2: Commit**

```
feat(llm): export LLMResult and call_llm_str from lib.llm
```

---

## Task 4: Update internal `lib/llm/*` consumers

**Files:**
- Modify: `lib/llm/hot.py`
- Modify: `lib/llm/json.py`
- Modify: `lib/llm/cache.py`
- Modify: `lib/llm/server.py`
- Modify: `lib/llm/conversation.py` (if needed)

**Step 1: Update `hot.py`**

`hot.py` has its own `call_llm()` that returns `str` and `call_llm_rich()` that returns `LLMResultData`. Update references:
- Change `from lib.llm.result import LLMResultData` → `from lib.llm.result import LLMResult`
- `call_llm_rich()` return type: `LLMResultData` → `LLMResult`
- All `LLMResultData(...)` constructions → `LLMResult(...)`

**Step 2: Update `json.py`**

`call_llm_json()` calls `call_llm()` then does `repair_json(raw_text)`. Since `call_llm` now returns `LLMResult`, update:
- Where it passes result to `repair_json()`, use `result.text` or `str(result)`
- Where it checks `.finish_reason`, access directly on result (already works since `LLMResult` has this field)

**Step 3: Update `cache.py`**

- Change `LLMResultData` → `LLMResult` in imports and type hints
- `put_result()` already calls `result.to_dict()`, which still works unchanged

**Step 4: Update `server.py`**

- Change `LLMResultStr` usage in `call_llm_endpoint()`:
  - `str(result)` → `result.text` (or keep `str(result)`, both work)
  - `getattr(result, "cost", None)` → `result.cost` (direct access, it's a dataclass now)
  - Same for `.usage`, `.finish_reason`, `.reasoning_tokens`, `.model`

**Step 5: Commit**

```
refactor(llm): update internal lib/llm modules to use LLMResult
```

---

## Task 5: Migrate callers — `lib/` (non-llm)

**Files to modify** (each needs `.text` added where string ops are used):
- `lib/eval/judge.py` — `raw = await call_llm(...)` then `parse_judge_json(raw)` → `parse_judge_json(raw.text)`
- `lib/extract/freeform.py` — check string usage
- `lib/embed/core.py` — check string usage
- `lib/ideas/extract.py`, `lib/ideas/eval.py`
- `lib/ingest/cli.py`, `lib/ingest/html_utils.py`, `lib/ingest/related_work.py`, `lib/ingest/serper.py`
- `lib/observe/core.py` — accesses `.cost`, `.usage`, `.model` via `hasattr` guards; these are plain dataclass fields now, so change the `hasattr` checks to direct access
- `lib/tune/core.py` — same as observe
- `lib/semnet/concepts.py`, `lib/semnet/chunk.py`
- `lib/vision/analyze.py` — already uses `str(result)` and accesses `.cost`; both keep working unchanged
- `lib/vision/compare.py`
- `lib/cost_log.py`
- `lib/gen/calibrate_temp_variety.py`

**Pattern for each file:**

1. Read the file, find `await call_llm(...)` usage
2. If result is passed to a function expecting str, used with `.strip()`, `.startswith()`, `json.loads()`, string concatenation — add `.text`
3. If result is used in f-string, `print()`, `logger` — no change needed (`__str__` handles it)
4. If result metadata is accessed (`.cost`, `.usage`) — no change needed (direct dataclass fields)
5. If `LLMResultStr` is imported — change to `LLMResult`
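
The decision rule in steps 2–4 can be exercised with a minimal stand-in for `LLMResult` (the real class is defined in Task 1):

```python
import dataclasses
import json


@dataclasses.dataclass
class LLMResult:
    """Minimal stand-in for lib.llm.result.LLMResult (Task 1)."""
    content: str = ""

    @property
    def text(self) -> str:
        return self.content

    def __str__(self) -> str:
        return self.content


result = LLMResult(content='{"score": 5}\n')

# Step 2: string operations need .text (the dataclass has no .strip()):
parsed = json.loads(result.text.strip())
assert parsed == {"score": 5}

# Step 3: f-strings and print() need no change; __str__ covers them:
assert f"raw: {result}" == 'raw: {"score": 5}\n'
```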

**Step 1: Apply fixes to all lib/ files listed above**

**Step 2: Run `python -c "from lib.eval.judge import score_single"` etc. for import checks**

**Step 3: Commit**

```
refactor(lib): migrate call_llm callers to use LLMResult.text
```

---

## Task 6: Migrate callers — `vario/`

**Files to modify:**
- `vario/interpret.py` — already uses `str(result)`; works as-is, optionally switch to `.text`
- `vario/blocks/produce.py`, `score.py`, `reduce.py`, `revise.py`
- `vario/strategies/blocks/create.py` — annotated as `LLMResultStr` and accesses `.usage`, `.cost`; change the annotation to `LLMResult` (direct field access keeps working)
- `vario/strategies/blocks/improve.py` — same pattern
- `vario/strategies/blocks/evaluate.py`, `meta.py`
- `vario/strategies/nl_parse.py`
- `vario/strategies/benchmark.py`
- `vario/cli_ng.py`, `vario/config_ui.py`, `vario/config.py`
- `vario/eval.py`, `vario/review.py`, `vario/reduce.py`
- `vario/directives.py`, `vario/macro_analyze.py`
- `vario/server.py`, `vario/engine/execute.py`, `vario/engine/nl_parse.py`

**Same pattern as Task 5.** For each file:
1. Find `await call_llm(...)` usage
2. Add `.text` where string ops are needed
3. Change `LLMResultStr` type annotations to `LLMResult`

**Step 1: Apply fixes to all vario/ files**

**Step 2: Import check**

**Step 3: Commit**

```
refactor(vario): migrate call_llm callers to use LLMResult.text
```

---

## Task 7: Migrate callers — remaining directories

**Batch A: intel/**
- `intel/companies/analyze.py`, `intel/companies/discover.py`, `intel/companies/gen_ai_universe.py`
- `intel/people/discover.py`, `intel/people/writings.py`, `intel/people/cluster.py`, `intel/people/common.py`, `intel/people/face_search.py`

**Batch B: draft/**
- `draft/claims/evaluate.py`
- `draft/core/roles.py`, `draft/core/rating.py`
- `draft/style/gym.py`, `draft/style/evaluate.py`, `draft/style/improve.py`

**Batch C: helm/ + doctor/ + learning/**
- `helm/analyze.py`, `helm/autodo/prompts.py`, `helm/autodo/scanner/planner.py`
- `doctor/analyze.py`, `doctor/expect.py`, `doctor/triage.py`
- All `learning/` files (~12 files)

**Batch D: jobs/ + finance/ + tools/ + projects/ + benchmarks/ + other**
- `jobs/handlers/` (4 files), `jobs/lib/` (2 files)
- `finance/earnings/backtest/` (3 files), `finance/lib/prices/symbols.py`
- `tools/supplychain/` (2 files), `tools/todos/enricher.py`
- `projects/skillz/`, `projects/vic/`
- `benchmarks/eval/` (~13 files), `benchmarks/docker/llm_init_shim.py`
- Misc: `ideas/landscape.py`, `infra/backup_discover.py`, `kb/scenario.py`, `present/critique.py`, `present/illustrator/analyze.py`, `video-analysis/`, `image-analysis/`, `time_gemini_lite.py`

**Same mechanical pattern for all.** Each batch gets one commit:

```
refactor(intel,draft): migrate call_llm callers to use LLMResult.text
refactor(helm,doctor,learning): migrate call_llm callers to use LLMResult.text
refactor(jobs,finance,tools): migrate call_llm callers to use LLMResult.text
```

---

## Task 8: Update tests

**Files:**
- `lib/llm/tests/test_cache.py` — update `LLMResultStr`/`LLMResultData` references
- `lib/llm/tests/test_stream.py` — update assertions (result is now `LLMResult`, not `str`)
- `lib/observe/tests/test_observe.py` — update mock `LLMResultStr` → `LLMResult`
- `lib/tune/tests/test_tune.py` — same
- `vario/strategies/tests/test_expand.py` — update type references
- `vario/tests/test_executor.py`, `test_revise.py`
- `vario/engine/tests/test_reasoning_effort.py`
- `lib/eval/tests/test_judge.py`
- `learning/session_review/pair_judge_compare.py` — imports `LLMResultStr`

**Key test change:** Anywhere a test mocks `call_llm` to return a plain string, it now needs to return `LLMResult(content="the string")` instead.

**Pattern:**
```python
# Before:
mock_call_llm.return_value = "response text"
# After:
from lib.llm.result import LLMResult
mock_call_llm.return_value = LLMResult(content="response text")
```

**Step 1: Update all test files**

**Step 2: Run the test suite**

Run: `python -m pytest lib/llm/tests/ -v --tb=short`
Run: `python -m pytest lib/observe/tests/ lib/tune/tests/ lib/eval/tests/ -v --tb=short`
Run: `python -m pytest vario/strategies/tests/ vario/tests/ vario/engine/tests/ -v --tb=short`

**Step 3: Commit**

```
test(llm): update tests for LLMResult return type
```

---

## Task 9: Remove `LLMResultData` alias (optional, can defer)

**After all callers are migrated**, grep for remaining `LLMResultData` usage:

```bash
rg "LLMResultData" --type py
```

If zero results outside `result.py`, remove the alias line:
```python
LLMResultData = LLMResult  # ← remove this
```

This can be done in a follow-up commit or deferred.

---

## Task 10: Final verification

**Step 1: Full grep for `LLMResultStr` — should only appear in `stream.py` (class def + `call_llm_str`) and the `lib/llm/__init__.py` re-export**

```bash
rg "LLMResultStr" --type py
```

**Step 2: Full grep for `.to_str()` — should be zero**

```bash
rg "\.to_str\(\)" --type py
```

**Step 3: Spot-check a few callers work**

```bash
python -c "
import asyncio
from lib.llm.result import LLMResult
r = LLMResult(content='hello world', cost=0.001, usage={'prompt_tokens': 10, 'completion_tokens': 5, 'total_tokens': 15})
assert r.text == 'hello world'
assert str(r) == 'hello world'
assert r.input_tokens == 10
assert r.output_tokens == 5
assert r.total_tokens == 15
assert not r.truncated
assert bool(r)
assert not bool(LLMResult())
print(f'Result: {r}')  # __str__
print('All assertions passed')
"
```

**Step 4: Run any existing llm tests**

```bash
python -m pytest lib/llm/tests/ -v --tb=short
```

**Step 5: Commit (if any stragglers found)**

```
chore(llm): final cleanup of LLMResult migration
```
