# Vario Next-Gen: Streams and Things

## Status: Design — not yet implemented

## Context

The block redesign (12 → 6 blocks) simplified the registry but exposed deeper
issues with the data model and execution model.

### Current state

**6 blocks**: `generate`, `annotate`, `critique`, `verify`, `apply`, `synthesize`

**Block signature**: `async def block(items: list, params, ctx) -> list`

**Types**: `Candidate` (content + metadata) and `ScoredCandidate` (candidate + result dict)

**Recipes**: 19 YAML files composing blocks into pipelines.

### Problems

1. **Rigid types** — `Candidate` means "an answer." But pipeline data can be
   questions, sub-problems, analysis, retrieved docs, plans of attack.
   Everything is just data with properties.

2. **Batch, not streaming** — `list` in → `list` out forces the executor to
   run each stage to completion before starting the next. A block like
   "generate plans" should be a stream you can keep pulling from.

3. **Two types for one concept** — `Candidate` vs `ScoredCandidate` is a
   type-level distinction for what's really just "does it have a score prop?"

4. **No provenance** — props say *what* but not *who*. "Score: 85" doesn't
   tell you which block scored it, with which model, at which stage.

5. **Control flow is hardcoded** — `loop` is built into the executor. Retry
   and branching logic would require more special cases.

## Design

### Thing — the universal pipeline datum

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Thing:
    content: str
    props: dict[str, Any] = field(default_factory=dict)
    history: list[dict[str, Any]] = field(default_factory=list)
```

One type. Props accumulate as things flow through blocks. Blocks read the props
they care about, add new ones, ignore the rest.

**Props** are the current state — flat key-value pairs:
- `score` — numeric (0-100)
- `reason` — text feedback
- `verified` — bool pass/fail
- `kind` — what this thing is: "answer", "question", "sub-problem", "plan"
- `model` — which model produced it
- `cost` — USD cost to produce
- `source` — provenance: "llm", "web", "user"
- `converged`, `status` — signals for the executor
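
A minimal sketch of the accumulation pattern (the `Thing` definition here is a self-contained stand-in with assumed default factories; the prop values are illustrative):

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Thing:
    content: str
    props: dict[str, Any] = field(default_factory=dict)
    history: list[dict[str, Any]] = field(default_factory=list)

# A thing starts life as a bare answer...
t = Thing(content="42", props={"kind": "answer", "model": "haiku"})

# ...and a later critique pass layers on assessment props
# without touching what is already there.
t.props.update({"score": 85, "reason": "concise but unjustified"})
```

A block that only cares about `score` never needs to know the thing also carries `kind` and `model`.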

**History** records provenance — who added which props, when:
```python
thing.history.append({
    "block": "critique",
    "model": "haiku",
    "stage_id": "reflexion.stage_2.critique",
    "added": {"score": 85, "reason": "..."},
})
```

Stage IDs are paths that nest naturally when recipes compose:
`"outer_recipe.stage_3.inner_recipe.stage_1.critique"`.

**A workflow is itself a Thing.** It has content (its output), props (total cost,
duration, strategy), and contains sub-Things. This is what makes recipes-as-blocks
work — same structure at every level of nesting.

### Streams, not lists

Block signature becomes:

```python
async def block(
    items: AsyncIterator[Thing],
    params: dict[str, Any],
    ctx: Context,
) -> AsyncIterator[Thing]
```

This enables:
- **Lazy evaluation** — don't generate all 5 if the first 2 are good enough
- **Pipelining** — critique can start scoring as soon as the first thing arrives
- **Unbounded production** — "keep generating plans" until downstream says stop
- **Backpressure** — downstream blocks control how much they pull
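
A sketch of how the streaming signature enables lazy evaluation, with toy blocks in place of LLM calls (`params` and `ctx` are dropped for brevity; the scoring logic is illustrative, not the real implementation):

```python
import asyncio
from collections.abc import AsyncIterator
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Thing:
    content: str
    props: dict[str, Any] = field(default_factory=dict)

async def generate(n: int) -> AsyncIterator[Thing]:
    # Stand-in for an LLM generator: yields things one at a time.
    for i in range(n):
        yield Thing(content=f"plan {i}", props={"kind": "plan"})

async def critique(items: AsyncIterator[Thing]) -> AsyncIterator[Thing]:
    # Starts scoring as soon as the first thing arrives.
    async for thing in items:
        thing.props["score"] = 90 if "plan" in thing.content else 10  # toy scorer
        yield thing

async def main() -> list[Thing]:
    # Lazy: stop pulling once two good-enough things have passed,
    # so the upstream generator never produces the remaining 998.
    good: list[Thing] = []
    async for thing in critique(generate(1000)):
        if thing.props["score"] >= 80:
            good.append(thing)
        if len(good) == 2:
            break
    return good

results = asyncio.run(main())
```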

### Collecting: the `accumulate` primitive

Streaming still needs one collecting primitive for the cases where a block must see a whole group before acting.

`accumulate` (or `batch` or `collect`) gathers N things from the stream, then
releases them as a group for the next block. Use cases:
- Collect 5 candidates before voting
- Gather all critiques before synthesizing
- Buffer until a timeout or budget threshold

This can be a standalone block or a param on any block (`batch: 5`).
`synthesize` already does this implicitly — its reduce methods need all items
before producing output.
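
One way `accumulate` could look as an async generator (a sketch: it yields `list`s of items, but whether the real design releases groups as lists or as a single group-Thing is an open choice):

```python
import asyncio
from collections.abc import AsyncIterator
from typing import TypeVar

T = TypeVar("T")

async def accumulate(items: AsyncIterator[T], n: int) -> AsyncIterator[list[T]]:
    # Gather up to n items from the stream, then release them as a group.
    batch: list[T] = []
    async for item in items:
        batch.append(item)
        if len(batch) == n:
            yield batch
            batch = []
    if batch:
        # Flush the final partial group when the upstream stream ends.
        yield batch

async def numbers() -> AsyncIterator[int]:
    for i in range(7):
        yield i

async def main() -> list[list[int]]:
    return [group async for group in accumulate(numbers(), 3)]

groups = asyncio.run(main())
# groups == [[0, 1, 2], [3, 4, 5], [6]]
```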

### Two kinds of blocks

Blocks differ in *what does the work*:

**Mechanical blocks** — deterministic software logic, no LLM call:
- `accumulate` — collect N things
- `synthesize(method=top_k)` — sort by score, take top k
- `synthesize(method=majority)` — count answers, pick most common

**LLM blocks** — an LLM operating on content:
- `generate` — LLM produces new content
- `critique` — LLM judges and scores
- `apply` — LLM improves content using feedback
- `synthesize(method=combine)` — LLM merges multiple approaches
- `replan` — LLM looks at the situation and decides what to try next

Both kinds have the same signature and live in the same registry. The distinction
matters for cost/latency reasoning but not for composition.
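
For instance, the mechanical `top_k` reduction is ordinary software logic over props (minimal `Thing` stand-in, illustrative scores):

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Thing:
    content: str
    props: dict[str, Any] = field(default_factory=dict)

def top_k(things: list[Thing], k: int) -> list[Thing]:
    # Deterministic: sort by the score prop, keep the best k.
    # Things without a score sort last.
    return sorted(things, key=lambda t: t.props.get("score", 0), reverse=True)[:k]

pool = [
    Thing("a", {"score": 40}),
    Thing("b", {"score": 91}),
    Thing("c", {"score": 72}),
]
best = top_k(pool, 2)
# best contents: ["b", "c"]
```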

### The block registry grows, not shrinks

The registry is `dict[str, callable]`. Every useful operation is a block.
The current 6 are a starting set. Future directions:

| Block | Kind | What it does |
|-------|------|-------------|
| `generate` | LLM | Produce new things from prompt + context |
| `retrieve` | mechanical+API | Fetch from web/knowledge base |
| `decompose` | LLM | Split a problem into sub-problems |
| `critique` | LLM | Score and/or provide feedback |
| `verify` | LLM | Step-by-step verification, pass/fail |
| `compare` | LLM | Diff two things, identify gaps |
| `apply` | LLM | Improve content using attached feedback |
| `synthesize` | either | Reduce: select, vote, or LLM-combine |
| `accumulate` | mechanical | Collect N things before passing downstream |
| `replan` | LLM | Assess situation, decide what to try next |
| `annotate` | LLM | Enrich context (→ rename TBD) |
| `checkpoint` | mechanical | Persist state, enable resume |

### Recipes as blocks (composability)

A recipe (graph of blocks) is itself a block. The executor resolves `type:`
against both `BLOCK_REGISTRY` and the recipe library. So `refine` (recipe:
`critique → apply`, looped) can be used as `type: refine` in another recipe.

**Graphs of blocks act as a block.** This is the core composability property.
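
A sketch of the resolution order (the registry and library names follow the text, but these are not the real data structures, and the recipe-running body is elided):

```python
from typing import Any, Callable

# Names assumed from the text; not the real implementation.
BLOCK_REGISTRY: dict[str, Callable[..., Any]] = {}
RECIPE_LIBRARY: dict[str, dict[str, Any]] = {}

def resolve(block_type: str) -> Callable[..., Any]:
    # Primitive blocks win; otherwise a recipe is wrapped so the
    # whole graph presents the same callable interface as a block.
    if block_type in BLOCK_REGISTRY:
        return BLOCK_REGISTRY[block_type]
    if block_type in RECIPE_LIBRARY:
        recipe = RECIPE_LIBRARY[block_type]

        def run_recipe(items, params, ctx):
            # A real executor would walk recipe["stages"] here,
            # resolving each stage's type recursively.
            return items

        return run_recipe
    raise KeyError(f"unknown block type: {block_type}")

BLOCK_REGISTRY["critique"] = lambda items, params, ctx: items
RECIPE_LIBRARY["refine"] = {"stages": ["critique", "apply"]}
```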

### Control flow is emergent, not hardcoded

Instead of special control-flow blocks (`loop`, `branch`, `retry`), control
flow emerges from data:

- **Is the exit criterion met?** — `critique` or `verify` checks props.
  Score above 80? Converged? All assertions pass?
- **Was there a failure that needs fixing?** — assessment. Error prop set?
  Score dropped? Empty output?
- **What to do about it?** — `replan`. Fix the prompt. Try a different model.
  Decompose the problem differently.

"Done" is just a prop — `converged: true` or `status: done`. The executor
keeps pulling from the stream until it ends. No special loop/branch/retry
machinery — the LLM decides what to try, blocks signal when they're satisfied.

This is more expressive than hardcoded control flow: the LLM can decide to
do things no programmer anticipated.

The executor still needs *some* structural support (max iterations, budget
limits) as safety rails, but the decision logic lives in blocks, not in
control-flow primitives.
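
A sketch of that pull loop with a safety rail, modeling things as plain dicts and using the `status`/`converged` props described in the text (the refine loop is a toy stand-in):

```python
import asyncio
from collections.abc import AsyncIterator

async def run_until_done(
    stream: AsyncIterator[dict], max_iterations: int = 50
) -> list[dict]:
    # Pull until a thing signals completion. max_iterations is a
    # safety rail, not a semantic loop construct.
    collected: list[dict] = []
    async for thing in stream:
        collected.append(thing)
        if thing.get("status") == "done" or thing.get("converged"):
            break
        if len(collected) >= max_iterations:
            break
    return collected

async def attempts() -> AsyncIterator[dict]:
    # Stand-in for a critique -> apply loop: each pass raises the score.
    score = 60
    while True:
        score += 10
        yield {"score": score, "status": "done" if score >= 80 else "active"}

result = asyncio.run(run_until_done(attempts()))
# stops after the attempt whose props say status: done
```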

### Vocabulary — shared terms for props

Blocks need a loose shared language so they can find each other's output.
Not enforced schema — just conventions. LLMs are fuzzy enough that slight
misalignment won't break things, but canonical terms help.

**Kind vocabulary** (`props["kind"]`):
- `answer` — a solution/response to a problem
- `question` — something to investigate or ask
- `problem` — a task to solve (the initial input is one of these)
- `goal` — a desired outcome or success criterion
- `plan` — a strategy or sequence of steps
- `sub-problem` — a decomposed piece of a larger problem
- `critique` — feedback on another thing
- `resource` — retrieved reference material (article, doc, example)
- `observation` — a factual finding about the current state
- `hypothesis` — a testable claim

**Assessment vocabulary** (props from critique/verify):
- `score` — numeric quality (0-100)
- `reason` — text explanation of assessment
- `verified` — bool pass/fail
- `rubric` — which criteria were evaluated

**Status vocabulary** (props for flow control):
- `status` — "active", "done", "stuck", "failed"
- `converged` — bool, exit criterion met
- `attempt` — int, how many times this has been tried

**Provenance vocabulary** (props for tracking):
- `model` — which LLM produced this
- `source` — "llm", "web", "user", "mechanical"
- `cost` — USD cost to produce
- `stage_id` — which pipeline stage produced this

This vocabulary lives in a YAML file (like `lenses.yaml`) and can evolve.
Blocks reference it in their docstrings. New terms get added as new blocks
need them.
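
One possible shape for that file (file name and layout assumed; entries taken from the lists above):

```yaml
# vocabulary.yaml -- sketch, not a finalized schema
kind:
  answer: a solution/response to a problem
  plan: a strategy or sequence of steps
  critique: feedback on another thing
assessment:
  score: numeric quality (0-100)
  reason: text explanation of assessment
status:
  status: one of [active, done, stuck, failed]
  converged: bool, exit criterion met
provenance:
  model: which LLM produced this
  cost: USD cost to produce
```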

### Type system: structural, not nominal

Props define what you can do with a Thing. A block that needs `score` works
on any Thing that has a `score` prop — regardless of what else is there.
This is structural typing (does it have what I need?) not nominal typing
(is it a ScoredCandidate?).

Formally, this is **row polymorphism** — blocks are polymorphic over the props
they don't care about. Combined with **dataflow** (blocks as processes,
streams as channels), this gives a clean compositional model.

Props only accumulate, never shrink — monotone on a join-semilattice. This
means loops have nice convergence properties: you can detect fixed points
by checking whether props stopped changing.
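
A structural check plus the fixed-point test that monotonicity makes possible (toy update function; in the real system the pass would be an LLM block):

```python
from typing import Any

def enrich(props: dict[str, Any]) -> dict[str, Any]:
    # Monotone pass: may add props, never removes or overwrites them.
    new = dict(props)
    if "score" in new and "verified" not in new:
        # Structural typing: we only ask "does it have a score prop?",
        # not "is this a ScoredCandidate?".
        new["verified"] = new["score"] >= 80
    return new

props: dict[str, Any] = {"score": 85, "kind": "answer"}
rounds = 0
while True:
    nxt = enrich(props)
    if nxt == props:
        break  # fixed point: the pass added nothing new
    props = nxt
    rounds += 1
```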

### Linguistic analogy

Another way to think about the system, useful for building intuition:

- **Nouns** = Things flowing through the pipeline
- **Adjectives** = props — accumulated descriptions (scored, critiqued, verified)
  that tell you what operations make sense next
- **Verbs** = blocks — operations you can perform
- **Adverbs** = params — how to perform them (model: haiku, rubric: [correctness])

## Migration path

### Phase 1: Thing type

1. Introduce `Thing` dataclass alongside existing `Candidate`/`ScoredCandidate`
2. Add `Thing.from_candidate()` / `Thing.from_scored()` converters
3. New blocks use `Thing`, old blocks still work via converters
4. Gradually migrate blocks to native `Thing` usage

### Phase 2: Streaming

1. Change block signature to `AsyncIterator[Thing] → AsyncIterator[Thing]`
2. Executor wires up generator chains instead of collecting lists between stages
3. Add `accumulate` primitive for blocks that need groups
4. Backpressure / early termination support

### Phase 3: Recipe-as-block

1. Executor checks recipe library when `type:` not found in BLOCK_REGISTRY
2. Recipe execution wrapped as a block callable
3. Recursive composition tested (recipe containing recipe)

### Phase 4: Provenance

1. History list on Thing — who added which props, when
2. Stage IDs as nested paths for compositional provenance
3. Provenance-aware debugging/inspection tools

### Phase 5: Emergent control flow

1. `replan` block — LLM assesses situation, decides next steps
2. Executor supports open-ended streaming (pull until `status: done`)
3. Safety rails: max iterations, budget limits, deadlock detection
4. Remove hardcoded `loop` stage type from executor
