# Learning + Skillz Unification Design

**Status**: Proposal | **Created**: 2026-02-14

## Problem

The learning system today conflates two concerns:
1. **Generic knowledge store** — DB schema, principles, links, materialization, gap analysis
2. **Session mining** — one specific source (CC transcripts) for extracting observations

Skillz is designed as a separate system with its own storage (`skillz/storage.py`), but its output is the same: verified observations that compound over time. Additionally, CC skills (`~/.claude/skills/`) have no connection to either system — no provenance, no effectiveness tracking.

## Insight

**Learning core** is a domain-agnostic knowledge store with abstraction layers (instances → principles → materialization). **Sources** are pluggable feeders that produce `learning_instances` from different origins. Skillz is just another source — one that actively targets gaps rather than passively mining sessions.

## Proposed Architecture

```
                          Learning Core (generic store)
                          ─────────────────────────────
                          learning_instances
                          learning_attachments (NEW)
                          principles
                          instance_principle_links
                          principle_applications
                          materialization → learnings.md
                          gap_analysis → identifies weak areas
                                ↑               ↑               ↑
                                │               │               │
                    ┌───────────┤       ┌───────┤       ┌───────┤
                    │           │       │       │       │       │
            ┌────────┴──────┐ ┌──┴───────┴──┐ ┌──┴──────┴──┐ ┌──┴───────┐
            │session_mining │ │ skillz      │ │ manual/CLI │ │ code     │
            │(passive)      │ │ extraction  │ │            │ │ review   │
            │               │ │ (active)    │ │            │ │          │
            └──────┬────────┘ └─────┬───────┘ └────────────┘ └──────────┘
                   │               │
                   │               │
            ┌──────▼───────┐ ┌─────▼──────┐
            │ CC skills    │ │ skillz     │
            │ ~/.claude/   │ │ curriculum │
            │ skills/      │ │ & domains  │
            └──────────────┘ └────────────┘
```

## Three Systems, Two Kinds of "Skill"

| System | What it is | Scope |
|--------|-----------|-------|
| **Learning Core** | Knowledge store + abstraction engine | All observations, all domains |
| **CC Skills** (`~/.claude/skills/`) | Imperative how-to guides for Claude Code | Tool/workflow guidance per session |
| **Skillz** (`projects/skillz/`) | Autonomous domain knowledge acquisition | Investment, people, meta-skills |

## Source Types (Feeders)

Each source writes to `learning_instances` with a distinct `source_type`:

| Source | `source_type` | Direction | Tied to |
|--------|--------------|-----------|---------|
| Session mining | `session_reflection` | Passive — mines past sessions | CC skills (applied skill → outcome) |
| Skillz extraction | `skill_extraction` | Active — targets gaps | Skillz curriculum |
| Manual input | `manual` | Human-initiated | — |
| Code review | `code_review` | Event-triggered | PR/diff |
| CC skill application | `skill_application` | Outcome tracking | CC skills |

## Feedback Loops

### Loop 1: CC Skills ↔ Session Mining

```
CC skill applied in session
    → session mining extracts outcome
    → learning instance (source_type=skill_application, skill_id="gradio-layout")
    → if pattern emerges → update CC skill
```

**New field**: `skill_id TEXT` on `learning_instances` — which CC skill guided this work.
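Loop 1's write path can be sketched with a few lines of `sqlite3`. This is a hedged illustration, not the real store API: the schema below is a deliberately simplified `learning_instances`, and `record_skill_outcome` is a hypothetical helper showing where `skill_id` and `source_type=skill_application` land.

```python
import json
import sqlite3

# Simplified sketch of learning_instances; only skill_id and source_type
# are taken from this proposal, the rest is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE learning_instances (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        content TEXT NOT NULL,
        source_type TEXT NOT NULL,
        skill_id TEXT,              -- NEW: which CC skill guided this work
        metadata TEXT
    )
""")

def record_skill_outcome(conn, skill_id, content, outcome):
    """Session mining writes one outcome row per applied CC skill."""
    conn.execute(
        "INSERT INTO learning_instances (content, source_type, skill_id, metadata) "
        "VALUES (?, 'skill_application', ?, ?)",
        (content, skill_id, json.dumps({"outcome": outcome})),
    )

record_skill_outcome(conn, "gradio-layout", "Two-column layout fixed overflow", "success")
row = conn.execute("SELECT skill_id, source_type FROM learning_instances").fetchone()
```

Querying `skill_id` across such rows is what lets the "if pattern emerges → update CC skill" step find recurring outcomes per skill.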

### Loop 2: Skillz ↔ Gap Analysis (Active Learning)

```
Learning core identifies weak domain (few instances, low confidence)
    → skillz targets that domain for extraction
    → verified extraction → learning instance (source_type=skill_extraction)
    → domain coverage improves
    → gap analysis updates
```

**Mechanism**: `gap_analysis()` query on learning_instances grouped by `domain_tags`, filtered by `confidence < threshold`. Skillz reads this to set curriculum priorities.
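A minimal sketch of that query, assuming one domain tag per row for simplicity (the real store may hold a tag list inside `domain_tags`); the threshold and minimum-count parameters are illustrative:

```python
import sqlite3

# Toy learning_instances with just the columns gap analysis needs.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE learning_instances (
    id INTEGER PRIMARY KEY,
    domain_tags TEXT,
    confidence REAL
)""")
conn.executemany(
    "INSERT INTO learning_instances (domain_tags, confidence) VALUES (?, ?)",
    [("investment", 0.9), ("investment", 0.8),
     ("people", 0.3), ("people", 0.4), ("meta", 0.2)],
)

# Weak domains: low average confidence OR thin coverage, weakest first.
GAP_QUERY = """
    SELECT domain_tags AS domain,
           COUNT(*)        AS instance_count,
           AVG(confidence) AS avg_confidence
    FROM learning_instances
    GROUP BY domain_tags
    HAVING AVG(confidence) < :threshold OR COUNT(*) < :min_count
    ORDER BY avg_confidence ASC
"""

gaps = conn.execute(GAP_QUERY, {"threshold": 0.5, "min_count": 2}).fetchall()
```

Skillz would consume `gaps` top-down when ordering its curriculum, since the weakest domains sort first.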

### Loop 3: Learning → CC Skills (Principle Promotion)

```
Multiple instances support same pattern (instance_count > N)
    → principle promoted to active
    → if actionable → becomes CC skill section or new skill
    → materialized into learnings.md for session context
```
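The promotion check in Loop 3 reduces to counting links per principle. The sketch below assumes simplified `principles` and `instance_principle_links` tables and a hypothetical `status` column with `candidate`/`active` values; real column names may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE principles (
        id INTEGER PRIMARY KEY,
        statement TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'candidate'   -- candidate | active
    );
    CREATE TABLE instance_principle_links (
        instance_id INTEGER NOT NULL,
        principle_id INTEGER NOT NULL REFERENCES principles(id)
    );
""")
conn.execute("INSERT INTO principles (id, statement) VALUES (1, 'Prefer explicit layout containers')")
conn.executemany("INSERT INTO instance_principle_links VALUES (?, 1)",
                 [(i,) for i in range(4)])   # 4 supporting instances

def promote_supported_principles(conn, n):
    """Mark candidate principles with more than n supporting instances active."""
    conn.execute("""
        UPDATE principles SET status = 'active'
        WHERE status = 'candidate' AND id IN (
            SELECT principle_id FROM instance_principle_links
            GROUP BY principle_id HAVING COUNT(*) > ?
        )
    """, (n,))

promote_supported_principles(conn, n=3)
status = conn.execute("SELECT status FROM principles WHERE id = 1").fetchone()[0]
```

Newly active principles would then flow to materialization and, when actionable, to CC skill authoring.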

## Schema Changes

### 1. New table: `learning_attachments`

```sql
CREATE TABLE learning_attachments (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    instance_id TEXT NOT NULL REFERENCES learning_instances(id),
    type TEXT NOT NULL,        -- screenshot|screencast|pdf|url|html_report|session|data|diff
    storage TEXT NOT NULL,     -- 'file' or 'reference'
    path TEXT NOT NULL,        -- relative to learning/data/attachments/ (file) or full URL/ID (ref)
    label TEXT,                -- "before layout fix", "source article"
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_attachments_instance ON learning_attachments(instance_id);
CREATE INDEX idx_attachments_type ON learning_attachments(type);
```

File convention:
```
learning/data/
  learning.db
  attachments/
    screenshots/    -- UI captures
    screencasts/    -- GIFs/videos
    reports/        -- HTML analysis
    data/           -- CSV/JSON evidence
  .share            -- serves via static.localhost
```
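One way the `learn attach` CLI could tie the table and file convention together; `attach_file` is a hypothetical helper, and the real command would also copy the source file under `learning/data/attachments/`:

```python
import sqlite3
from pathlib import Path

conn = sqlite3.connect(":memory:")
# Same shape as the learning_attachments DDL above, minus the FK for a
# self-contained sketch.
conn.execute("""
    CREATE TABLE learning_attachments (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        instance_id TEXT NOT NULL,
        type TEXT NOT NULL,
        storage TEXT NOT NULL,
        path TEXT NOT NULL,
        label TEXT,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
""")

def attach_file(conn, instance_id, src, type_, label=None):
    """Index a file attachment; path is relative per the convention above."""
    rel_path = f"attachments/{type_}s/{Path(src).name}"
    # A real CLI would copy src to learning/data/<rel_path> here.
    conn.execute(
        "INSERT INTO learning_attachments (instance_id, type, storage, path, label) "
        "VALUES (?, ?, 'file', ?, ?)",
        (instance_id, type_, rel_path, label),
    )
    return rel_path

rel = attach_file(conn, "inst-42", "/tmp/before.png", "screenshot", "before layout fix")
```

Because `path` is stored relative to `learning/data/`, the same rows work unchanged once the directory is served via static.localhost.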

### 2. New fields on `learning_instances`

```sql
ALTER TABLE learning_instances ADD COLUMN skill_id TEXT;  -- CC skill that guided this work
```

### 3. New `source_type` values

Add to the enum: `skill_extraction`, `skill_application`
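For reference, the full set after this change, taking the pre-existing values from the Source Types table above; the validation helper is illustrative, since the store may instead enforce this with a `CHECK` constraint:

```python
SOURCE_TYPES = {
    # existing
    "session_reflection",
    "manual",
    "code_review",
    # new in this proposal
    "skill_extraction",
    "skill_application",
}

def validate_source_type(value: str) -> str:
    """Reject rows carrying an unknown source_type."""
    if value not in SOURCE_TYPES:
        raise ValueError(f"unknown source_type: {value}")
    return value
```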

### 4. Skillz writes to learning.db (no separate storage)

Skillz's `storage.py` becomes a thin wrapper around `LearningStore.add_instance()`:

```python
# Assumes `store` (a LearningStore) and `LearningInstance` are imported from the learning core.
def record_extraction(skill_id: str, content: str, verification: dict, corpus_source: str):
    store.add_instance(LearningInstance(
        content=content,
        source_type="skill_extraction",
        learning_type="howto",  # or pattern, convention, etc.
        confidence=verification["success_rate"],
        metadata={
            "skill_id": skill_id,
            "verification_count": verification["count"],
            "success_rate": verification["success_rate"],
            "corpus_source": corpus_source,
            "curriculum_level": verification.get("level"),
        }
    ))
```

## What Stays Separate

- **Skillz curriculum** — ordered difficulty, active targeting logic (not in learning core)
- **Skillz sandbox** — execution environment for verification (learning doesn't execute)
- **Skillz domain configs** — corpus sources, extraction prompts per domain
- **CC skill files** — `.yaml` files in `~/.claude/skills/` (authored artifacts, not DB rows)

## What This Enables

1. **Unified evidence chain**: Every observation, whether from sessions, web extraction, or manual input, lives in one queryable store
2. **Cross-source patterns**: A principle discovered from sessions can be verified by skillz extraction
3. **Active learning**: Gap analysis drives skillz acquisition priorities
4. **Skill provenance**: Every CC skill section can cite "backed by N instances from M sources"
5. **Attachment-rich evidence**: Screenshots, PDFs, URLs alongside text observations

## Implementation Order

1. **Add `learning_attachments` table** + CLI (`learn attach ...`) — immediate value for layout reviews
2. **Add `skill_id` to `learning_instances`** — connect CC skills to outcomes
3. **Add `skill_extraction`/`skill_application` source types** — make skillz a learning source
4. **Refactor skillz storage to write to learning.db** — eliminate duplicate store
5. **Build gap analysis query** — enable active learning loop
6. **Serve attachments via static.localhost** — add `.share` to `learning/data/`
