Retroactive Principle Application Study

Generated: 2026-02-20 08:20 PT | 10 coding sessions (26 transcript segments, 72 episodes)

Sessions: 10 · Episodes: 72 · Followed: 117 · Violated: 19 · New patterns: 20 · Instances: 57

Principles Ranked by Mentions

Principle | Followed | Violated | Evidence (sample)
Prioritize High-Fidelity Domain Interfaces
Exhaust specialized tools and structured APIs before defaulting to generalized web search or unconstrained crawlers.
Followed: 17 · Violated: 1
  • + The agent used the specialized 'learn' CLI tool to query the database and check embedding coverage rather than manually
  • + The agent used the specialized `learning.cli` and `LearningStore` API to query the system instead of generic file greppi
  • The agent used generalized web searches for model IDs and pricing instead of checking the Anthropic → Direct documentat
Research Before Building — Broad Upfront Investigation Pays Off
Before diving into implementation, spend time researching what's available. 15 minutes of broad research often saves hours of building the wrong thing — or building something that already exists in a
Followed: 13 · Violated: 0
  • + The agent immediately used a specialized 'learning' skill to lookup the specific study requirements and spec before atte
  • + The agent systematically explored the CLI help, stats, and semantic search capabilities before reading the implementatio
⛔ Edit, Don't Replace — Content Is Sacred
CRITICAL. When modifying existing files, use Edit for surgical changes. Write replaces the entire file — use it only for new files or when the user explicitly asks to replace.
Followed: 5 · Violated: 4
  • + The agent used the Edit tool for surgical changes to retroactive_study.py across multiple steps (5-8) instead of overwri
  • + The agent used the Edit tool to update the TODO.md file surgically rather than overwriting the entire file.
  • The tool sequence shows a 'Write' operation at Step 8 for a documentation file that likely already e → Using Edit would
  • The agent performed five consecutive small edits to the same file (app.py) in steps 3 through 7. → Batching these change
Verify the Source First
In any producer-consumer relationship, confirm the producer is actually emitting data before debugging why the consumer isn't receiving it.
Followed: 7 · Violated: 0
  • + The agent checked if the service was running on port 7840 using lsof before assuming the service state or location.
  • + The agent checked the health endpoint of the server (curl -s -o /dev/null -w "%{http_code}" http://localhost:8130/health
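The health-check habit in the evidence above can be sketched as a small helper. This is a minimal illustration, not code from the transcripts; the URL and port are hypothetical stand-ins for whatever the producer exposes (the sessions used `localhost:8130/health`):

```python
from urllib.request import urlopen
from urllib.error import URLError

def producer_is_up(health_url, timeout=2.0):
    """Return True if the producer's health endpoint answers 200.

    Run this FIRST when a consumer sees no data: if the producer
    is down, debugging the consumer is wasted effort.
    """
    try:
        with urlopen(health_url, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        return False
```

If this returns False, the investigation stays on the producer side; only after it returns True is the consumer worth debugging.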
Shorten the Truth Distance
Eliminate intermediate data relays by having consumers read directly from the producer's primary output.
Followed: 4 · Violated: 0
  • + The agent read directly from the `learning.db` source of truth via the CLI and Python schema instead of relying on the m
  • + The agent first read the local configuration files (~/.myconf/LLM_MODELS.md) to understand the existing schema before se
Commit by Logical Intent
Isolate changes into single-purpose commits to preserve a clear, reviewable, and reversible history.
Followed: 4 · Violated: 0
  • + The agent broke the implementation into three distinct tasks (Task 1: Review file model, Task 2: Prompt template, Task 3
  • + The agent decomposed the implementation into four distinct, single-purpose tasks (installing dependencies, migrating loc
Widen the Search Aperture
When progress stalls, deliberately expand your search radius beyond the current layer or approach to escape local minima.
Followed: 2 · Violated: 2
  • + After initial Grep and Glob searches for 'failure' patterns failed, the agent expanded the search to include broader pat
  • + When the user provided a partial/broken URL for 'facebookres', the agent correctly identified the likely repository (fac
  • After receiving 403 errors on WebFetch, the agent repeatedly tried similar WebFetch calls on the sam → Recognizing the 4
  • The agent attempted a glob search for `**/.claude/settings.local.json` starting from the root home d → Narrowing the sea
Structured Over Unstructured — Grab the Best Format First
When data exists in multiple formats, always grab the most structured one. Don't OCR a PDF when the same data exists as XML. Don't scrape HTML when there's an API. Don't parse text when there's JSON.
Followed: 4 · Violated: 0
  • + The agent read the integration tests (test_api.py) alongside the source code to confirm the expected request/response fo
  • + The agent fetched the exact pricing table to extract input/output prices per million tokens rather than relying on a sum
Colocate Action with Insight
Place actionable controls directly alongside the data they affect to eliminate the distance between insight and execution.
Followed: 4 · Violated: 0
  • + User proposes putting messages into the statusline and pan (panel) where the user is already looking.
  • + After finding the leaderboard URLs, the agent immediately updated the relevant section in `LLM_MODELS.md` to store the i
⛔ Don't Make Users Ferry Data
CRITICAL. If you can get information directly, get it. Don't ask users to run commands and report back what they see.
Followed: 0 · Violated: 3
  • The user request explicitly asked the agent to 'Run these commands and report back what's available' → If the system had
  • The agent attempted to find the 'notify' skill by guessing paths like ~/.claude/skills/notify.md and → Following the pri
Verify Action
Never assume an action succeeded; verify it.
Followed: 2 · Violated: 1
  • + After performing searches, the agent verified the output format and content of the results before proceeding to the plan
  • + The agent ran pytest immediately after the edit to confirm the implementation satisfied the test case.
  • The tool sequence shows a git commit failure (Exit code 1) because the file was likely already stage → Verifying the com
Codify Logical Service Clusters
When multiple services function as a single logical unit, codify their lifecycle into a single command or task.
Followed: 3 · Violated: 0
  • + The agent is implementing a 'Subagent-Driven Development' skill which codifies a complex multi-stage workflow (dispatch,
  • + The request seeks to wire spending alerts and reports into a single notification tool/skill interface.
Warm Pools on Initial Intent
Initialize expensive resource pools upon first request and replenish them in the background to balance resource efficiency with high responsiveness.
Followed: 3 · Violated: 0
  • + The agent immediately launched a background skill ('n') to handle the retroactive study upon receiving the user's intent
  • + The agent inspected the 'cl' alias which uses 'claude-pool grab' to reduce cold-start latency for new sessions.
Prefer Context-Native Mechanisms
Choose the simplest, most direct tool supported by the immediate environment rather than importing workarounds from more constrained contexts.
Followed: 3 · Violated: 0
  • + The agent checked `supervisor/autonomous/notify.py` to see if the new functionality should be an extension of an existin
  • + Checked for existing __init__.py files and local.py to ensure the new module followed the existing project structure.
Look One Layer Out
Identify the external owner of a process's lifecycle when diagnosing unexpected terminations or disappearances.
Followed: 2 · Violated: 1
  • + The agent investigated the sidekick's lifecycle and how lib/llm is invoked from hooks to understand the external owner o
  • + The agent checked both the local project directory and the global configuration directory (~/.myconf) to understand the
  • The agent attempted to read a directory path (/supervisor/watch) as a file, resulting in an EISDIR e → Checking the file
Command Parsimony
Fewer commands that adapt to context.
Followed: 3 · Violated: 0
  • + The agent uses a single 'learning' skill to handle a broad request about knowledge retrieval rather than multiple specia
  • + The user is requesting to shorten the skill call from its full name to 'cl', reducing keystrokes and cognitive load for
Why → How → What (Documentation Order)
Understanding flows from why to how to what — never the reverse. A walkthrough that lists architecture, CLI flags, and file layouts in three different formats is just "what" repeated. The reader can a
Followed: 2 · Violated: 0
  • + The user request followed the Why (Retroactive study) -> How (Scan past 30 days) -> What (Steps 1-4) structure, which th
  • + The user request explicitly asks for an overview (Why), followed by file-specific instructions (How), and bite-sized tas
Target Empirical Bottlenecks First
Never attempt architectural optimizations until measurement identifies the true bottleneck.
Followed: 2 · Violated: 0
  • + The agent ran a python script to measure Store initialization and FTS search time (0.056s and 0.002s) before planning sp
  • + The agent researched whether sidekick calls involve spinning up new Python processes to address the user's latency conce
Verify API Details, Don't Fabricate
Never invent API parameters or syntax. Verify against actual sources.
Followed: 0 · Violated: 2
  • The agent attempted to use `python -m learning.cli learn` which resulted in an Exit code 2 because t → Checking the `--h
  • The agent searched for 'GPT-5.2', 'GPT-5-mini', and 'Gemini 3 Pro' pricing in early 2025, which are → It would have pre
Escalate Decisions, Not Just Errors
Some elements may need human judgment — not because they *failed*, but because the right action is ambiguous. Design an "uncertain" bucket alongside "success" and "failure."
Followed: 2 · Violated: 0
  • + The user's request explicitly asks for a mechanism where 'decisions and open questions that may need user review' are sa
  • + The agent specifically asked about the granularity of decisions and what should pause autonomous work for human judgment
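The "uncertain bucket" this principle describes can be sketched as a three-way triage. This is an illustrative design, not code from the sessions; `handler` and `is_ambiguous` are hypothetical callables a real pipeline would supply:

```python
from enum import Enum

class Outcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    UNCERTAIN = "uncertain"   # needs human judgment, not a retry

def triage(item, handler, is_ambiguous):
    """Route an item into one of three buckets instead of two.

    Ambiguous items are parked for review rather than guessed at;
    only genuine processing errors land in FAILURE.
    """
    if is_ambiguous(item):
        return Outcome.UNCERTAIN
    try:
        handler(item)
        return Outcome.SUCCESS
    except Exception:
        return Outcome.FAILURE
```

The key design point is that UNCERTAIN is checked before the work is attempted, so "the right action is unclear" never gets misfiled as "the action failed."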
Embed Calibration within Operation
Use real-world operational feedback as the primary validation loop when formal testing is high-cost, subjective, or low-frequency.
Followed: 2 · Violated: 0
  • + The request emphasizes using operational feedback (normal work) as the primary validation loop.
  • + User suggests collecting preferences through session replay to tune the history/badging system.
Reserve Formatting for Signaling
Use visual emphasis sparingly to anchor the user's scan path on high-value structural elements.
Followed: 2 · Violated: 0
  • + The agent implemented a `type_emoji` mapping to use specific visual anchors for different content types in the Gradio ap
  • + Added a moneybag emoji to the billing tab to anchor the user's scan path to financial data.
research-before-probing
Followed: 2 · Violated: 0
  • + The agent used Grep and ls to locate the 'learning' and 'reflection' modules across the codebase before attempting to an
  • + The agent used 'which' and 'cat' to understand existing infrastructure (llm-fork, cl alias) before implementing the new
Philosophy
Shipping implies testing. When you make a change, automatically verify it works - don't wait to be asked. Use background agents for verification so the main conversation isn't blocked.
Followed: 2 · Violated: 0
  • + The agent created `supervisor/autonomous/tests/test_cli.py` and ran it immediately after creating the file (Step 5 & 6)
  • + The agent created a new test file `supervisor/autonomous/tests/test_prompts.py` and ran it to verify the failure before
Research Before Probing
When facing an unfamiliar API or library, search docs first — probe the runtime second.
Followed: 2 · Violated: 0
  • + The agent read `supervisor/autonomous/review.py` and existing tests (Steps 2-4) to understand the underlying data model
  • + Checked the installed version of the 'apprise' library (v1.9.7) before writing the wrapper code.
Spec-Driven Development
The process:
Followed: 2 · Violated: 0
  • + The agent immediately launched the 'superpowers:writing-plans' skill in response to a request to 'kick off plan and impl
  • + The agent strictly followed the provided Task 4 description, implementing the core.py logic and exports exactly as speci
Draft the Shape, Then Fill It In
When building a new module, draft the interface skeleton first — get alignment — then implement.
Followed: 2 · Violated: 0
  • + The agent prioritized the planning phase for the 'watch.api', 'hist', and 'badge' components before attempting implement
  • + The agent read the prompt templates for Implementer, Spec Reviewer, and Quality Reviewer before creating the task sequen
Trim the Root when Branching
When modularizing content or logic, replace the original inline details with references to prevent redundancy and overhead.
Followed: 2 · Violated: 0
  • + The plan explicitly includes migrating existing local notification logic into a new module (lib/notify/local.py) as part
  • + The agent implemented a unified notify() function in core.py that routes to specialized modules (local.py, pushover.py),
Tests
Capture behavior exactly; use pytest for Python; serve as executable documentation of expected behavior. See `testing.md` for testing philosophy and guidelines.
Followed: 2 · Violated: 0
  • + The agent created lib/notify/tests/test_core.py and immediately ran pytest to verify the implementation of the new dispa
  • + Created lib/notify/tests/test_pushover.py immediately after implementing the module and verified with pytest.
Never Shadow Reserved Namespaces
Prefix local variables to avoid colliding with reserved environment variables or system-defined names.
Followed: 1 · Violated: 1
  • + The user request explicitly warns against using the bare `tmux` command because it shadows the system binary with a shel
  • The agent used SESSION_ID and PARENT as local variable names in the shell script. → Using a prefix like WRAPUP_SESSION_I
Steer via Semantic Intent
Influence agent behavior by varying instructions at the entry point rather than branching logic throughout the execution path.
Followed: 1 · Violated: 0
  • + The agent responded to the user's semantic feedback about 'applying to errors' vs 'seeing what happened' by modifying th
Treat Review as Calibration
Every human intervention in an automated process must be captured as structured feedback to refine the system's future judgment.
Followed: 1 · Violated: 0
  • + The agent immediately invoked the 'learning' skill to capture the XML convention as structured feedback for future sessi
Prefer High-Entropy Delimiters
Use rare or doubled character sequences for automation triggers to prevent collisions with dynamic UI elements and data.
Followed: 1 · Violated: 0
  • + The user and agent are codifying the use of XML tags as high-entropy delimiters to prevent collisions during LLM injecti
Separate Collection from Commitment
Gather results into a reviewable staging area before taking irreversible action. This gives you a checkpoint where a human (or automated check) can inspect before damage is done.
Followed: 1 · Violated: 0
  • + The user suggests saving decisions to an md file and adding a todo to review, creating a staging area for human checkpoi
Progressive Disclosure
Start minimal, expand on demand.
Followed: 1 · Violated: 0
  • + The agent followed the instruction to ask questions one at a time to refine the idea rather than overwhelming the user w
Subsume Validation into Operation
Design systems so that normal operational feedback loops serve as the primary validation mechanism instead of isolated synthetic benchmarks.
Followed: 1 · Violated: 0
  • + The user explicitly requested to capture the idea that 'normal work is the best testing/behavior spec', which directly a
Explicitly Qualify Status Values
Differentiate between null and zero using explicit placeholders and define the direction of metrics with clear labels.
Followed: 1 · Violated: 0
  • + The agent added logic to handle 'resolved' status and content processing flags explicitly in the dataframe before render
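A minimal sketch of explicit qualification; the function name, the "n/a" placeholder, and the direction labels are illustrative choices, not the format used in the sessions:

```python
def fmt_metric(value, higher_is_better=True):
    """Render a metric so null and zero stay distinguishable and
    the direction of 'good' is stated explicitly.
    """
    if value is None:
        return "n/a"  # never print a missing value as 0
    arrow = "higher=better" if higher_is_better else "lower=better"
    return f"{value} ({arrow})"
```

A dashboard that prints `0` for both "measured zero" and "not measured" invites exactly the confusion this principle warns about.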
Visual Clarity
State should be obvious at a glance.
Followed: 1 · Violated: 0
  • + The user asked if IDs were cut off, and the agent immediately edited the code to ensure the ID column was fully visible.
resolve-structural-ambiguity-before-coding
Followed: 0 · Violated: 1
  • The user stated the experiment and general learning 'got conflated but should not be' and suggested → The agent immedia
Validate Before Bulk, Not After
Before running 300+ items or 2+ hours of processing, run a representative sample (5-10 items) and verify the output has what you need. The cost of discovering a missing field after a full run is a ful
Followed: 1 · Violated: 0
  • + The agent read the API server code to understand the expected schema and behavior before writing the full evaluation scr
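The sample-first guard can be sketched as a thin wrapper. This is an assumed shape, not the agent's actual script; `process` and `required_keys` are hypothetical arguments standing in for the real pipeline step and its downstream schema:

```python
def run_with_preflight(items, process, required_keys, sample_size=5):
    """Process a small sample and verify the output shape before
    committing to the full batch.
    """
    for item in items[:sample_size]:
        out = process(item)  # expected to return a dict per item
        missing = [k for k in required_keys if k not in out]
        if missing:
            raise ValueError(
                f"sample item {item!r} missing {missing}; aborting before full run"
            )
    # The sample looked right; now the full run is a safe bet.
    return [process(item) for item in items]
```

The cost of the preflight is a few seconds; the cost it prevents is rerunning the entire batch after discovering a missing field.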
Avoid Collisions in Temp/Shared Paths
Use unique paths when multiple processes might run concurrently.
Followed: 1 · Violated: 0
  • + The agent created a specific directory /tmp/watch_api_eval to store the results of the evaluation rather than using a ge
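The collision-free version of a fixed `/tmp/watch_api_eval` directory is available directly in the standard library; the prefix below is illustrative:

```python
import tempfile
from pathlib import Path

def private_workdir(prefix="watch_api_eval_"):
    """Create a scratch directory that cannot collide.

    mkdtemp() guarantees a fresh, uniquely named directory, so two
    concurrent runs never clobber each other's files the way a
    hard-coded /tmp path would.
    """
    return Path(tempfile.mkdtemp(prefix=prefix))
```

A fixed path is fine for a one-off, but the moment two runs can overlap, unique paths become the only safe default.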
Fail Per-Element, Not Per-Batch
One bad element should not kill the run. Isolate failures, log them richly, continue. Corollary: the error log *is a first-class output* of the job, not a side effect.
Followed: 0 · Violated: 1
  • The initial bash script attempt failed with an unrecognized argument error, which halted the entire → Using a more robu
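A minimal sketch of per-element isolation, with the error log returned as a first-class output rather than a side effect (names are illustrative):

```python
def run_batch(items, process):
    """Apply `process` to every item, isolating failures.

    One bad element is recorded and skipped instead of killing the
    run; the error list comes back alongside the results.
    """
    results, errors = [], []
    for item in items:
        try:
            results.append(process(item))
        except Exception as exc:
            # Rich per-item record, not just a stack trace in stderr.
            errors.append({"item": item, "error": repr(exc)})
    return results, errors
```

The caller decides what to do with `errors`; the point is that a 1% failure rate costs 1% of the output, not 100% of the run.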
Knowledge-Accumulating Search
When searching for information across many similar entities (companies, APIs, websites), structure the work so each search enriches a shared knowledge base that makes subsequent searches cheaper and m
Followed: 1 · Violated: 0
  • + The agent expanded the search from just the requested Sonnet 4.6 to include Haiku 4.5, GPT-5.2, Gemini 3, and Grok 4.1 t
Sanity-Check Results Before Presenting
When data contradicts common sense or domain expectations, investigate before displaying. A counterintuitive result might be correct — but surfacing it without verification risks eroding trust in all
Followed: 1 · Violated: 0
  • + After getting initial results, the agent performed a second search to verify specific model versions (Grok 4.1) to ensur
Consolidate Synonymous Entry Points
Merge fragmented tools with overlapping domains into a single interface to reduce discovery friction and cognitive load.
Followed: 1 · Violated: 0
  • + The user request defines a 'Single entry point for all knowledge operations' (observations, principles, conventions, how
Adopt Before You Build
Before building a capability, search for existing implementations. A well-maintained library with community momentum is almost always better than a custom solution — even if the custom one fits perfectly.
Followed: 1 · Violated: 0
  • + The user explicitly requested to 'keep using the comms library it uses' from the existing AI MCP instead of reinventing
Feature Lifecycle
Any unit of work—whether a single feature, a sub-project, or a large capability group—progresses through these explicit states. This status should be tracked (e.g., in `TODO.md` or the Status Dashboard).
Followed: 1 · Violated: 0
  • + The agent structured the work into explicit phases: Explore context -> Clarify vision -> Propose design -> Write doc.
Surface Latent System Decay
Expose metrics of internal rot and orphan data within the primary interface to transform invisible debt into curation tasks.
Followed: 0 · Violated: 1
  • The tmux session exit is failing to populate the parent shell history with the resume command, which → If the system sur
Prefer Predictable Primitives
Standardize on primitives with consistent cross-platform behavior rather than relying on non-standard extensions or environment-specific flags.
Followed: 1 · Violated: 0
  • + The user explicitly instructed to use `/opt/homebrew/bin/tmux` instead of the bare `tmux` command to avoid non-standard
Speculative Parallelism: Trade Compute for Latency
When a call is cheap but you're only ~90% sure of the syntax/params, the default approach is: try it, see the error, fix, retry. Each round-trip adds latency.
Followed: 0 · Violated: 1
  • The agent executed 7 sequential `gh api` calls, waiting for each to return before starting the next. → Since the API cal
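The parallel alternative to try-fail-fix-retry can be sketched with the standard library. This is a hedged illustration, assuming the candidate calls are cheap and side-effect-free (as `gh api` reads are); `calls` is a list of zero-argument closures over slightly different parameter guesses:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def first_success(calls):
    """Fire several candidate calls concurrently and return the
    first result that succeeds, instead of trying them serially.

    Trades a little extra compute for much lower wall-clock latency
    when only ~90% sure of the right syntax or parameters.
    """
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        futures = [pool.submit(call) for call in calls]
        last_error = None
        for fut in as_completed(futures):
            try:
                return fut.result()
            except Exception as exc:
                last_error = exc  # keep trying the remaining candidates
        raise last_error
```

Seven sequential round-trips collapse to roughly one round-trip of latency; the only cost is a few wasted requests when the first guess would have worked anyway.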

New Pattern Candidates (20)

Bootstrap via Meta-Documentation
When tasked with a complex project, prioritize reading the project's own 'TODO' or 'Spec' files over general environment exploration.
Evidence: The agent's first step was to read learning/TODO.md to understand the 'Retroactive Principle Application Study' spec.
Anti-pattern: Blindly listing files or grepping for keywords without understanding the project's internal roadmap.
From: Retroactive principle application study. Goal: scan past 30 days of Claude Code
Align Log Levels with Intent
Ensure that the categorization of events (e.g., 'ERROR' vs 'ok') matches the user's analytical goal rather than just the technical exit code.
Evidence: The user complained that the agent was focusing too much on 'errors' when they just wanted to 'see what happened' in the study.
Anti-pattern: Technical-only logging that obscures the narrative flow of a process by over-emphasizing failure states.
From: i dont get it you are applying to errors but we just need to see what happened -
Baseline Before Optimization
Establish a quantitative performance baseline using the current production environment before proposing architectural changes for speed.
Evidence: Step 5: Agent measured execution time of the existing search implementation before entering Plan Mode.
Anti-pattern: Proposing caching or indexing strategies based on perceived slowness without measuring actual latency.
From: ok, want to plan and iterate on this, with first introducing some way to assess
Asynchronous Decision Buffering
When an autonomous agent encounters an ambiguity, it should buffer the decision in a persistent log and continue on non-blocked paths rather than idling for a synchronous response.
Evidence: User request: 'save decisions and open questions to some md file and add a todo to review' instead of stopping the bigger task.
Anti-pattern: Synchronous blocking where an agent stops all progress to wait for a user's answer to a minor clarification.
From: even when user fires of a bigger task, we may want /autonomously as a command an
Plan for the Uninformed
Write implementation plans assuming the executor has zero codebase context and 'questionable taste' to force explicit constraints and file-level mapping.
Evidence: The user request specifically defines this persona for the writing-plans skill.
Anti-pattern: Vague architectural suggestions that lead to implementation drift or 'Kitchen Sink' commits.
From: (skill invocation)
Multi-Stage Review Gating
Separate validation into distinct phases (e.g., functional/spec compliance vs. structural/code quality) to ensure core requirements are met before optimizing for style.
Evidence: The user request explicitly mandates a 'two-stage review after each: spec compliance review first, then code quality review.'
Anti-pattern: Monolithic review where architectural flaws are obscured by discussions about variable naming or linting.
From: (skill invocation)
Sync State Before Action
Always read the current state of a tracking document (like TODO.md) before appending or modifying it to ensure context is current.
Evidence: Agent read TODO.md in step 1 before editing it in step 3.
Anti-pattern: Blindly appending to a file based on cached or assumed state, leading to duplicates or out-of-order logs.
From: on so uou have a plan is it still current and save to learning todos the follow
Scripting Language Pivot
When a shell script encounters complex parsing or argument handling issues, pivot immediately to a high-level language like Python for better error handling and data structures.
Evidence: The agent switched from a Bash script (Step 4) to a Python script (Step 6) after encountering argument errors and needing to handle JSON/HTTP more reliably.
Anti-pattern: Struggling with shell quoting and string manipulation for complex JSON payloads.
From: Run watch.api hook evaluation: 1) Start watch-api server if not running (python
Audit Process-Spawn Overhead
When optimizing for low-latency interactive features, explicitly audit the cost of process initialization versus persistent daemon communication.
Evidence: The user specifically asked if sidekick calls involve spinning up new Python processes each time to understand latency bottlenecks.
Anti-pattern: Optimizing application logic while ignoring the multi-second overhead of cold-starting the runtime environment.
From: ok i would like to improve how /hist can get cached and improved. i want hte sid
Differential History Updates
Update summary or history state only on new user intent (ingress) rather than on every system tick to preserve token budget and focus.
Evidence: User request: 'only on new user messages and against the cached hist'.
Anti-pattern: Regenerating full context or history on every turn regardless of whether the state changed.
From: then there is the prompt for making a great /hist and when to update it and what
Proactive Schema Backfilling
When a new field is added to a record (e.g., 'launch_post'), proactively search for and populate that field for existing records in the same collection.
Evidence: The user asked to 'keep track of launch post for every model', and the agent immediately began searching for launch posts for models not explicitly mentioned in the update request.
Anti-pattern: Sparse Schema Debt where only new entries contain the new metadata, making the field unreliable for programmatic use.
From: ok so updade sonnet to 4.6 in models, and update pricint info and link to its la
Materialize Views for Human Consumption
Maintain a structured database as the source of truth while auto-generating human-readable files (markdown) for easy browsing and search.
Evidence: The system uses learning.db as the source of truth and generates ~/.claude/principles/*.md as materialized views.
Anti-pattern: Manually editing markdown files and trying to parse them back into a database, leading to sync errors.
From: (skill invocation)
Reconstruct Context via Diff
When resuming a session with lost context, use version control diffs and logs as the primary source of truth for 'what was just done' rather than relying solely on natural language summaries.
Evidence: The agent used git diff and git log across two repos to verify the 'Sonnet 4.6' updates mentioned in the summary.
Anti-pattern: Blindly trusting a summary and potentially duplicating work or missing half-finished file changes.
From: (context-restored session)
Cross-Agent Tool Parity
When a capability (like notifications) exists in one part of an ecosystem (MCP), prioritize porting the interface to other agents (Claude Code) to maintain a consistent operational surface.
Evidence: User wants to bring Pushover configuration from AI MCP into Claude Code as a tool/skill.
Anti-pattern: Fragmented capabilities where different agents have different 'senses' or 'limbs' for the same infrastructure.
From: lets work on notifying hte user better from automation and supervision. we have
Multi-Stage Review Pipeline
Separate validation into distinct phases (e.g., Spec Compliance vs. Code Quality) to prevent quality checks from overlooking functional requirements.
Evidence: The user request explicitly mandates a two-stage review: spec compliance review first, then code quality review.
Anti-pattern: Single-pass review where stylistic or technical feedback overshadows the fact that the code doesn't actually meet the original requirements.
From: (skill invocation)
Verify Saturation via Recent SOTA
When identifying non-saturated benchmarks, explicitly cross-reference the benchmark's leaderboard against the performance of the latest frontier models (e.g., GPT-4o, Claude 3.5).
Evidence: The user specifically asked for benchmarks 'not saturated by frontier llms', requiring the agent to look for high headroom in current leaderboards.
Anti-pattern: Recommending classic benchmarks like ROUGE on CNN/DailyMail where frontier models have already hit a performance ceiling.
From: pls identify benchmarks that are not saturated by frontier llms and htat have to
Temporal Constraint Validation
When a request includes a specific date or version constraint, explicitly verify the current system state against that constraint before proceeding with content updates.
Evidence: The agent checked if the data was from Q3 2025 or later before updating the README as requested by the user.
Anti-pattern: Blindly updating text to match a user's requested date without verifying if the underlying data actually supports that claim.
From: in benchmarks readme update the overvie wof rhat we have and what is avails. and
Pattern-Matching Implementation
When adding new entities to a system, identify the existing 'adapter' or 'template' pattern first to ensure architectural consistency.
Evidence: The agent explicitly searched for the 'benchmarks adapter pattern' in step 2 to ensure the new text benchmarks followed the existing structure.
Anti-pattern: Implementing new features as isolated scripts that don't share the lifecycle or interface of existing similar features.
From: want ti install the firs 4 text ones? and get them going the same way we did oth
Map the Evaluation Pipeline
When researching benchmarks, explicitly identify the 'Scorer', 'Dataset Schema', and 'Environment Requirements' as a standard triad.
Evidence: The user request specifically asked for these three components for four different benchmarks, which the agent successfully retrieved.
Anti-pattern: Researching benchmarks by only looking at the leaderboard scores without understanding the underlying scoring logic or data format.
From: Research the evaluation methodology of these 4 benchmarks to understand what cod
Adapter Pattern for Upstream Parity
When integrating external benchmarks, use a dedicated adapter that mocks the external SDKs to route calls through internal infrastructure while keeping the upstream evaluation logic untouched.
Evidence: The use of `_upstream.py` to mock SDKs and clone repos allows `mmlu_pro_official.py` to run official code against internal models.
Anti-pattern: Rewriting benchmark logic manually, which risks losing parity with official results.
From: Explore the benchmarks adapter pattern in the rivus codebase. I need to understa

Notable Instances (57)

[pattern] The agent used a domain-specific tool ('learning' skill) instead of generic 'ls' or 'cat' to retrieve the study spec.
Linked: dev/prioritize-high-fidelity-domain-interfaces | From: Retroactive principle application study. Goal: scan past 30 days of Claude Code
[pattern] The agent performed a sequence of 4 consecutive edits to the same file to refine a single logic block, demonstrating incremental refinement.
Linked: development/simplicity-and-incremental-refactoring | From: i dont get it you are applying to errors but we just need to see what happened -
[pattern] The agent used grep to find method definitions (search, search_all_items) in the schema file after reading the CLI entry point to trace the execution flow.
Linked: dev/look-one-layer-out | From: Explore the learning find/search capabilities in the rivus codebase. Run these c
[pattern] The agent used a one-liner python script to test both FTS and semantic search performance in a single execution.
Linked: development/minimum-necessary-weight | From: ok, want to plan and iterate on this, with first introducing some way to assess
[pattern] The user explicitly prompted the agent to 'capture' a convention, demonstrating a collaborative approach to system calibration.
Linked: knowledge-accumulation/treat-review-as-calibration | From: good and capture the xml convention /learning ? related to retrieval for llms?
[pattern] The agent used a Python one-liner to inspect the database schema and IDs when the CLI output was insufficient.
Linked: dev/prefer-context-native-mechanisms | From: (skill invocation)
[pattern] The user is defining the 'Autonomous Mode' prompt as a behavioral constraint ('just do your best') which aligns with steering via semantic intent.
Linked: dev/steer-via-semantic-intent | From: even when user fires of a bigger task, we may want /autonomously as a command an
[pattern] The agent used existing codebase patterns as the foundation for a new design discussion, ensuring architectural consistency.
Linked: dev/extend-dont-invent | From: (skill invocation)
[insight] User explicitly codifying the philosophy of operational validation into the system's long-term memory/learning.
Linked: testing/subsume-validation-into-operation | From: in learning todo, capture the idea that normal work is the best testing/behavior
[pattern] The agent immediately initiated a planning skill upon receiving a request, demonstrating a bias toward structured execution before the user interrupted.
Linked: development/spec-driven-development | From: (interrupted)
[pattern] The agent verified the location of the 'notify skill' in the plugin cache before finalizing the plan.
Linked: development/verify-action | From: (skill invocation)
[pattern] The agent immediately checked for existing tests using Glob before creating new tasks, demonstrating a 'verify current state' mindset.
Linked: testing/philosophy | From: (skill invocation)
[convention] The agent checked for the existence of the target directory before attempting to write, ensuring the environment was ready for the new skill.
Linked: development/verify-action | From: You are implementing Task 5: Create the `/autonomous` skill at `~/.claude/skills
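The verify-then-write step can be sketched as below; the paths are illustrative (a temp directory stands in for the real skill location), and `SKILL.md` is a hypothetical file name:

```python
import tempfile
from pathlib import Path

# Illustrative target; the real session wrote a skill directory under ~/.claude.
base = Path(tempfile.mkdtemp())
target = base / "example-skill"

# Verify-then-act: create the directory idempotently, confirm it exists,
# and only then write the new file into it.
target.mkdir(parents=True, exist_ok=True)
assert target.is_dir(), f"environment not ready: {target}"
skill_file = target / "SKILL.md"
skill_file.write_text("# example skill\n")
print("wrote", skill_file.name)
```

`exist_ok=True` makes the check safe to repeat: a second run neither fails nor clobbers anything, which matters when several tasks may touch the same tree.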
[pattern] The agent used a specialized 'n' skill to offload a long-running research/analysis task, keeping the main session responsive.
Linked: parallelism/offload-leaf-tasks | From: on so uou have a plan is it still current and save to learning todos the follow
[convention] The agent used a specialized tool `llm-fork` to spawn a new session with specific flags, demonstrating the use of custom environment capabilities.
Linked: dev/prefer-context-native-mechanisms | From: Spawn a fresh Claude or Gemin
[insight] The user's question 'in id col are things cut off?' served as a direct signal of a UI observability failure, prompting an immediate fix.
Linked: observability/ui-observability | From: in id col are things cut off?
[convention] Checking a local conventions file (`.myconf/conventions/servers.md`) to determine port offsets and naming standards.
Linked: dev/prefer-context-native-mechanisms | From: we had a specific learning experiment, with failure pairs etc and general learni
[insight] The agent used port numbers found in infra/Caddyfile to reverse-engineer which services were mapped to which logical names (e.g., 7880).
Linked: dev/look-one-layer-out | From: so cae you met the stuff from learning to reflection? and hwat does the failures
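That reverse-engineering step amounts to scraping hostname→port pairs out of the proxy config. A sketch, assuming a Caddyfile-style `reverse_proxy` layout (the snippet below is illustrative, not the session's actual infra/Caddyfile; `learn.localhost` and port 7881 are made up):

```python
import re

# Illustrative Caddyfile fragment; only the 7880 mapping echoes the session.
caddyfile = """
watch.localhost {
    reverse_proxy localhost:7880
}
learn.localhost {
    reverse_proxy localhost:7881
}
"""

# Pair each site block's hostname with the port it proxies to.
mappings = dict(re.findall(r"(\S+)\s*\{\s*reverse_proxy localhost:(\d+)", caddyfile))
print(mappings)
```

The proxy config is one layer out from the services themselves, but it is the layer where the logical name and the concrete port are guaranteed to meet.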
[pattern] The agent followed a 'Test-First' approach by creating the test file and running it (expecting failure or empty results) before modifying the target CLI implementation.
Linked: testing/philosophy | From: You are implementing Task 4: Review CLI commands — add `list-reviews`, `review`,
[convention] The agent proactively checked for existing test patterns in `supervisor/autonomous/tests/test_review.py` before creating the new test file to ensure consistency in testing style.
Linked: testing/test-hierarchy | From: You are implementing Task 3: Completion notifications — `supervisor/autonomous/c
[pattern] Agent performed a 'Red' phase of TDD by running a test that it knew would fail with an ImportError before writing the implementation.
Linked: testing/philosophy | From: You are implementing Task 2: Autonomous prompt template — update `supervisor/aut
[pattern] The agent used a Python script to orchestrate the evaluation, which naturally handled the 10s wait and JSON parsing requested by the user.
Linked: dev/prefer-context-native-mechanisms | From: Run watch.api hook evaluation: 1) Start watch-api server if not running (python
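A minimal sketch of that script-as-orchestrator shape: the script owns both the waiting and the JSON parsing, so neither needs a separate tool call. `fetch_status` is a stand-in for the real HTTP call to the watch-api server, and the deadline is shortened from the session's 10 s:

```python
import json
import time

def fetch_status():
    # Stand-in for an HTTP call to the (hypothetical) watch-api server.
    return '{"status": "ready", "evaluations": [{"hook": "watch.api", "passed": true}]}'

deadline = time.time() + 1.0   # the session used a 10 s wait; shortened here
payload = None
while time.time() < deadline:
    payload = json.loads(fetch_status())
    if payload["status"] == "ready":
        break
    time.sleep(0.1)            # poll rather than a single blind sleep

passed = all(e["passed"] for e in payload["evaluations"])
print("all hooks passed:", passed)
```

Polling with a deadline rather than sleeping the full wait also means the script returns as soon as the server is ready, not after a fixed delay.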
[pattern] Reading integration tests to verify API contract details that might be ambiguous in the implementation code.
Linked: testing/guidelines | From: Find and read the watch API server code, specifically: 1. The main API file at s
[gotcha] User identified a potential 'cold start' penalty in a sidekick/hook architecture where CLI tools are used as library functions.
Linked: parallelism/amortize-orchestration-overhead | From: ok i would like to improve how /hist can get cached and improved. i want hte sid
[pattern] The concept of a 'gym in gyms' for tuning prompt behavior using session replays as the training data.
Linked: testing/subsume-validation-into-operation | From: then there is the prompt for making a great /hist and when to update it and what
[pattern] The agent checked personal configuration conventions (.myconf/conventions/llm.md) in addition to the project code to ensure alignment with broader user preferences.
Linked: dev/research-before-building-broad-upfront-investigation-pays-of | From: add a general todo and advice to lib llm to see if we can use Grok 4.1 Fast (non
[pattern] The agent uses a specialized 'writing-plans' skill to formalize the transition from user intent to technical execution.
Linked: development/implementation-spec | From: ok so want to kikc off plan and implement for watch.api and updated hist and bad
[gotcha] The agent searched for 'Claude Sonnet 4.6' and 'GPT-5.2' in 2025/2026, suggesting it is operating in a future-dated or hypothetical context based on the user's prompt.
Linked: development/verify-api-details-dont-fabricate | From: ok so updade sonnet to 4.6 in models, and update pricint info and link to its la
[convention] The agent checked for the existence of a `.share` directory before writing a new HTML report, showing awareness of environment-specific storage conventions.
Linked: dev/prefer-context-native-mechanisms | From: and do you have time to first token etc benchmarks?
[insight] The agent proactively searched for '2026' pricing to ensure it wasn't looking at stale documentation, demonstrating temporal awareness in pricing queries.
Linked: data-quality/verifying-early-beats-correcting-later | From: can you double check how cheap grok is ??
[gotcha] The agent attempted to find pricing for models that do not exist yet (GPT-5.2), likely due to the user's prompt implying a future timeframe (2025/2026).
Linked: development/verify-api-details-dont-fabricate | From: and how does it to caching to keep costs low? wnat to add to table what cached i
[insight] The user explicitly asked for the 'learning/principle' behind moving links into the table, which directly maps to 'Colocate Action with Insight'.
Linked: ux/colocate-action-with-insight | From: https://watch.localhost/ -- add moneybag emoji to tab, put it last, link to das
[pattern] The system uses a 'Lookup (before work)' phase to check for past learnings, reinforcing the 'Research Before Building' mindset.
Linked: dev/research-before-building-broad-upfront-investigation-pays-of | From: (skill invocation)
[pattern] The agent checked for changes in a global config repo (~/.myconf) in parallel with the project repo to ensure principles and model definitions were synced.
Linked: dev/codify-logical-service-clusters | From: (context-restored session)
[insight] The agent correctly identified that 'real' pricing in a user query about the future (2026) requires checking current official documentation to ground the answer in reality versus speculation.
Linked: data-quality/sanity-check-results-before-presenting | From: what ab`out grok pricing is it real?
[convention] User requested a specific directory structure ('lib or tools? and skill') before implementation began, showing a preference for structural alignment.
Linked: development/resolve-structural-ambiguity-before-coding | From: lets work on notifying hte user better from automation and supervision. we have
[pattern] The agent used TaskCreate to build a roadmap of the brainstorming process before taking any action, ensuring the user can see the planned trajectory.
Linked: development/implementation-spec | From: (skill invocation)
[pattern] The agent checked for the existence of a library (apprise) and its version before planning implementation tasks.
Linked: development/research-before-probing | From: (skill invocation)
[pattern] The agent used TaskCreate to establish a clear, trackable queue of work before starting implementation.
Linked: dev/codify-logical-service-clusters | From: (skill invocation)
[pattern] The agent maps a high-level reflective question ('what did you learn?') directly to a specialized 'learning' skill execution.
Linked: dev/steer-via-semantic-intent | From: what did you learn? what are the learnings?
[convention] The agent uses a heredoc for the git commit message to handle potential special characters and multi-line input safely.
Linked: dev/prefer-high-entropy-delimiters | From: Cleanly finish a session: commit
[pattern] The user request defines a 'Wrapup' skill that explicitly separates the session summary from the exit command, allowing for human review.
Linked: knowledge-accumulation/treat-review-as-calibration | From: Cleanly finish a session: commit
[gotcha] The agent encountered a 'Sibling tool call errored' cascade where multiple Read operations failed simultaneously due to a shared path error.
Linked: development/no-silent-failures | From: Find and summarize the current implementation of watch.localhost in the rivus pr
[pattern] The agent checked existing files (local.py, pushover.py) before writing core.py to ensure the dispatch logic correctly interfaced with existing implementations.
Linked: development/research-before-probing | From: You are implementing Task 4: Create lib/notify/core.py + update __init__.py ##
[convention] The agent checked for the existence of directory structures and __init__.py files before attempting to create them, respecting the user's warning about concurrent tasks.
Linked: development/avoid-collisions-in-tempshared-paths | From: You are implementing Task 3: Create lib/notify/pushover.py — Apprise/Pushover wr
[pattern] User requesting a shorter alias ('cl') for a frequently used skill ('n') to optimize their CLI workflow.
Linked: ux/command-parsimony | From: update tthe /n skill to call cl not full name. also on exit from a tmux sess the
[gotcha] Failure of a 'resume' command to propagate from a sub-shell (tmux) back to the parent shell's history.
Linked: dev/shorten-the-truth-distance | From: update tthe /n skill to call cl not full name. also on exit from a tmux sess the
[pattern] The 'cl' alias implementation demonstrates a 'grab' pattern from a pre-warmed pool to handle expensive resource initialization.
Linked: dev/warm-pools-on-initial-intent | From: Spawn a fresh Claude or Gemin
[pattern] The agent used `cat` on a specific file in `/tmp` to retrieve a resume command, demonstrating the use of temporary files as a state-sharing mechanism between sessions.
Linked: dev/shorten-the-truth-distance | From: Manage tmux sessions from Claude Cod
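The tmp-file-as-mailbox pattern can be sketched as follows; the file name and the resume command are both illustrative placeholders, not the session's actual values:

```python
import tempfile
from pathlib import Path

# A file in the shared tmp directory acts as a cross-session mailbox.
handoff = Path(tempfile.gettempdir()) / "example-resume-cmd"

# Session A (e.g., inside tmux) records how to resume itself:
handoff.write_text("claude --resume abc123\n")   # placeholder command

# Session B (e.g., the parent shell) retrieves it later, as the agent did with `cat`:
resume_cmd = handoff.read_text().strip()
print(resume_cmd)
```

Because the file system outlives both shells, this sidesteps the sub-shell history-propagation problem noted two entries above: the state travels through a path both sessions can see, not through inherited process state.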
[pattern] The agent checked git status across multiple directories (.myconf and .local/bin) to ensure all relevant changes were captured before wrapping up.
Linked: development/verify-action | From: Cleanly finish a session: commit
[insight] The agent used a background process for a potentially long-running or non-blocking task (curl/cancel file logic) during the wrapup sequence.
Linked: parallelism/offload-leaf-tasks | From: Cleanly finish a session: commit
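A sketch of that offload, with a slow child process standing in for the session's curl/cancel-file logic (the child command and timings are illustrative):

```python
import subprocess
import sys

# Launch the non-blocking leaf task in the background...
child = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(0.2); print('cleanup done')"],
    stdout=subprocess.PIPE, text=True)

# ...while the wrapup sequence continues in the foreground.
print("wrapup continues while cleanup runs...")

# Reap the child before exiting so its output and exit status aren't lost.
out, _ = child.communicate(timeout=5)
print(out.strip())
```

The key discipline is the final `communicate`: offloading without reaping turns a background task into a silent failure if the child errors after the parent exits.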
[insight] The agent successfully filtered for benchmarks that included both the dataset and the evaluation script/scorer, directly addressing the 'installable' constraint.
Linked: dev/trace-the-chain-to-an-action | From: pls identify benchmarks that are not saturated by frontier llms and htat have to
[pattern] The agent used a 'Task' tool to perform a structured lookup of repository metadata, effectively treating the external GitHub state as a verifiable database.
Linked: dev/prioritize-high-fidelity-domain-interfaces | From: in benchmarks readme update the overvie wof rhat we have and what is avails. and
[pattern] The agent used Grep to find existing configurations for specific benchmarks (arc_challenge, hellaswag) to use as templates.
Linked: dev/extend-dont-invent | From: want ti install the firs 4 text ones? and get them going the same way we did oth
[pattern] The agent used sequential WebFetch calls to build a comprehensive context of multiple repositories before synthesizing the final answer.
Linked: knowledge-accumulation/knowledge-accumulating-search | From: Research the evaluation methodology of these 4 benchmarks to understand what cod
[pattern] The codebase uses a centralized `_upstream.py` to handle the 'dirty' work of repo cloning and SDK mocking, keeping individual benchmark files clean.
Linked: dev/trim-the-root-when-branching | From: Explore the benchmarks adapter pattern in the rivus codebase. I need to understa
[insight] The agent correctly distinguished between 'updated_at' (metadata change) and 'pushed_at' (actual code commit) to answer the user's specific question about commit dates.
Linked: dev/verify-the-source-first | From: For each of the following GitHub repositories, check the last commit date and de
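The GitHub repos API does expose both timestamps: `pushed_at` moves only on commits, while `updated_at` also moves on metadata changes such as stars or settings. A sketch of the distinction, using an illustrative payload rather than a live API response (the repo name is GitHub's standard example; the dates are invented):

```python
import json
from datetime import datetime

# Illustrative payload shaped like a GitHub /repos/{owner}/{repo} response.
repo = json.loads(
    '{"full_name": "octocat/Hello-World",'
    ' "updated_at": "2026-02-01T00:00:00Z",'
    ' "pushed_at": "2024-06-15T12:00:00Z"}')

def iso(ts):
    # GitHub returns Zulu-suffixed ISO 8601; normalize for fromisoformat.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

last_commit = iso(repo["pushed_at"])   # answers "when was the last commit?"
last_touch = iso(repo["updated_at"])   # may be later with no new code at all
print(repo["full_name"], "last pushed:", last_commit.date())
```

Here the repo looks recently active by `updated_at` alone, yet the code has not changed since mid-2024; conflating the two fields would have given the user the wrong answer.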