Extracted from 152 git commits, 141 Claude Code sessions, and handler code analysis. Jan 22 – Feb 8, 2026.
- stage_ib returned empty data and marked the item as done, permanently losing it from the queue. Fix: raise RetryLaterError when no data is returned; the item stays in the queue for retry (sketched below).
- stage_ib fell back to creating ephemeral IB connections when stockloader wasn't running, silently degrading performance and missing pacing constraints. Fix: raise RetryLaterError("Stockloader service not running").
- Items stuck in the in_progress state never got picked up again. Fix: reset in_progress items to pending on runner startup.
- The startup reset covered status='in_progress' but not stage-level in_progress inside the stages JSON column. Items had stages={"fetch":"in_progress"} but status='pending', invisible to stage workers. Fix: also reset stage-level in_progress on startup (sketched below). This fixed 637 stuck items across all jobs.
- A handler returned None, which the runner converted to {}: a "success" with no data. The item moved to done with no replay URL and no content. Fix: raise RetryLaterError when there is no replay URL, and fix the handler wrapper's None propagation.
- Record a pause_reason whenever a job is paused.
- INSERT OR REPLACE in SQLite does DELETE + INSERT, so columns not in the column list (like raw_html) get destroyed: fetch stored the HTML, then extract's INSERT OR REPLACE wiped it. Fix: ON CONFLICT DO UPDATE SET col=excluded.col, or a plain UPDATE (sketched below).
- raw.json existed on disk but had no entry in the results table, and the extract stage only checked the results table. Affected items are flagged with quality_flags: ["possible_dupe_html"].
- get_queue() called DB functions after conn.close(); Gradio hot-reload made this worse. Fix: with closing(get_db()) as conn: context managers everywhere (sketched below).
- get_handler_log was renamed to get_handler_logger but call sites weren't updated, and asyncio.gather(return_exceptions=True) silently ate all the ImportErrors: nothing worked, no errors shown. Fix: inspect asyncio.gather results, and persist stage errors (as {stage}_error) so they survive transitions.
- fetch_ib_historical_ticks was made async but its caller was still sync. Error: "cannot unpack non-iterable coroutine object". Fix: make the whole chain async: stage_ib → process_stage → process().
- asyncio.gather(return_exceptions=True) silently eats exceptions: they are stored in the results list but never logged. Fix: log isinstance(result, BaseException) entries (sketched below).
- YouTube bot detection: use BRIGHTDATA_PRIMARY_PROXY + YTDLP_COOKIES_BROWSER=chrome, and raise BotDetectedError on detection.
- The --remote-components ejs:github flag caused JS challenge failures across all 7 yt-dlp call sites. Fix: the bgutil-ytdlp-pot-provider plugin, plus a player_client fallback chain: android → tv_downgraded → web.
- Bot detection is retry_later, not permanent failure.
- _load_cookies() used a sync httpx.get() inside an async handler, blocking the event loop. Fix: make _load_cookies() async with httpx.AsyncClient (sketched below).
- Route everything through BRIGHTDATA_PRIMARY_PROXY. Whitelist only localhost, Finnhub, and LLM providers.
- The site uses a persistent remember_web_* cookie (in Chrome's DB) and an in-memory vic_session cookie (NOT in Chrome's DB). The session cookie must be obtained by exchanging the remember cookie via an HTTP request.
- yt-dlp exit handling: non-zero exit → BotDetectedError; non-zero exit but stdout has content → debug only.
- Declare stage_deps in jobs.yaml. Inference from list order was added as a fallback.
- Order stages cheapest-first: meta (fast, free) → score (LLM) → captions (free) → whisper (conditional).
- Record the transcript_source.
- keep_audio=False by default. Auto-detect: _job_has_stage(job, "diarize") keeps audio.
- The enrich stage reads cached raw_html from the DB and is independently rerunnable.
- pltr_interviews_2025 duplicated pltr_content_processing's discovery: almost entirely redundant work.
- Manual captions are saved as .manual.vtt for distinction.
- Module __init__ isn't thread-safe: concurrent first imports caused AttributeError and corrupted module state. Fix: do the setup in cli() before asyncio.run().
- "cannot import call_llm from partially initialized module lib.llm": multiple workers hit the cold start simultaneously. Fix: import lib.llm in runner.py before spawning workers.
- try: import litellm except ImportError: pass, but litellm is required, so the handler silently does nothing when it's missing. Required dependencies in jobs/lib/ must fail loudly.
- overview_table: Gradio auto-polled it independently of timer.tick; duplicate refreshes reset selection state.
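A minimal sketch of the retry-later convention, assuming the project's RetryLaterError and the standardized keyword-only handler signature; stockloader_running() and the fetch stub are hypothetical stand-ins:

```python
class RetryLaterError(Exception):
    """Runner keeps the item queued and retries it later."""

def stockloader_running() -> bool:
    return False  # hypothetical health check

async def fetch_ib_historical_ticks(item_key):
    return []  # stub for the real IB fetch

async def stage_ib(*, item_key, data, stage, job):
    if not stockloader_running():
        # Don't fall back to an ephemeral IB connection; requeue instead.
        raise RetryLaterError("Stockloader service not running")
    ticks = await fetch_ib_historical_ticks(item_key)
    if not ticks:
        # Returning empty data would mark the item done and drop it forever.
        raise RetryLaterError("IB returned no data")
    return {"ticks": ticks}
```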
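The two-level startup reset, sketched under an assumed schema (an items table with item_key, status, and stages columns):

```python
import json
import sqlite3

def reset_stuck_items(conn: sqlite3.Connection) -> None:
    # Top-level reset: anything a dead runner left in_progress goes back
    # to pending.
    conn.execute("UPDATE items SET status='pending' WHERE status='in_progress'")
    # Stage-level reset: an item can be status='pending' while a stage in
    # its stages JSON is still 'in_progress', hiding it from stage workers.
    rows = conn.execute("SELECT item_key, stages FROM items").fetchall()
    for item_key, stages_json in rows:
        stages = json.loads(stages_json or "{}")
        stuck = [s for s, state in stages.items() if state == "in_progress"]
        for s in stuck:
            stages[s] = "pending"
        if stuck:
            conn.execute("UPDATE items SET stages=? WHERE item_key=?",
                         (json.dumps(stages), item_key))
    conn.commit()
```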
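Why the upsert preserves columns the INSERT doesn't mention, shown against a throwaway schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (item_key TEXT PRIMARY KEY, raw_html TEXT, extracted TEXT)"
)
conn.execute("INSERT INTO results (item_key, raw_html) VALUES ('a1', '<html>...</html>')")

# INSERT OR REPLACE would DELETE the row and re-INSERT it, nulling raw_html.
# The upsert updates only the named column, so fetch's HTML survives extract.
conn.execute(
    "INSERT INTO results (item_key, extracted) VALUES ('a1', '{}') "
    "ON CONFLICT(item_key) DO UPDATE SET extracted = excluded.extracted"
)
print(conn.execute("SELECT raw_html FROM results").fetchone())  # ('<html>...</html>',)
```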
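The connection-lifetime fix, assuming get_db() returns a plain sqlite3 connection:

```python
import sqlite3
from contextlib import closing

def get_db() -> sqlite3.Connection:
    return sqlite3.connect("pipeline.db")  # path assumed

def get_queue():
    # closing() guarantees the connection is released on exit, and nothing
    # can call DB functions after close(), even across Gradio hot-reloads.
    with closing(get_db()) as conn:
        return conn.execute(
            "SELECT item_key FROM items WHERE status='pending'"
        ).fetchall()
```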
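Logging the exceptions that asyncio.gather(return_exceptions=True) would otherwise swallow; a generic sketch:

```python
import asyncio
import logging

log = logging.getLogger("runner")

async def run_batch(coros):
    results = await asyncio.gather(*coros, return_exceptions=True)
    # With return_exceptions=True, failures land in the results list instead
    # of raising; without this scan they disappear silently.
    for result in results:
        if isinstance(result, BaseException):
            log.error("stage task failed: %r", result)
    return results
```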
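A non-blocking _load_cookies(), sketched with an assumed URL parameter and return shape:

```python
import httpx

async def _load_cookies(url: str) -> dict[str, str]:
    # httpx.AsyncClient keeps the cookie exchange off the event loop's
    # critical path; the old sync httpx.get() blocked every other handler.
    async with httpx.AsyncClient() as client:
        resp = await client.get(url)
        resp.raise_for_status()
        return dict(resp.cookies)
```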
- A handler signature was missing the job param: TypeError at runtime. Fix: standardize on async def process_stage(*, item_key, data, stage, job).
- Jobs were registered as monitor but they're historical backfills: wrong dashboard tab and wrong metrics. Fix: kind: backfill. Rule: finite catalog = backfill.
- 0.0 elapsed seconds is falsy in Python, so stages completing in <1ms were treated as missing data. Fix: compare against None instead of truthiness (sketched below).
- The auto table layout ignores width constraints on columns. Fix: table-layout: fixed.
- Discovery query: $SYM AND "earnings". Skip words: shock, breakdown, reaction, crash, moon, squeeze. Match on the $SYM prefix or the company name.
- Loose matching: $AA matched $AAL, INTE matched INTC. Fix: exact matches only, filtered with NOT LIKE '% (%' (via resolve_company + add_company).
- /posts/trending didn't exist, so zero items were discovered, yet the job appeared healthy (no errors). Fix: /posts?sort=trending|new. Verify API endpoints before wiring.
- Strip base64 data URIs matching data:image/[^;]+;base64,[A-Za-z0-9+/=]+ before sending text to the LLM. Cap at 50K chars (sketched below).
- Per-item metadata.json files don't scale: at 1000+ items, thousands of small files make git tracking painful and querying requires globbing. Fix: a raw_html column in a single SQLite DB.
- Handlers report _cost, the runner logs it to job_cost_log, and a guard checks daily cost and auto-pauses (sketched below).
- asyncio.Semaphore for concurrency control (sketched below).
- A single process() call meant no partial progress and no retry per stage. Fix: the stages JSON column; each stage progresses independently.
- max_tokens of 4000/8000 was too low for JSON output: responses truncated mid-response and broke parsing. Fix: drop max_tokens entirely; use a cheaper model if cost is a concern, not a token ceiling.
- native_web_search=True gives inconsistent behavior across providers and no control over search queries. Fix: tools=["web_search"] (Serper-based) for consistent cross-provider web search.
- temperature=0 for extraction; default None for generation.
- log = get_handler_logger("name"); pass domain data as kwargs.
- Put the document in the system prompt as <document>{text}</document>; the user prompt contains only the extraction instruction (which varies). The document gets prompt-cached (sketched below).
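The falsy-zero fix in isolation; the formatter name is illustrative:

```python
def format_elapsed(elapsed: float | None) -> str:
    # `if elapsed:` is wrong here: 0.0 is falsy, so a stage finishing in
    # <1ms looks like missing data. Test for None explicitly.
    if elapsed is None:
        return "-"
    return f"{elapsed:.3f}s"

assert format_elapsed(0.0) == "0.000s"  # previously rendered as missing
assert format_elapsed(None) == "-"
```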
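The data-URI scrub, using the regex from the fix; the function name is an assumption:

```python
import re

DATA_URI_RE = re.compile(r"data:image/[^;]+;base64,[A-Za-z0-9+/=]+")

def clean_for_llm(text: str, cap: int = 50_000) -> str:
    # Inline base64 images burn tokens and carry nothing extractable;
    # strip them, then cap what's left at 50K characters.
    return DATA_URI_RE.sub("", text)[:cap]
```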
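A sketch of the daily-cost guard; the job_cost_log / jobs schema and the threshold are assumptions, but it ties together _cost logging, the auto-pause, and pause_reason:

```python
import datetime
import sqlite3

DAILY_LIMIT_USD = 10.0  # assumed threshold

def check_cost_guard(conn: sqlite3.Connection, job: str) -> None:
    today = datetime.date.today().isoformat()
    (spent,) = conn.execute(
        "SELECT COALESCE(SUM(cost), 0) FROM job_cost_log"
        " WHERE job=? AND date(ts)=?",
        (job, today),
    ).fetchone()
    if spent >= DAILY_LIMIT_USD:
        # Auto-pause and record why, so the dashboard can show pause_reason.
        conn.execute(
            "UPDATE jobs SET status='paused', pause_reason=? WHERE name=?",
            (f"daily cost ${spent:.2f} hit the ${DAILY_LIMIT_USD:.2f} limit", job),
        )
        conn.commit()
```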
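Concurrency control with asyncio.Semaphore; a generic sketch:

```python
import asyncio

async def run_items(items, worker, limit: int = 5):
    # Bound concurrent handler invocations without a worker-pool framework.
    sem = asyncio.Semaphore(limit)

    async def guarded(item):
        async with sem:
            return await worker(item)

    return await asyncio.gather(*(guarded(item) for item in items))
```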
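The prompt-caching layout, sketched as provider-agnostic chat messages; the builder name is an assumption:

```python
def build_messages(document: str, instruction: str) -> list[dict]:
    # The large, stable document sits in the system prompt so the provider
    # can prompt-cache it; only the short instruction varies per call.
    return [
        {"role": "system", "content": f"<document>{document}</document>"},
        {"role": "user", "content": instruction},
    ]
```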