Karajan Code

Multi-agent coding orchestrator. 24 roles, 5 agents, deterministic guards, TDD, SonarQube, automated review.


Why Karajan Code?

Instead of running one AI agent and manually reviewing its output, kj orchestrates specialized roles — each executed by the AI agent you choose. The coder writes code, guards check for destructive patterns, SonarQube scans it, the reviewer checks it, and if issues are found, the coder gets another attempt. Roles define what to do; agents define who does it.

A loop-engineering runtime

Prompt engineering got one good answer from one good prompt. Context engineering curated what the model saw. The 2026 frontier is loop engineering: you stop prompting the agent by hand and design the system that prompts it, checks it, and decides what happens next — until the goal is met or it hands back to you. Karajan was built around that loop before the term caught on:

Maker / checker split — a coder against independent reviewer, tester and security roles, with Solomon judging disputes. The maker never grades its own work.
Deterministic verification — TDD, per-HU acceptance tests, SonarQube gates and deterministic guards. Checking is tests, not vibes.
The autonomy ladder L1 → L2 → L3 — the interactive | assisted | autonomous axis (v3.7.0). Report, then assisted fixes, then unattended kj autorun — defaulting to interactive, so you opt in.
A durable state spine — sessions, the HU Board, journals, the RAG index and kj resume keep the loop alive across runs.

The caveat loop engineering insists on — unattended loops make unattended mistakes — is designed in: autonomous runs list their residual defects, every story lands behind a PR, and kj-trash snapshots destructive operations. Read the full building-block mapping →

Zero API Costs

Runs on your existing AI subscriptions. No additional API keys or cloud services required. Pair with RTK and Squeezr (see below) for 60-90% extra token savings.

RTK Integration — Bash output compression

RTK (Rust Token Killer) compresses the output of 13 Bash commands (including git, ls, find, grep, cat, head, tail, wc, diff, tree, du and file) that the coder agent uses constantly. When rtk --version is detected at preflight, Karajan transparently wraps every supported command via wrapWithRtk() and accumulates byte savings per session via RtkSavingsTracker. Optional, opt-in by install; no config flag needed. See installation docs.

Squeezr-compatible — MCP response compression

Squeezr is an MCP proxy that compresses the responses Karajan’s MCP server returns to the host (Claude Code, Cursor, etc.). It’s architecturally orthogonal to RTK: RTK compresses inside the pipeline (Bash command output); Squeezr compresses above it (MCP messages over the wire). Karajan doesn’t integrate Squeezr — Squeezr sits on the host’s MCP transport — but they stack cleanly. Install Squeezr in your MCP host config and Karajan benefits with zero changes.

Documentation Links in Errors

Preflight, bootstrap, and MCP errors include a See: <url> pointer to the relevant docs page. Specific anchors for SonarQube, Docker, agent install, RTK install, and config issues; other failures fall back to the troubleshooting guide.

Telemetry (Opt-Out)

Anonymous usage statistics (version, OS, command, pipeline duration, success rate) to help improve Karajan. Fully opt-out with telemetry: false in config.

Auto-Detect Stack

kj init scans package.json, go.mod, Cargo.toml and more to detect your framework and language. Auto-enables impeccable for frontend projects.

kj status Dashboard

Terminal dashboard showing HU states, current stage, timing, and progress. MCP returns structured JSON for programmatic access.

kj undo

Revert the last pipeline run with a soft reset or --hard. Safely undo changes when a run produces unexpected results.

HU Board Dashboard

Web dashboard for visualizing HU stories and sessions across all projects. Kanban board, session timeline, quality scores. Docker-ready, auto-syncs from local files.

HU Story Certification

Mandatory quality gate that evaluates user stories across 6 dimensions (JTBD context, user specificity, behavior change, control zone, time constraints, survivable experiment). Detects 7 antipatterns, rewrites weak stories, pauses for FDE context. Supports dependency graphs.

Codebase Health Audit

Read-only analysis across 5 dimensions: security, code quality (SOLID/DRY/KISS/YAGNI), performance, architecture, and testing. Generates a health report with A/B/C/D/F scores per dimension and prioritized recommendations without modifying any files.

5 700+ Automated Tests

530 test files covering every pipeline role (incl. rag-context-stage + tool-correctness judge + TDD-discipline gate), guard, config option, MCP tool (27 total), the full HU Board package (writable settings modal + shared-plan badge + per-HU assignee + canonical statuses API + cross-provider cache badge), the Project RAG subsystem with multi-language AST chunkers (Python, Rust, Go, Java), hybrid mode + rerank + MMR diversification, golden-query recall@k / MRR harness, the retrieval dashboard, the team-shared HU cohort workflow, the semantic test-diet auditor (KJC-TSK-0345), the cross-provider cache observability subsystem (Phase 0 — anthropic/openai/gemini/aider/opencode usage normalization, BudgetTracker breakdown, telemetry computeCachedPct, board badge), and the new cache-friendly prompt layout (Phase 1 — stable/volatile buckets, claude system-split, prefix-stability regression suite). Full suite runs in around 60 s with Vitest. Opt-in subsystems (brain, ci, sonar, hu-board, webperf) are labelled [opt-in: <feature>] so fast feedback loops can skip them via KJ_SKIP_ALL_OPTIN=1. Coverage v8 reports (text + html + lcov) ship as a CI artifact since v2.32 — see KJC-TSK-0465.

Zero-Config Pipeline

Auto-detects TDD based on project test framework. Auto-manages SonarQube Docker lifecycle and config generation. Skips sonar/TDD for infra and doc tasks automatically. Simple tasks run lightweight (coder-only), complex tasks get full pipeline — automatically based on triage.

Skills Mode

8 slash commands (/kj-run, /kj-code, /kj-review, /kj-test, /kj-security, /kj-discover, /kj-architect, /kj-sonar) with built-in guardrails. No MCP needed — works directly in Claude Code.

Host-as-Coder

When the MCP host is the same agent as the coder (e.g., Claude calling kj_run with coder=claude), Karajan delegates directly — no subprocess, no overhead. All guardrails still apply.

Resilient Run

Auto-diagnoses failures and resumes crashed sessions — up to 2 retries. Non-recoverable errors (config, auth, missing agent) fail immediately. Configurable via session.max_auto_resumes.

Standalone Role Commands

Run any pre-build role independently: kj discover, kj triage, kj researcher, kj architect. Available as both CLI commands and MCP tools.

SonarQube + optional SonarCloud

SonarQube (local Docker, blocking quality gates) runs by default and powers the static analysis stage. SonarCloud is opt-in and complementary — enable via --enable-sonarcloud flag, enableSonarcloud: true (MCP), or sonarcloud.enabled: true in kj.config.yml. Requires sonarcloud.token and sonarcloud.organization (or KJ_SONARCLOUD_TOKEN / KJ_SONARCLOUD_ORG env vars). When both run, SonarCloud results are advisory.

Impeccable Design Audit

Automated UI/UX quality gate. Audits changed frontend files for accessibility, performance, theming, responsive, and anti-pattern issues. Runs after SonarQube, applies fixes automatically.

Deterministic Guards

Output guard blocks destructive operations and credential leaks. Perf guard catches frontend anti-patterns. Intent classifier pre-triages obvious tasks without LLM cost. All configurable with custom patterns.

Pre-Execution Discovery

kj_discover analyzes tasks for gaps before coding begins. 5 modes: gap detection, Mom Test questions, Wendel behavior change checklist, START/STOP/DIFFERENT classification, and Jobs-to-be-Done generation.

BecarIA Gateway

Full CI/CD integration with GitHub PRs as source of truth. All agents post comments and reviews on PRs. Early PR creation, configurable dispatch events, and embedded workflow templates.

Real-Time Monitoring

Stall detector, continuous heartbeats, max-silence guardrails, planner runtime cap. kj-tail for colorized live log. kj_status for parsed status.

Intelligent Reviewer Mediation

Scope filter auto-defers out-of-scope reviewer issues instead of stalling the pipeline. Deferred issues tracked as tech debt and fed back to the coder.

Solomon — Pipeline Boss

Evaluates every reviewer rejection, classifies issues as critical vs. style-only, and can override style-only blocks. 6 rules including scope guard, reviewer overreach, and smart iteration control.

Preflight Handshake

kj_preflight requires human confirmation of agent assignments before execution. 3-tier config: session > project > global.
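The session > project > global precedence can be pictured as a layered merge. This is a minimal sketch, assuming flat config objects; resolveAgentConfig and the key names are hypothetical and not Karajan's actual API.

```javascript
// Hypothetical helper: later layers override earlier ones,
// which yields the precedence session > project > global.
function resolveAgentConfig({ global = {}, project = {}, session = {} }) {
  return { ...global, ...project, ...session };
}

// Illustrative values only; the key names are assumptions.
const resolved = resolveAgentConfig({
  global: { coder: "claude", reviewer: "codex" },
  project: { reviewer: "gemini" },
  session: { coder: "aider" },
});
console.log(resolved); // { coder: 'aider', reviewer: 'gemini' }
```

A shallow merge is enough for flat keys; nested sections would need a deep merge.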

Rate-Limit Standby

Auto-detects rate-limit / quota messages from Claude / Codex / Gemini CLIs and HTTP 429/5xx errors. Parses cooldown when the message uses a recognised format (ISO timestamp, Retry-After: <seconds>, retry in N minutes, or Claude’s resets at YYYY-MM-DD HH:MM UTC) and waits exactly that long with 30 s heartbeats — even if it’s hours away. When no time is parseable, falls back to 5 min default with exponential backoff (cap 30 min) and up to 5 retries before asking a human.
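The cooldown logic described above could look roughly like the sketch below. The regexes are illustrative guesses at the named formats, the function names are hypothetical, and the doubling factor is an assumption (the text only says "exponential backoff").

```javascript
const MINUTE = 60000;

// Returns the cooldown in milliseconds, or null when no time is parseable.
function parseCooldownMs(message, now = Date.now()) {
  const retryAfter = /Retry-After:\s*(\d+)/i.exec(message);
  if (retryAfter) return Number(retryAfter[1]) * 1000;

  const inMinutes = /retry in (\d+) minutes?/i.exec(message);
  if (inMinutes) return Number(inMinutes[1]) * MINUTE;

  // "resets at YYYY-MM-DD HH:MM UTC"
  const resetsAt = /resets at (\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}) UTC/i.exec(message);
  if (resetsAt) {
    const until = Date.parse(`${resetsAt[1]}T${resetsAt[2]}:00Z`);
    return Math.max(0, until - now);
  }

  // Bare ISO timestamp in UTC
  const iso = /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z/.exec(message);
  if (iso) return Math.max(0, Date.parse(iso[0]) - now);

  return null;
}

// Fallback when nothing is parseable: 5 min, doubling per attempt (assumed
// factor), capped at 30 min. Returns null once the 5 retries are used up,
// which is the point where a human would be asked.
function fallbackDelayMs(attempt) {
  if (attempt >= 5) return null;
  return Math.min(5 * MINUTE * 2 ** attempt, 30 * MINUTE);
}
```

A caller would try parseCooldownMs first and only fall back to fallbackDelayMs when it returns null.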

Pipeline Tracker

Cumulative progress view during kj_run — see which stages are done, running, or pending in real time via MCP and CLI.

Plugin System

Extend with custom agents via .karajan/plugins/. Auto-discovered at startup.

TDD Enforcement

Test changes required when source files change. The pipeline rejects iterations without matching tests.
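As a rough illustration of this rule, the check below rejects a change set that touches source files without touching any test file. The file-name conventions and the function name are assumptions, and the sketch only checks that some test changed, not that it matches the specific source file.

```javascript
// Assumed conventions (not Karajan's actual patterns): tests are *.test.* or
// *.spec.* files, or live under a test/, tests/ or __tests__/ directory.
const TEST_FILE = /(\.(test|spec)\.[cm]?[jt]sx?$)|(^|\/)(tests?|__tests__)\//;
const SOURCE_FILE = /\.[cm]?[jt]sx?$/;

function checkTddDiscipline(changedFiles) {
  const tests = changedFiles.filter((f) => TEST_FILE.test(f));
  const sources = changedFiles.filter(
    (f) => SOURCE_FILE.test(f) && !TEST_FILE.test(f)
  );
  // An iteration passes when no source changed, or at least one test did.
  const ok = sources.length === 0 || tests.length > 0;
  return { ok, sources, tests };
}
```

In practice the changed-file list would come from something like git diff --name-only.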

MCP Server

27 tools exposed via MCP — including kj_discover, kj_triage, kj_researcher, kj_architect for standalone role execution, kj_preflight for human-confirmed agent config, kj_board for HU Board management, kj_status for live parsed status, kj_undo for reverting pipeline runs, and kj_rag_query / kj_rag_index for semantic search over the project. Real-time progress notifications for all tools. Graceful restart after npm updates.

5 AI Agents

Claude, Codex, Gemini, Aider, and OpenCode. Mix and match — use Claude as coder and Codex as reviewer, or any combination. Extensible via plugins.

Multi-Agent Pipeline

24 configurable roles spanning pre-loop, iteration and post-loop phases — triage, planner, coder, reviewer, sonar, solomon, audit and more. Full catalogue in Pipeline roles. Mandatory audit post-approval ensures generated code is certified clean before completing.

Solomon — AI Judge (v2.0)

Refined from pipeline boss to AI judge. Consulted only on genuine dilemmas: security-vs-deadline, conflicting quality gates, stalled loops, risk evaluation. Security issues bypass Solomon deterministically and go straight back to the coder.

Karajan Brain (v2.0)

AI-powered central orchestrator that routes all role-to-role communication, enriches feedback with file hints, verifies outputs via git diff, executes direct actions (npm install, gitignore), and compresses role outputs for 40-70% token savings. Consults Solomon only on genuine dilemmas.

Executable Acceptance Tests (v2.4)

Each HU carries acceptance_tests: an array of shell commands Brain runs after every coder iteration. All pass → HU approved. Any fail → Brain reads the exact error output and sends a concrete diagnostic to the coder. No reviewer. No generic tester. Concrete pass/fail.
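The pass/fail loop can be sketched as follows. The run callback is injected so the example stays runnable without spawning processes. Its { exitCode, output } shape, the function name and the stop-at-first-failure behaviour are assumptions, not Karajan's actual implementation.

```javascript
// `run` executes one shell command and returns { exitCode, output }.
// In a real setup it would wrap a process spawner; here it is injected.
function evaluateAcceptanceTests(hu, run) {
  for (const command of hu.acceptance_tests) {
    const { exitCode, output } = run(command);
    if (exitCode !== 0) {
      // Any fail: hand the exact error output back as a concrete diagnostic.
      return { approved: false, diagnostic: { command, exitCode, output } };
    }
  }
  // All pass: the HU is approved.
  return { approved: true, diagnostic: null };
}
```

Keeping the runner injectable mirrors the mock-environment approach the Infrastructure Dependency Injection card describes.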

Budget: With KJ vs Without KJ (v2.6)

At session end, the budget display projects the cost you would have paid without Karajan’s compression and token savings (RTK + Brain). Clear -88% delta lines keep expectations grounded in real numbers.

Rich Session Journal (v2.6)

Every run writes .reviews/<session>/decisions.md, iterations.md, summary.md, and tree.txt. You get a per-iteration log of coder/reviewer/sonar/Solomon steps, an executive summary with a stages table and budget breakdown, and a directory-grouped view of every file touched during the pipeline.

Valibot Config Validation (v2.6)

Config is now schema-validated at load time with Valibot. review_mode typos, max_iterations: 0, out-of-range hu_board.port, negative max_budget_usd, or budget.warn_threshold_pct outside 0-100 fail fast with a readable message. CLI falsy overrides (--no-rebase, --reviewer-retries 0) finally work. Co-authored with Jorge del Casar.

Infrastructure Dependency Injection (v2.6)

FileSystemService and CommandRunner adapters live under src/infrastructure/. BaseAgent takes an optional Environment; createAgent(…, env) threads it through. Tests swap in MockFileSystem + MockCommandRunner via buildMockEnvironment() so every agent path (Claude, Codex, Gemini, Aider, OpenCode) is unit-testable without spawning real subprocesses.

Modular Orchestrator (v2.6)

src/orchestrator.js shrunk from a 2 084-line monolith to a 22-line public barrel over src/orchestrator/flow-runner.js. New StageExecutor contract (canRun / execute / onFailure) plus a StageRegistry lets future stages register themselves without touching the core. Adding a new stage is now a drop-in: put a StageExecutor subclass under src/orchestrator/stages/, register it, done.
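To make the contract concrete, here is a small sketch of what a stage and a registry might look like. Only the names StageExecutor, StageRegistry and the canRun / execute / onFailure trio come from the text; the constructor, the runAll method, the result shape and LintStage are hypothetical.

```javascript
class StageExecutor {
  constructor(name) {
    this.name = name;
  }
  canRun(_ctx) {
    return true;
  }
  async execute(_ctx) {
    throw new Error(`${this.name}: execute() not implemented`);
  }
  async onFailure(_ctx, error) {
    return { stage: this.name, ok: false, error: error.message };
  }
}

class StageRegistry {
  #stages = [];
  register(stage) {
    this.#stages.push(stage);
    return this;
  }
  // Runs the registered stages in order, skipping those that cannot run
  // and routing thrown errors to the stage's own onFailure handler.
  async runAll(ctx) {
    const results = [];
    for (const stage of this.#stages) {
      if (!stage.canRun(ctx)) continue;
      try {
        results.push(await stage.execute(ctx));
      } catch (error) {
        results.push(await stage.onFailure(ctx, error));
      }
    }
    return results;
  }
}

// Hypothetical drop-in stage: the core above never needs to change.
class LintStage extends StageExecutor {
  constructor() {
    super("lint");
  }
  canRun(ctx) {
    return ctx.changedFiles.length > 0;
  }
  async execute(ctx) {
    return { stage: this.name, ok: true, files: ctx.changedFiles.length };
  }
}
```

Usage would be along the lines of new StageRegistry().register(new LintStage()).runAll({ changedFiles: ["a.js"] }).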

addyosmani/agent-skills (v2.7)

First-source process skills from addyosmani/agent-skills: TDD, code-review-and-quality, security-and-hardening, performance-optimization, git-workflow-and-versioning, CI/CD, debugging, spec-driven-development, and more. Auto-cloned to ~/.karajan/agent-skills/, refreshed weekly via git pull. Role-aware: each Karajan role (tester, reviewer, security, architect, coder…) receives the workflows that match its job. Fully orthogonal to OpenSkills — process skills and stack skills compose.

Audit Reports + Token Cost Transparency (v2.9)

--report-file <path> persists the audit to .md (with reproducibility header: timestamp, branch, commit, invocation flags) or .json. $KJ_AUDIT_REPORT_DIR for CI defaults. Every audit ends with a ## LLM Usage section showing provider + model + duration + tokens (in/out/total) + estimated cost in USD. Visible in stdout, JSON, and persisted reports. CLI/MCP parity bug fixed — both paths now drive the same AuditRole flow.

Stack-Aware Audit (v2.9)

detectProjectStack feeds the LLM auditor what kind of project it’s looking at: frontend-only, backend-only, fullstack, language, frameworks. Heuristics get filtered — no more N+1 query nags on Astro sites, no more bundle-size nags on Express APIs. New accessibility dimension auto-activates for frontend / fullstack projects with WCAG 2.x checks (alt text, labels, ARIA, focus management, contrast hints). New WebPerf section with 10 frontend-perf patterns when no live CWV measurement is available.

Three Deterministic Security Collectors (v2.9)

SonarQube findings as ground truth in the prompt (rule ID + line precision). OSV-Scanner integration covers CVEs across the entire OSV.dev DB — broader than npm audit, no account, no upload. Semgrep SAST catches XSS, SQLi, taint flow, hardcoded secrets, language-specific anti-patterns — equivalent to snyk code but free for OSS. All three are best-effort: missing binary or unreachable host silently skips the section.

Two-Phase Audit (v2.9)

kj audit now collects deterministic findings (basalCost, Sonar, OSV-Scanner, Semgrep, WebPerf, stack detection) in parallel — zero tokens — and prints them BEFORE asking Continue with LLM analysis? [y/N]. New --deterministic-only flag for zero-token runs, -y/--yes to auto-confirm, --json bypasses the prompt for pipeable output. CI / non-TTY paths auto-confirm — zero behaviour change for pipelines.

HU Board Hardening (v2.10)

Default bind is now 127.0.0.1 (was: all interfaces). New --bind 0.0.0.0 for the explicit LAN-exposure case, with auto-generated token at ~/.karajan/hu-board/token (mode 0600). Auth middleware enforces the token only for non-loopback peers — same-machine browser keeps working without ?token=. helmet headers + express-rate-limit 300 req/min on /api. Three accepted carriers: Authorization: Bearer, ?token=, kj_board_token cookie.
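The loopback exemption and the three token carriers might be combined as in the sketch below. It uses a plain object that mimics an Express request; the function names, the carrier precedence and the remoteAddress field are assumptions for illustration.

```javascript
// Loopback peers as Node typically reports them (IPv4, IPv6, IPv4-mapped IPv6).
function isLoopback(remoteAddress) {
  return (
    remoteAddress === "::1" ||
    remoteAddress.startsWith("127.") ||
    remoteAddress.startsWith("::ffff:127.")
  );
}

// Checks the three carriers in turn: Authorization: Bearer, ?token=, cookie.
function extractToken(req) {
  const auth = req.headers?.authorization ?? "";
  if (auth.startsWith("Bearer ")) return auth.slice(7);
  if (req.query?.token) return req.query.token;
  const cookie = /(?:^|;\s*)kj_board_token=([^;]+)/.exec(req.headers?.cookie ?? "");
  return cookie ? cookie[1] : null;
}

// Same-machine peers pass without a token; everyone else must present it.
// A production check would compare tokens in constant time.
function isAuthorized(req, expectedToken) {
  if (isLoopback(req.remoteAddress)) return true;
  return extractToken(req) === expectedToken;
}
```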

Webperf Quality Gate (v2.10)

PerfStage slots into the iteration loop right after Impeccable when pipeline.perf.enabled is true. Wraps Lighthouse for a Core Web Vitals verdict per iteration. PASS continues; FAIL pushes blocking-metric feedback (e.g. LCP=5500 (poor>4000) plus top opportunities like render-blocking resources) back to the coder for the next iteration; scanner unavailable skips best-effort. CLI: --enable-perf. MCP: enablePerf. No retry-loop — max_iterations is the natural ceiling.

SKILL.md per CLI Command (v2.10)

docs/agents/SKILL.kj-{plan,run,audit,doctor,init,board,review,resume,clean}.md — one fetch per CLI capability (~ 2-4 KB tokens each), all under the same contract: What it does · Inputs · Outputs · Constraints · Side effects · Common failure modes · Example · Related. CI-guarded: every link in llms.txt must resolve to a file with all four required sections, or the build fails.

Agent-Readiness Score (v2.10)

kj audit --agent-readiness scores any repo 0–100 across 7 LLM-free checks: llms.txt presence + validity, robots.txt AI-bot allowlist, per-doc token budget (≤ 32 KB), heading hierarchy (markdown + HTML <h1>), docs/agents/README.md entry point, SKILL.md coverage. Pure data transformation — no network, no LLM, no side effects. --json for CI. Karajan-on-Karajan: 100/100. Run it on your own repo, see what agents struggle with, fix from the top-fixes list.

hu-board: Ephemeral-Project Cleanup + Help (v2.11)

On board start, projects whose id matches tmp_* / test_* / demo_* / kj-test-* and that have been inactive for >24 h are cascade-deleted (project + stories + sessions). Per-project override via a 3-state toggle on each card (🧪 force-test / 📌 pin / · default heuristic) and PATCH /api/projects/:id/is-test. The header also gained a ? button: opens a modal explaining each of the five views (Board / Graph / Dashboard / Sessions / Pipeline), and every nav tab carries a native title for the standard 1-second hover tooltip.
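The cleanup heuristic could be expressed as the predicate below. The field names (isTest, lastActivityAt) and the exact meaning of the force-test state are assumptions: here a forced test project is treated as ephemeral regardless of its id but still has to be inactive for more than 24 h.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;
// The id patterns from the text, written as a single prefix regex.
const EPHEMERAL_ID = /^(tmp_|test_|demo_|kj-test-)/;

// isTest models the 3-state toggle: true = force-test, false = pin,
// null/undefined = default heuristic.
function shouldCascadeDelete(project, now = Date.now()) {
  const inactive = now - project.lastActivityAt > DAY_MS;
  if (project.isTest === false) return false; // pinned: never auto-deleted
  if (project.isTest === true) return inactive; // forced test project
  return EPHEMERAL_ID.test(project.id) && inactive;
}
```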

Dogfooding-Hardened (v2.11)

A two-day, ten-level dogfooding pass through every Karajan surface (from kj --version to a full plan-driven multi-HU sub-pipeline) fixed three latent bugs that only surfaced on fresh /tmp repos: the SonarStage no longer burns max_iterations looping on Missing git remote.origin.url, commitAll tolerates the locale-specific “nothing to commit” race, and the HU sub-pipeline now branches off master/HEAD when the configured main doesn’t exist. runFlow seals session.status at the boundary, so kj status never again shows a zombie run stuck in running. All N0–N8 levels re-validated green.

Coder fs-leak detector, second layer (v2.14)

The original fs-leak-detector snapshot-diffed $HOME before and after the coder ran. It caught the original incident (cd /home/manu/assistant && pnpm init creating 36 MB outside projectDir) only because ~/assistant was new. If the target dir pre-existed, the snapshot diff missed it. v2.14 adds detectTranscriptCdLeaks() as a second layer: it scans the coder’s transcript for cd <abs-out-of-project> && <write-cmd> patterns and flags them regardless of disk state. Write commands recognised: mkdir, touch, cp, mv, git init, {pnpm,npm,yarn} init/create, npx create-*, cat >, echo >, shell redirects. Pure-read commands (ls, which, grep) don’t flag, and /tmp is exempt by convention.
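A transcript scan of this kind can be approximated with two regular expressions. The sketch below is not the real detectTranscriptCdLeaks(): the function name, the regexes and the return shape are assumptions, and it covers only redirects through cat and echo rather than every shell redirect.

```javascript
// Write commands named in the text; the regex itself is an illustrative guess.
const WRITE_CMD = new RegExp(
  "^(?:" +
    "(?:mkdir|touch|cp|mv)\\b" +
    "|git init\\b" +
    "|(?:pnpm|npm|yarn) (?:init|create)\\b" +
    "|npx create-" +
    "|(?:cat|echo)\\b.*>" +
    ")"
);

function detectCdLeaks(transcript, projectDir) {
  const leaks = [];
  // cd <absolute path> && <rest of the command line>
  const pattern = /cd\s+(\/\S+)\s*&&\s*([^\n]+)/g;
  for (const [, dir, rawCmd] of transcript.matchAll(pattern)) {
    const cmd = rawCmd.trim();
    const inside = dir === projectDir || dir.startsWith(projectDir + "/");
    const exempt = dir === "/tmp" || dir.startsWith("/tmp/");
    if (!inside && !exempt && WRITE_CMD.test(cmd)) {
      leaks.push({ dir, cmd });
    }
  }
  return leaks;
}
```

Because it reads the transcript rather than the disk, the check flags the write even when the target directory already existed.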

Solomon no longer rubber-stamps security blockers (v2.14)

Rule 6 of the Solomon rules engine (reviewer_style_block) used to classify any blocking issue with severity low/minor or matching cosmetic keywords (name, format, documentation, …) as “style” — even legitimate security blockers got passed through. v2.14 adds an anti-classifier: severities critical/high/blocker/major, categories security/correctness, and a security-keyword regex (SQL injection, XSS, CSRF, auth, password, secret, hash, traversal, …) all force-disqualify an issue from being “style”. 6 regression tests cover the false-positive cases from the original incident.
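The anti-classifier can be read as a set of early exits placed in front of the old style rule. This sketch uses only the severities, categories and keywords named above; the issue shape ({ severity, category, description }) and the function name are assumptions, and the keyword regex is a prefix match, so it errs on the side of treating an issue as non-style.

```javascript
const COSMETIC = /\b(name|format|documentation)\b/i;
// Prefix match on purpose: "authentication", "hashing", "passwords" also hit.
const SECURITY = /\b(sql injection|xss|csrf|auth|password|secret|hash|traversal)/i;
const BLOCKING_SEVERITIES = new Set(["critical", "high", "blocker", "major"]);
const BLOCKING_CATEGORIES = new Set(["security", "correctness"]);

function isStyleOnly(issue) {
  const text = issue.description ?? "";
  // Anti-classifier: any of these force-disqualifies the issue from "style".
  if (BLOCKING_SEVERITIES.has(issue.severity)) return false;
  if (BLOCKING_CATEGORIES.has(issue.category)) return false;
  if (SECURITY.test(text)) return false;
  // Original rule: low/minor severity or cosmetic keywords count as style.
  return (
    issue.severity === "low" ||
    issue.severity === "minor" ||
    COSMETIC.test(text)
  );
}
```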

Planner self-fix loop (v2.14)

The plan-reviewer used to be flag-only: it surfaced missing HUs, missing dependencies and scope overlaps, then left them for the user to apply by hand. v2.14 closes that loop. After the first review pass, the new plan-fixer.js module asks the planner to PATCH the plan (additions / deps_to_add / deletions), applies the patch in-process via addHu / removeHu / blocked_by mutations, and re-reviews. Loops up to 2 iterations or until the reviewer reports zero issues. Opt-out via --no-plan-fixer / --quick. Combined with three planner prompt fixes (scope respect, transversal one-to-many deps, explicit reuse marker), the four pathologies that the GRETA Plan 2 dogfooding kept surfacing are now closed at the source.

Team guardrails — recommended setup

Drop-in config for a team that uses AI in its workflow: multi-account SSH (one key per identity), global git hooks (commit-msg blocking AI attribution, pre-push blocking direct push to main, git-secrets for credential scanning), per-agent permission policies (Claude Code, Codex, Codex rules, Gemini CLI), GitHub branch protection rulesets, PR / Issue templates and CODEOWNERS routing. Paste, adapt, deploy. → Read the guide

Team-shared HU Board (v2.31)

Multiple machines, one plan. kj plan share <planId> opts a plan into the .karajan-shared/ cohort: the loader merges shared HUs into the per-project plan, the board scans the cohort and surfaces them with a shared badge, and a new per-HU assignee field lets each runner claim its slice without trampling the others. Selective --only / --exclude filters, kj plan unshare round-trip, and a sharedConflictPolicy escape hatch (local-wins / shared-wins / error) cover the conflict edge cases. Seven PRs (#859–#865) close the team-shared prerequisite (KJC-PRP-0002).

AI Harness Scorecard hardening (v2.32)

Plan A of KJC-PCS-0051 closes five FAILs from the external scorecard audit in a single sprint. Prettier --check (PR #868) blocks PRs whose formatting drifts. Coverage v8 (PR #870) emits text + html + lcov and uploads coverage/ as a CI artifact with per-glob thresholds enforced when opted in. Conventional Commits (PR #872) gates PR commit messages with wagoid/commitlint-github-action@v6 on top of the local pre-commit hook. Nightly drift workflow (PR #873) re-runs the full CI suite at 04:17 UTC daily and auto-files a tracking issue on red. eslint-plugin-security (PR #874) hard-bans eval, new Function, dynamic require, pseudoRandomBytes and mustache-escape disabling. Plus two bug fixes shipped alongside.

AI Harness Scorecard golden metric (v2.33)

Plan B of KJC-PCS-0051 turns kj audit into a quality measurement loop with a single golden number. Docker bootstrap (PR #877, KJC-TSK-0470) auto-pulls addyosmani/ai-harness-scorecard and runs a one-shot scan in ~10 s. Audit integration (PR #878, KJC-TSK-0471) splices the deterministic 0–100 score and A–F grade into the audit report headline. History DB (PR #879, KJC-TSK-0472) persists every run to a per-project audit-history.db (SQLite + WAL, PRAGMA user_version=1). Diff + trend sparkline (PR #880, KJC-TSK-0473) renders the delta vs the previous baseline plus a Unicode-bar trend over the last N runs. One golden number for “how AI-friendly is this repo today vs last week,” zero LLM tokens spent.

Multi-language RAG + Quality & Observability (v2.34)

Two epics close in one window. KJC-PCS-0052 Multi-language RAG adds first-class AST chunkers for Python, Rust, Go and Java via vendored web-tree-sitter WASM grammars (SEA-safe), wires a language adapter registry, extends the watcher and kj onboard / kj audit to multi-stack repos, and ships kj rag index --since <ref> for incremental git-diff-based reindex with a post-merge hook + pre-run drift check. KJC-PCS-0053 RAG Quality & Observability introduces a golden-query harness (kj rag eval) with recall@k + MRR scoring, content-hash sha256 dedup that skips re-embed on unchanged chunks, MMR diversification in the retriever (λ=0.5), and a deep-dive expansion of docs/RAG.md. Seventeen PRs.
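For readers unfamiliar with the two scores, recall@k and MRR are standard retrieval metrics and can be computed as below. This is a generic sketch, not the kj rag eval implementation; the function names and the { retrieved, relevant } query shape are assumptions.

```javascript
// recall@k: fraction of the relevant chunks that appear in the top-k results.
function recallAtK(retrieved, relevant, k) {
  if (relevant.length === 0) return 0;
  const topK = new Set(retrieved.slice(0, k));
  const hits = relevant.filter((id) => topK.has(id)).length;
  return hits / relevant.length;
}

// Reciprocal rank: 1 / position of the first relevant result, 0 if none.
function reciprocalRank(retrieved, relevant) {
  const relevantSet = new Set(relevant);
  const index = retrieved.findIndex((id) => relevantSet.has(id));
  return index === -1 ? 0 : 1 / (index + 1);
}

// MRR: mean reciprocal rank over all golden queries.
function meanReciprocalRank(queries) {
  if (queries.length === 0) return 0;
  const sum = queries.reduce(
    (acc, q) => acc + reciprocalRank(q.retrieved, q.relevant),
    0
  );
  return sum / queries.length;
}
```

A golden-query file would then supply, per query, the chunk ids that are expected to come back.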

How-To: ship your first feature

A new step-by-step interactive guide that walks you from a clean machine to a merged PR. Common setup (kj init, kj doctor) anticipates the usual papercuts — missing Docker daemon, no agent CLI, busy port :4000, Ollama not running — and a stack picker routes you to a focused recipe for Node CLI, Python, Web Component, REST API (Node), Java, Go or Rust. Each stack page is self-contained: bootstrap the project from zero, run Karajan on it, observe the trace on the HU Board, merge. Start the walkthrough →

Quality gates + housekeeping (v3.1)

First minor on the v3 line. Two new quality gates land in the pipeline: tool-correctness judge (KJC-TSK-0375) extracts tool calls from the coder’s transcript and scores whether the right tools were used for the job; TDD-discipline (KJC-TSK-0398) verifies tests were written before the implementation via a surgical stash + diff inspection. Three new housekeeping commands: kj clean --repo (stale branches, dist, tmp), kj clean --vector-stores (orphan RAG indexes) and an --all umbrella with docs/CLEANUP.md. kj sync --apply closes the SPDD loop by writing the canvas drift patch with backup. A new semantic test-diet auditor verifies the 498-test suite has 0 loss-of-meaning findings (npm run audit:test-diet). HU Board structural refactor (17 PRs) + canonical statuses API. No breaking changes: drop-in upgrade from v3.0.0.

Cross-provider cache observability (v3.3)

The Phase 0 epic (KJC-PCS-0056) closes the cache-metrics blind spot end-to-end across Anthropic, OpenAI/Codex, Gemini, aider and opencode. Provider-specific cache fields (cache_read_input_tokens, prompt_tokens_details.cached_tokens, cachedContentTokenCount, cached_tokens) are normalized in BudgetTracker, surfaced as a 🎯 N% badge on the HU Board, persisted in board.db and emitted in pipeline_complete telemetry. Real data on a Karajan repo: in a cold→hot Claude run, cache_pct rose from 47.2% to 94.3% and cost dropped from $0.6141 to $0.1452 (−76.4%). Null-safe: badge hides when unmeasured. Drop-in upgrade from v3.2.0.
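Normalizing the four field names might look like the sketch below. It probes the fields by name instead of switching on the provider, and it returns null when nothing was measured so a badge can stay hidden. The function names and the choice of denominator (total input tokens supplied by the caller) are assumptions, not BudgetTracker's actual logic.

```javascript
// Returns the cached-token count from whichever known field is present,
// or null when the usage payload carries no cache information.
function cachedTokens(usage) {
  const candidates = [
    usage?.cache_read_input_tokens,
    usage?.prompt_tokens_details?.cached_tokens,
    usage?.cachedContentTokenCount,
    usage?.cached_tokens,
  ];
  const found = candidates.find((v) => typeof v === "number");
  return found ?? null;
}

// Percentage with one decimal; null-safe so "unmeasured" never reads as 0%.
function cachedPct(cached, totalInputTokens) {
  if (cached === null || !totalInputTokens) return null;
  return Math.round((cached / totalInputTokens) * 1000) / 10;
}
```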

Cache-friendly prompts (v3.4)

The Phase 1 epic (KJC-PCS-0057) restructures every Karajan prompt into a stable block (identical across iterations and HUs, rendered first so automatic prefix caching hits on it) and a volatile tail. On Claude the stable block ships via --append-system-prompt, so the CLI’s cache breakpoints serve it from cache on every call. Measured: cold-run cache_pct jumps from 47.2% to 99.60% and coder cost drops 76% ($0.61 → $0.14 per HU). A prefix-stability regression suite freezes the contract in CI. Drop-in upgrade from v3.3.0.

Never ships broken (v3.4.2)

A pre-publish gate packs the real npm tarball, installs it clean and isolated, and actually runs it (kj --version / --help) before every publish — wired as prepublishOnly and as the pack-smoke CI job on every PR. The test suite proves the linked workspace; this proves the artifact users receive. Born from three releases that shipped unable to run -v; now the publish aborts itself if the tarball doesn’t start. Drop-in upgrade from v3.4.0.

Quality harness — kj harden + kj check (v3.5)

kj harden brings the guardrails Karajan was built with to any repo in one command: idempotent git hooks, lint/format/commit config, CI quality gates and agent guidelines — all stack-aware and behind kj:managed markers that never overwrite your content. Native per-language hook commands (go vet/ruff/npm…) mean hardening a Go, Python or Java repo never makes Node a commit-time dependency; the commit-msg hook is pure POSIX (Conventional Commits + AI-attribution block). A fullstack monorepo gets each language’s config inside its own folder. kj check verifies the harness as a CI drift gate (exit 0/≠0, --json), and kj init installs it out of the box. Drop-in upgrade from v3.4.2.

Advisory harden (v3.6)

kj harden learns to compare instead of just install. kj harden --report reads an existing repo and shows, per artifact (editorconfig, eslint, prettier, commitlint…), whether it’s missing, yours, or kj-managed — and for your own config, the concrete improvements the kj standard would add. kj harden --interactive then lets you adopt it piece by piece, default-safe (it keeps yours unless you say otherwise). Scope control (--only/--exclude, demo/fixture dirs ignored by default) keeps it off your examples and sub-tools. Drop-in upgrade from v3.5.1.

Autonomous delivery (v3.7)

A single autonomy axis — interactive | assisted | autonomous — turns Karajan into a hands-off squad. kj autorun <spec> chains plan → run → outcome report atomically: it plans the work, decomposes it into HUs, and runs to completion with no human in the loop. The Arbiter resolves agent conflicts by picking the least-bad answer against a fixed ground-truth order (acceptance tests > reviewer must-fix > nice-to-have). Autonomous stages never block on a prompt, a wall-clock backstop guarantees they can’t hang, and the run ends with a DELIVERED / INCOMPLETE report listing any residual defects. The default stays interactive — your normal runs are untouched. Drop-in upgrade from v3.6.0.
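One way to read "least-bad answer against a fixed ground-truth order" is a lexicographic comparison of violation counts, tier by tier. The sketch below takes that reading as an assumption; the tier keys, the candidate shape and pickLeastBad are hypothetical and not the Arbiter's real interface.

```javascript
// Highest-priority tier first: acceptance tests > reviewer must-fix > nice-to-have.
const GROUND_TRUTH_ORDER = ["acceptance_test", "reviewer_must_fix", "nice_to_have"];

// Picks the candidate with the fewest violations in the highest tier,
// breaking ties on the next tier down.
function pickLeastBad(candidates) {
  const score = (c) => GROUND_TRUTH_ORDER.map((tier) => c.violations[tier] ?? 0);
  const ranked = [...candidates].sort((a, b) => {
    const sa = score(a);
    const sb = score(b);
    for (let i = 0; i < sa.length; i++) {
      if (sa[i] !== sb[i]) return sa[i] - sb[i];
    }
    return 0;
  });
  return ranked[0];
}
```

Under this ordering a candidate that fails an acceptance test loses to one that only leaves nice-to-have items open, however many of those there are.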

Recent releases

The most recent releases, newest first. Older versions live in Architecture › History and the CHANGELOG.

v3.14.1 — parallel lanes work end to end (current)

Patch. --parallel now actually parallelizes — inside real worktree lanes. The 3.14.0 flag silently degraded to sequential (the plan→batch conversion dropped the scope field the scheduler pairs by), and lanes shared the main working tree. Now the coder, acceptance tests, diffs, sonar and the final commit all run INSIDE each lane’s worktree on its own branch, with per-lane config and session state — verified with a live end-to-end run: two HUs coding concurrently in their worktrees while the main tree stays parked. Also fixed: kj plan ready rejected every generated plan (the auto-injected preflight HU shipped without a status), and unattended --yes runs stopped cold on questions that declare a safe default. Drop-in upgrade from v3.14.0.

v3.14.0 — step mode and governed parallelism

Minor. Two new ways to run the orchestra. kj run --step pauses after every iteration with a compact report — what happened, the reviewer’s must-fix list, what the next iteration will do, spend vs cap — and asks: continue, stop, or type instructions that the coder reads next iteration. And kj run --parallel <n> runs a plan’s HUs concurrently, each in its own git worktree, scheduled over the dependency graph with conservative scope isolation and a plan-level budget ceiling. Parallel execution — which previously ran unbounded — now defaults to 1 (fully sequential) and is strictly opt-in. Drop-in upgrade from v3.13.1.
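Scheduling over the dependency graph can be sketched as a function that returns the next HUs allowed to start. The blocked_by and scope field names appear elsewhere on this page; everything else is an assumption, including the reading of "conservative scope isolation" as never running two HUs with the same scope at once.

```javascript
// Returns the ids of HUs that may start now, given what is done and running.
// `parallel` is the lane limit; the default of 1 means fully sequential.
function nextBatch(hus, done, running, parallel = 1) {
  const busyScopes = new Set(
    running.map((id) => hus.find((h) => h.id === id).scope)
  );
  const batch = [];
  for (const hu of hus) {
    if (batch.length + running.length >= parallel) break;
    if (done.has(hu.id) || running.includes(hu.id)) continue;
    // Every dependency must be finished before the HU is ready.
    if (!hu.blocked_by.every((dep) => done.has(dep))) continue;
    // Conservative isolation: skip HUs whose scope is already in flight.
    if (busyScopes.has(hu.scope)) continue;
    busyScopes.add(hu.scope);
    batch.push(hu.id);
  }
  return batch;
}
```

The plan-level budget ceiling is left out of the sketch; it would be one more early exit before a lane is started.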

v3.13.1 — every run ships with a spend ceiling

Patch. A stuck or runaway pipeline can no longer drain your subscription quota unattended: max_budget_usd now defaults to 5 (USD-equivalent, checked every iteration). Exceeding it stops the run with an honest summary — what was spent, the limit, how to raise it — and kj resume continues consciously with a fresh window. Set it to null to remove the cap. Born from a field report of a stuck full-orchestra run burning a user’s whole quota. Drop-in upgrade from v3.13.0.

v3.13.0 — `kj harden` respects your own tooling

Minor. A repo that formats and lints with Biome no longer gets kj’s eslint + prettier planted next to it. Harden now understands cross-tool alternatives: with biome.json present, the advisory reports those artifacts as covered (“kj won’t add a second linter/formatter”), config seeding skips them, and kj check stops flagging their absence as drift. Your own config of the same tool still wins, .editorconfig and commitlint keep seeding (Biome doesn’t replace them), and the generated pre-commit hooks and Quality workflow delegate to your project’s npm run lint/format scripts. Drop-in upgrade from v3.12.3.

v3.12.3 — the standalone binary carries its templates, `kj update` updates the kj you run

Patch. Three field-reported fixes. The standalone binary now ships its built-in templates as embedded assets — kj init installs skills and role prompts on a fresh machine instead of warning ENOENT. kj update is channel-aware: on binary installs it re-runs the installer instead of npm-installing a copy your PATH never resolves, and on every channel it now verifies that the kj you actually run reports the new version — failing loudly and naming the shadowing copy when it doesn’t. And the osv-scanner go install recipe points at the real cmd/ subpackage, so kj install-tools can install it again. Drop-in upgrade from v3.12.2.