MVP now in closed beta · Infinite Labs design partner

AI craftsmanship, measured.

Scoring for developers who live in the terminal. Built for Claude Code and every CLI agent that comes next — opencode, codex, any OTEL-emitting tool. Rewards how well you use AI, not how much. No prompt content stored. Team aggregates by default.

Start freeSee how it works →

Free for everyone during MVP · Team Pro + Enterprise coming soon

Not sure where you land? Take the 10-question craftsman quiz →

~/dev/your-project

scoreflow.dev/u/diegotorreslopez81

Not connected
Level1Karpathy

0 XP10 to Lv 2

Skills fill in as you work. 11 tracks, each 1-100.

Achievements

0 / 22

Streak

0d

Works with the stack you already use

Claude Code CLI

Native integration

Claude Desktop

Client

VS Code

Client

opencode

BYOO OTEL

OpenAI Codex CLI

BYOO OTEL

Gemini CLI

BYOO OTEL

Goose (Block)

BYOO OTEL

Copilot CLI

BYOO OTEL

OpenTelemetry

BYOO OTEL

npm

Distribution

GitHub

Distribution

Get started in 60 seconds

Two commands. No config files.

Install the CLI, run scoreflow init once, and a Stop hook wires into Claude Code. No Docker. No sidecar. No SDK to import.

~/dev
$ npm i -g @scoreflow/cli
$ scoreflow init
 Signed in as diego@pdata.org
 Stop hook installed at ~/.claude/settings.json
 You're live. Keep working — your first score lands Monday.
1

Install the CLI

Single npm package. Works on macOS, Linux, and WSL.

2

Run scoreflow init

One-time setup: OAuth to your team and wire the Claude Code Stop hook.

3

Keep working

Every prompt + tool call flows to Scoreflow. First score drops Monday.

Or install as a Claude Code plugin

One line inside Claude Code adds the MCP server + four slash commands (status, leaderboard, profile, explain).

/plugin marketplace add 8infinitelabs/scoreflow
/plugin install scoreflow@scoreflow

Prefer opencode? codex? aider? Any OpenTelemetry CLI connects with three env vars — see the BYOO OTEL guide.

Grab your CLI token →

How it works

Three moving parts. Nothing you can game.

Measurement that engineers respect because it's rooted in their real tooling, not a survey.

01

Ingest the signal, not the content

OpenTelemetry from Claude Code / Copilot / Cursor, GitHub webhooks for PR outcomes, admin APIs for per-user attribution. Event metadata only — never prompt or output text.

02

Score with bands, not monotones

Four components — Effectiveness, Craftsmanship, Outcome, Breadth — each using sweet-spot bands so gaming the metric collapses the score. Weekly snapshot per team + rolling 28-day windows.

03

Deliver where devs live

MCP server so devs see their score inside Claude Code. Weekly email digest for managers. Web dashboard for deep dives. Live CLI hooks that show XP + level-ups in the terminal.

The Craftsmanship Score

Four components. Each one rewards judgment.

Weights are published. Bands are documented. If a signal can be gamed, the band punishes the gaming so the score doesn't.

Effectiveness

w 0.35

Do the completions land?

Tool acceptance rate, immediate-reject signal, regeneration ratio — banded so spamming "accept" collapses.

tool_accept_rateimmediate_rejectregeneration_ratiomodality

Craftsmanship

w 0.25

Is the process thoughtful?

Plan-mode ratio, cache hits, rules-file freshness, distinct skills — rewards deliberate AI use over spray-and-pray.

plan_mode_ratiocache_hit_ratiorules_freshnessskills_used

Outcome

w 0.25

Does the work ship and stick?

PR merge rate, CI pass-rate-first-push, 30-day durability, change-failure proxy. Real outcomes, not vanity volume.

pr_merge_rateci_first_passdurability_30dchange_failure

Breadth

w 0.15

Is AI adoption broad?

Distinct AI tools, modalities (autocomplete + chat + agent), repos touched, surfaces used — breadth without the gaming surface of volume.

distinct_toolsdistinct_modalitiesrepos_with_aisurfaces

Your character sheet

Level up as you ship. Respectful. Earned.

Every profile is a living character sheet: 10 tiers from Karpathy to Turing, 11 skill tracks, 22 achievements. XP derives from CCS so the bands stay anti-gaming. Share it, compare it, unlock the next tier.

Tier 3 · Hassabis

@diegotorreslopez81

scoreflow.dev/u/diegotorreslopez81

27Level

1,245 XP · 🔥 14-day streak62% to 28

Skill tree

File Craft384 XP
Research218 XP
Agents142 XP
Automation94 XP

+ 7 more tracks

Achievements · 12 / 22

Cache WizardPlan MasterPolyglotNight OwlFirst SessionWeek One??? × 10

Share-friendly. Public. Proof of craftsmanship — not of typing volume.

100 levels · 10 tiers

From Karpathy to Turing

1-10KarpathyNeural nets from scratch
11-20AltmanProductized the LLM era
21-30HassabisDeepMind, AlphaFold
31-40LeCunCNNs, Meta FAIR
41-50HintonGodfather of deep learning
51-60TorvaldsLinux, Git
61-70KnuthThe Art of Computer Programming
71-80HopperFirst compiler, coined "bug"
81-90LovelaceFirst published algorithm, 1843
91-100TuringTuring machine, the foundation

11 skill tracks · each 1-100

Every track levels on its own.

Daily XP splits across tracks based on the tools you actually used. Each track has its own 1-100 curve (flatter than character levels — specializing is encouraged). Here's how to grind each one:

File CraftEdit / Write / MultiEdit — surgical diffs
ResearchGrep / Glob / Read / Web* — find the signal
AgentsTask / Agent / SubAgent — delegate work
AutomationBash / Shell — scripts that do the loop
Project MgmtTodoWrite / TaskCreate — track the plan
AnalysisPlan Mode / ExitPlanMode — think first
SynthesisSession spans ≥3 tool categories
WritingPrompts with ≥200 char avg — drafting prose
ConversationTool-less prompts — rubber-ducking
TeachingPublish a craftable peers adopt
CreativityFirst-time (tool × modality) combo

Level curves

Respectful of your time.

Character level is steep: 10 × n^1.7 XP — reaching Turing (Lv 100) takes years of sustained craft. Skills are flatter: 3 × n^1.4 — a specialist can reach Skill Lv 60 in ~3 months. The tier breadcrumb stays meaningful across the whole journey.

Crafting (Phase 2)

Ship artifacts your team actually uses.

Authoring a Skill, Rule, MCP, Prompt, Workflow, or Agent earns XP. Rarity climbs as peers adopt it: common → uncommon → rare → epic → legendary. Legendary = real evidence you shipped something the team rallied around.

Viral by design

Every surface is a share surface.

Scoreflow doesn’t buy ads. It ships in six share loops — README badges, OG cards, quizzes, PR comments, monthly Wrapped, public team pages. Each is live. Click any to see.

V.1 · Badge
scoreflowLv 2 · Karpathy![](scoreflow.dev/u/.../badge.svg)

Live README badge

Paste one line in your GitHub profile or project README. Updates within 5 minutes of any XP change. Wakatime-style — ~1M badges in the wild.

Copy my badge →
V.2 · Profile card

Pixel-art character sheet

Every /u/<handle> unfurls with a JRPG-style status screen — level, tier, XP bar, skill tracks, achievements. Designed for the X share moment.

See a live profile →
V.4 · Quiz
07 / 10 · CRAFTSMANSHIP
Before editing a file you have not opened today…
A · Ask Claude to just make the change
B · Ask Claude to Read it first
C · Read it yourself

No-signup quiz

Ten questions. Returns a CCS estimate and an archetype (Vibecoder → Master). Built to seed HN and Reddit threads — "what did you get?" as a conversation hook.

Take the quiz →
V.6 · GitHub Action
github-actions commented · 8 min ago
🟢 @diego is on Lv 12 Altman · 287 XP · streak active
Top skills: File Craft Lv 3 · Research Lv 2
See your own · Manifesto · Quiz

PR comment on every push

Install the Action in any repo. On every opened PR it posts the author's CCS as a comment. Every reviewer becomes an impression — the CodeRabbit loop.

Install the Action →
V.7 · Wrapped

Monthly wrapped drop

On the 1st of every month you get a personal recap page — XP earned, levels crossed, unlocks. Spotify Wrapped pattern but monthly so it never goes silent.

See a sample wrapped →
V.8 · Team page
Team · General
Members12
Avg level4.2
BandTOP 25%

Team leaderboard with k=5

Every team gets a public page. Aggregates unlock at 5 opted-in members — never individual ranks, always bands. Strava-club pride without the shame.

Open the General team →
See every shipped piece in the changelog. Read why we don’t rank CCS linearly in the manifesto.

Privacy by default

Built for engineering leaders who have to defend the tool in the all-hands.

Privacy is a design constraint, not a feature toggle. These four pillars are non-negotiable and shipped from day one.

No prompt content stored

We record metadata — length, token counts, acceptance, tool name. Never the prompt text or the output. Optional premium tier adds customer-side encryption for teams that want richer signals.

Team aggregates by default

Every manager-facing view is team-level. Individual rankings require explicit opt-in from the user being shown. Enterprise mode enforces team-only with a configurable k-anonymity floor (default k=5).

EU data residency

All stored events + scores live on Hetzner Frankfurt. Identity (WorkOS), payments (Lemon Squeezy), and email (Resend EU) run on edge providers — never the hot path of ingestion or storage.

Works-council mode

Enterprise contracts include a works-council profile: redaction below k=5, DPIA templates, audit logs on every individual-level view, delete-on-offboarding with retention policies per regulation.

Pricing

Free for everyone during MVP.

We're proving the value prop first. Team Pro + Enterprise are on the roadmap — pricing will be feature-based (connectors, SSO, retention), never per-dev or per-event. Signing up now locks your founding-user status.

All yours

Everyone

$0

while we find the right price

Full scoring, gamification, and team features. Use it freely while we earn your trust.

  • Unlimited seats per team
  • All scoring + gamification (CCS + XP + levels + achievements)
  • Claude Code CLI + MCP + weekly digest
  • Global leaderboard (opt-in, default yes for community teams)
  • Public profile on scoreflow.dev/u/<handle>
  • Anthropic Analytics connector (Claude.ai web/desktop/mobile)
Start free
Coming soon

Team Pro

Coming soon

Premium features for teams ready to invest in AI craftsmanship — pricing under review.

  • SSO + SCIM
  • Extended retention (12mo+)
  • Custom branding on /u/<handle>
  • Team Pro Badge
  • Priority support
  • Early access to new scoring components
Notify me

Enterprise

Custom

annual, invoice

For 50+ devs with works-council-friendly visibility, SLA, and DPA.

  • Works-council mode with k-anonymity floor
  • DPIA templates + Data Processing Agreement
  • Self-hosted option (Hetzner or your infra)
  • Dedicated onboarding
  • SLA + audit trail export
Talk to us

FAQ

Questions engineering leaders actually ask.

If there's something not covered here, email diego@infinitelabs.co — every question so far has shaped the product.

Won't this just Goodhart itself? Devs will game whatever you score.

Every component is banded, not monotone — spamming accepts, rubber-stamping, token-bursting, or artificial breadth all collapse the score. The anti-gaming flags are published in packages/scoring/src/anti-gaming.ts so the mechanism is auditable.

Do you store our prompts or generated code?

No. We store metadata only: tool name, length, accepted/rejected, timestamps, modality. Never prompt or output text. An opt-in premium tier can ingest richer signals under a customer-side encryption model — same default stays off.

How is individual vs. team visibility handled?

Community teams default to opt-in-with-default-yes: users see their own level publicly and participate in the global leaderboard unless they flip it off. Enterprise mode defaults to team-aggregate-only with a k-anonymity floor (k=5 default) — any view below that is redacted automatically.

What do you need to get started?

A GitHub account and permission to install the ScoreFlow GitHub App on the repos you want scored. For tools like Claude Code / Cursor / Copilot we can attribute via OTEL traces or admin API — whichever your stack supports.

Is this really not a subscription?

Team Pro is a one-time $149 lifetime license up to 50 seats. Enterprise is annual with a custom quote because of the SSO/SCIM/works-council work. Community stays free forever for teams of 5.

Where does the data live?

Hetzner Frankfurt (EU). Identity via WorkOS, payments via Lemon Squeezy (MoR, handles EU VAT), email via Resend EU — those are edge services, never hot-path ingestion or storage.

Stop counting lines. Start measuring craft.

Create a team in 30 seconds, invite up to five peers, wire a single hook into Claude Code, and get your first score this week.

Start free — no credit cardTalk to the founder