How it works
Three moving parts. Nothing you can game.
Measurement that engineers respect because it's rooted in their real tooling, not a survey.
01
Ingest the signal, not the content
OpenTelemetry from Claude Code / Copilot / Cursor, GitHub webhooks for PR outcomes, admin APIs for per-user attribution. Event metadata only — never prompt or output text.
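Here is what "metadata only" can look like in code. This is a minimal sketch built on a hypothetical raw event shape. The field names (`promptText`, `inputTokens`) and the helper `toStoredEvent` are illustrative assumptions. They are not the real OpenTelemetry attribute keys any of these tools emit, and they are not ScoreFlow's actual schema. The idea it shows is structural: the stored type has no field that can hold text, so text is measured at ingestion and then dropped.

```typescript
// Hypothetical raw event as it might arrive from a coding tool.
// Field names are illustrative assumptions, not real OTel attribute keys.
interface RawToolEvent {
  tool: "claude-code" | "copilot" | "cursor";
  userId: string;
  timestamp: string; // ISO 8601
  modality: string; // e.g. "completion", "chat", "agent"
  promptText: string;
  outputText: string;
  inputTokens: number;
  outputTokens: number;
  accepted: boolean;
}

// The stored record: metadata only. No field can hold prompt or output text.
interface StoredEvent {
  tool: RawToolEvent["tool"];
  userId: string;
  timestamp: string;
  modality: string;
  promptLength: number;
  outputLength: number;
  inputTokens: number;
  outputTokens: number;
  accepted: boolean;
}

// Measure the text, then discard it: only lengths are copied across.
function toStoredEvent(e: RawToolEvent): StoredEvent {
  return {
    tool: e.tool,
    userId: e.userId,
    timestamp: e.timestamp,
    modality: e.modality,
    promptLength: e.promptText.length,
    outputLength: e.outputText.length,
    inputTokens: e.inputTokens,
    outputTokens: e.outputTokens,
    accepted: e.accepted,
  };
}

const stored = toStoredEvent({
  tool: "cursor",
  userId: "u1",
  timestamp: "2024-01-01T00:00:00Z",
  modality: "chat",
  promptText: "secret prompt",
  outputText: "secret output",
  inputTokens: 12,
  outputTokens: 40,
  accepted: true,
});
console.log(stored.promptLength); // 13
```

Because `StoredEvent` has no text field, a later code change cannot persist prompt text without also changing the stored type.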
02
Score with bands, not monotones
Four components (Effectiveness, Craftsmanship, Outcome, Breadth), each scored against a sweet-spot band, so pushing a signal to an extreme to game it collapses the score instead of raising it. Scores are snapshotted weekly per team and computed over rolling 28-day windows.
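The difference between a band and a monotone score fits in a few lines. This is a minimal sketch that assumes a trapezoid-shaped band and equal component weights. The band parameters, the `acceptanceBand` example, the 0-100 scale, and the `WEIGHTS` values are hypothetical and do not reproduce ScoreFlow's published weights or its anti-gaming.ts logic.

```typescript
// A sweet-spot band: 0 at the outer edges, rising linearly to 1 across the
// ideal range. Assumes floor < idealLow <= idealHigh < ceiling.
interface Band {
  floor: number;
  idealLow: number;
  idealHigh: number;
  ceiling: number;
}

function bandScore(x: number, b: Band): number {
  if (x <= b.floor || x >= b.ceiling) return 0; // too little or too much
  if (x < b.idealLow) return (x - b.floor) / (b.idealLow - b.floor); // ramp up
  if (x > b.idealHigh) return (b.ceiling - x) / (b.ceiling - b.idealHigh); // ramp down
  return 1; // inside the sweet spot
}

type Component = "effectiveness" | "craftsmanship" | "outcome" | "breadth";

// Hypothetical equal weights; the real published weights are not reproduced here.
const WEIGHTS: Record<Component, number> = {
  effectiveness: 0.25,
  craftsmanship: 0.25,
  outcome: 0.25,
  breadth: 0.25,
};

// Weighted sum of per-component band scores (each in [0, 1]), scaled to 0-100.
function compositeScore(scores: Record<Component, number>): number {
  let total = 0;
  for (const c of Object.keys(WEIGHTS) as Component[]) {
    total += WEIGHTS[c] * scores[c];
  }
  return Math.round(total * 100);
}

// Hypothetical band for suggestion acceptance rate: accepting everything is not "best".
const acceptanceBand: Band = { floor: 0.1, idealLow: 0.3, idealHigh: 0.6, ceiling: 0.95 };
console.log(bandScore(0.45, acceptanceBand)); // 1 (in the sweet spot)
console.log(bandScore(1.0, acceptanceBand)); // 0 (accept-everything collapses)
```

A monotone score would reward a 100% acceptance rate with the maximum. Under the band, that same behavior scores zero.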
03
Deliver where devs live
MCP server so devs see their score inside Claude Code. Weekly email digest for managers. Web dashboard for deep dives. Live CLI hooks that show XP + level-ups in the terminal.
The Craftsmanship Score
Four components. Each one rewards judgment.
Weights are published. Bands are documented. If a signal can be gamed, its band penalizes the gaming, so the score drops instead of rising.
Gamification
Levels, skill tracks, achievements. Respectful. Earned.
XP is derived from the Craftsmanship Score (CCS), so it inherits the anti-gaming bands. Nothing cringe, just legible progress that rewards the right behaviors.
100 levels · 10 tiers
From Karpathy to Turing
11 skill tracks
Not just coders.
PMs, writers, designers progress through the tracks that match how they actually work with AI — not only File Craft and Automation.
Crafting
Ship artifacts your team actually uses.
Authoring a Skill, Rule, MCP, Prompt, Workflow, or Agent earns XP. Rarity climbs as peers adopt it: common → uncommon → rare → epic → legendary. Legendary = real evidence you shipped something the team rallied around.
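Adoption-driven rarity can be sketched as a threshold lookup. The sketch assumes rarity depends on the share of teammates, excluding the author, who have adopted an artifact. The thresholds in `RARITY_THRESHOLDS` and the helper `rarityFor` are hypothetical. The real tiers may be computed differently.

```typescript
type Rarity = "common" | "uncommon" | "rare" | "epic" | "legendary";

// Hypothetical thresholds on the share of teammates (excluding the author)
// who have adopted an artifact, checked from the highest tier down.
const RARITY_THRESHOLDS: ReadonlyArray<[Rarity, number]> = [
  ["legendary", 0.8],
  ["epic", 0.5],
  ["rare", 0.25],
  ["uncommon", 0.1],
];

function rarityFor(adopters: number, teamSize: number): Rarity {
  const peers = teamSize - 1; // the author doesn't count as their own adopter
  if (peers <= 0) return "common";
  const share = Math.min(adopters, peers) / peers;
  for (const [rarity, minShare] of RARITY_THRESHOLDS) {
    if (share >= minShare) return rarity;
  }
  return "common";
}

console.log(rarityFor(1, 11)); // uncommon (1 of 10 peers)
console.log(rarityFor(9, 11)); // legendary (9 of 10 peers)
```

Using a share rather than a raw count keeps a 5-person team's legendary artifact comparable to a 50-person team's.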
Privacy by default
Built for engineering leaders who have to defend the tool in the all-hands.
Privacy is a design constraint, not a feature toggle. These four pillars are non-negotiable and shipped from day one.
No prompt content stored
We record metadata: length, token counts, acceptance, tool name. Never the prompt text or the output. An optional premium tier can ingest richer signals under customer-side encryption, for teams that want them.
Team aggregates by default
Every manager-facing view is team-level. Individual rankings require explicit opt-in from the user being shown. Enterprise mode enforces team-only with a configurable k-anonymity floor (default k=5).
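This is how a k-anonymity floor works on a team-level view. The sketch assumes individual scores are rolled up to a team mean and that any team with fewer than k distinct members is redacted outright. The row shape, the `teamAggregates` helper, and the sample data are hypothetical. Only the default k=5 comes from the text above.

```typescript
interface MemberScore {
  team: string;
  userId: string;
  score: number;
}

type TeamView =
  | { team: string; members: number; meanScore: number }
  | { team: string; redacted: true };

// Roll individual scores up to team means, redacting any team with fewer than
// k distinct members. Assumes one current score per user (later rows overwrite).
function teamAggregates(rows: MemberScore[], k = 5): TeamView[] {
  const byTeam: Record<string, Record<string, number>> = {};
  for (const r of rows) {
    if (!byTeam[r.team]) byTeam[r.team] = {};
    byTeam[r.team][r.userId] = r.score;
  }
  return Object.keys(byTeam).map((team): TeamView => {
    const scores = Object.keys(byTeam[team]).map((u) => byTeam[team][u]);
    if (scores.length < k) return { team, redacted: true }; // below the floor
    const sum = scores.reduce((a, b) => a + b, 0);
    return { team, members: scores.length, meanScore: sum / scores.length };
  });
}

const rows: MemberScore[] = [
  { team: "platform", userId: "p1", score: 60 },
  { team: "platform", userId: "p2", score: 65 },
  { team: "platform", userId: "p3", score: 70 },
  { team: "platform", userId: "p4", score: 75 },
  { team: "platform", userId: "p5", score: 80 },
  { team: "design", userId: "d1", score: 90 },
  { team: "design", userId: "d2", score: 85 },
];
console.log(teamAggregates(rows));
// platform: 5 members, mean 70; design: redacted (2 members < k=5)
```

Redacting the whole group, rather than just hiding names, matters for small teams: a mean over two people reveals too much about each of them.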
EU data residency
All stored events and scores live on Hetzner in Frankfurt. Identity (WorkOS), payments (Lemon Squeezy), and email (Resend EU) are handled by peripheral providers that never sit in the hot path of ingestion or storage.
Works-council mode
Enterprise contracts include a works-council profile: redaction below k=5, DPIA templates, audit logs on every individual-level view, delete-on-offboarding with retention policies per regulation.
Pricing
Free for 5. Lifetime for 50. Custom beyond.
Non-SaaS pricing, because the core value doesn't change week to week. If we raise prices, yours stays the same. If we get acquired, you still own your license.
Community
$0
forever
For small crews dogfooding ScoreFlow together.
- ✓ Up to 5 seats
- ✓ All scoring + gamification
- ✓ MCP + CLI + weekly email digest
- ✓ Global leaderboard (on by default, opt out anytime)
- ✓ Team public page on scoreflow.dev/t/<slug>
Team Pro
$149
one-time license
For teams 6–50. Not a subscription. Pay once, own it.
- ✓ Up to 50 seats
- ✓ Everything in Community
- ✓ Anthropic / Cursor / Copilot admin API integrations
- ✓ Private team page + custom domain
- ✓ Priority support
- ✓ Works with your existing SSO (Google/GitHub)
Enterprise
Custom
annual
For more than 50 seats, under a works-council-friendly policy.
- ✓ WorkOS SSO + SCIM provisioning
- ✓ Team-aggregate-only mode with k-anonymity floor
- ✓ Works-council profile + DPIA templates
- ✓ Self-hosted option (Hetzner or your infra)
- ✓ Dedicated onboarding + SLA
FAQ
Questions engineering leaders actually ask.
If there's something not covered here, email diego@infinitelabs.co — every question so far has shaped the product.
Won't this just Goodhart itself? Devs will game whatever you score.
Every component is banded, not monotonic: spamming accepts, rubber-stamping, token-bursting, and artificial breadth all collapse the score. The anti-gaming flags are published in packages/scoring/src/anti-gaming.ts so the mechanism is auditable.
Do you store our prompts or generated code?
No. We store metadata only: tool name, length, accepted/rejected, timestamps, modality. Never prompt or output text. An opt-in premium tier can ingest richer signals under a customer-side encryption model, and it stays off by default.
How is individual vs. team visibility handled?
On Community, visibility is on by default: each user's level is shown publicly and they appear on the global leaderboard unless they turn it off. Enterprise mode defaults to team-aggregate-only with a k-anonymity floor (default k=5), and any view below that floor is redacted automatically.
What do you need to get started?
A GitHub account and permission to install the ScoreFlow GitHub App on the repos you want scored. For tools like Claude Code / Cursor / Copilot we can attribute via OTEL traces or admin API — whichever your stack supports.
Is this really not a subscription?
Team Pro is a one-time $149 lifetime license for up to 50 seats. Enterprise is annual, with a custom quote because of the SSO/SCIM/works-council work. Community stays free forever for teams of up to 5.
Where does the data live?
Hetzner Frankfurt (EU). Identity via WorkOS, payments via Lemon Squeezy (merchant of record, handles EU VAT), email via Resend EU. Those are peripheral services and never sit in the hot path of ingestion or storage.