MVP now in closed beta · Peninsula Innovation design partner

AI craftsmanship, measured.

Teams-first scoring that rewards how well your people use AI — not how much. Built on OpenTelemetry + Git + admin APIs. No prompt content stored. Team aggregates by default.

Create your team, free up to 5
See how it works →

5 devs free · $149 Team Pro Lifetime ≤50 · Enterprise for 50+

scoreflow.dev/teams/peninsula
┌─ Peninsula · this week ─────────────────────────────────────┐
│                                                              │
│   Team CCS           78 / 100       ▲ 6 pts                  │
│   ────────────────────────────────────────                   │
│   Effectiveness      82     ▮▮▮▮▮▮▮▮▯▯                       │
│   Craftsmanship      74     ▮▮▮▮▮▮▮▯▯▯                       │
│   Outcome            71     ▮▮▮▮▮▮▮▯▯▯                       │
│   Breadth            85     ▮▮▮▮▮▮▮▮▮▯                       │
│                                                              │
│   🎖  Team unlocked: Cache Wizard (Silver)                   │
│   🏆  Top skill this week: File Craft  +184 XP               │
│                                                              │
└──────────────────────────────────────────────────────────────┘

How it works

Three moving parts. Nothing you can game.

Measurement that engineers respect because it's rooted in their real tooling, not a survey.

01

Ingest the signal, not the content

OpenTelemetry from Claude Code / Copilot / Cursor, GitHub webhooks for PR outcomes, admin APIs for per-user attribution. Event metadata only — never prompt or output text.

02

Score with bands, not monotones

Four components — Effectiveness, Craftsmanship, Outcome, Breadth — each using sweet-spot bands so gaming the metric collapses the score. Weekly snapshot per team + rolling 28-day windows.
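A minimal sketch of what a sweet-spot band could look like, assuming a trapezoid shape: full credit inside the band, a linear ramp on either side, and zero at the extremes. The `Band` fields, the `bandScore` helper, and the thresholds are illustrative assumptions, not ScoreFlow's published bands.

```typescript
// Hypothetical band shape: 100 inside [lo, hi], linear falloff to 0 outside.
interface Band {
  zeroBelow: number; // at or below this value the signal scores 0
  lo: number;        // start of the sweet spot
  hi: number;        // end of the sweet spot
  zeroAbove: number; // at or above this value the signal scores 0
}

function bandScore(value: number, band: Band): number {
  if (value <= band.zeroBelow || value >= band.zeroAbove) return 0;
  if (value < band.lo) {
    // Ramp up from zeroBelow to the start of the sweet spot.
    return (100 * (value - band.zeroBelow)) / (band.lo - band.zeroBelow);
  }
  if (value > band.hi) {
    // Ramp down from the end of the sweet spot to zeroAbove.
    return (100 * (band.zeroAbove - value)) / (band.zeroAbove - band.hi);
  }
  return 100;
}

// Illustrative thresholds only (assumption).
const acceptRateBand: Band = { zeroBelow: 0.1, lo: 0.5, hi: 0.8, zeroAbove: 1.0 };

console.log(bandScore(0.65, acceptRateBand)); // 100: inside the sweet spot
console.log(bandScore(1.0, acceptRateBand));  // 0: accepting everything collapses
```

Under this shape, pushing a signal past the sweet spot lowers the score instead of raising it, which is the property a monotone metric lacks.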

03

Deliver where devs live

MCP server so devs see their score inside Claude Code. Weekly email digest for managers. Web dashboard for deep dives. Live CLI hooks that show XP + level-ups in the terminal.

The Craftsmanship Score

Four components. Each one rewards judgment.

Weights are published. Bands are documented. If a signal can be gamed, the band penalizes the gaming, so the score never rewards it.

Effectiveness

w 0.35

Do the completions land?

Tool acceptance rate, immediate-reject signal, regeneration ratio — banded so spamming "accept" collapses.

tool_accept_rate · immediate_reject · regeneration_ratio · modality

Craftsmanship

w 0.25

Is the process thoughtful?

Plan-mode ratio, cache hits, rules-file freshness, distinct skills — rewards deliberate AI use over spray-and-pray.

plan_mode_ratio · cache_hit_ratio · rules_freshness · skills_used

Outcome

w 0.25

Does the work ship and stick?

PR merge rate, CI pass-rate-first-push, 30-day durability, change-failure proxy. Real outcomes, not vanity volume.

pr_merge_rate · ci_first_pass · durability_30d · change_failure

Breadth

w 0.15

Is AI adoption broad?

Distinct AI tools, modalities (autocomplete + chat + agent), repos touched, surfaces used — breadth without the gaming surface of volume.

distinct_tools · distinct_modalities · repos_with_ai · surfaces
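The four published weights sum to 1.0. A minimal sketch of how they could combine, assuming the composite is a plain weighted sum rounded to the nearest point (the page does not spell out the formula, and `compositeScore` is a hypothetical name). Under that assumption the sample dashboard's component scores reproduce its 78.

```typescript
type Component = "effectiveness" | "craftsmanship" | "outcome" | "breadth";

// Weights as published on this page.
const WEIGHTS: Record<Component, number> = {
  effectiveness: 0.35,
  craftsmanship: 0.25,
  outcome: 0.25,
  breadth: 0.15,
};

// Assumption: the composite is a weighted sum of 0-100 component scores.
function compositeScore(scores: Record<Component, number>): number {
  let total = 0;
  for (const key of Object.keys(WEIGHTS) as Component[]) {
    total += WEIGHTS[key] * scores[key];
  }
  return Math.round(total);
}

// Component scores from the sample dashboard at the top of the page.
const peninsula = compositeScore({
  effectiveness: 82,
  craftsmanship: 74,
  outcome: 71,
  breadth: 85,
});

console.log(peninsula); // 78 (28.7 + 18.5 + 17.75 + 12.75 = 77.7, rounded)
```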

Gamification

Levels, skill tracks, achievements. Respectful. Earned.

XP is derived from CCS, so it inherits the anti-gaming bands. Nothing cringe — just legible progress that rewards the right behaviors.

100 levels · 10 tiers

From Karpathy to Turing

1-10 · Karpathy · Neural nets from scratch
11-20 · Altman · Productized the LLM era
21-30 · Hassabis · DeepMind, AlphaFold
31-40 · LeCun · CNNs, Meta FAIR
41-50 · Hinton · Godfather of deep learning
51-60 · Torvalds · Linux, Git
61-70 · Knuth · The Art of Computer Programming
71-80 · Hopper · First compiler, popularized "debugging"
81-90 · Lovelace · First published algorithm, 1843
91-100 · Turing · Turing machine, the foundation
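Because each tier spans exactly ten levels, the tier for a level falls out of integer division. A small sketch (the `tierForLevel` helper is hypothetical; the tier names and ranges come from the table above):

```typescript
// Tier names in ascending order, ten levels each.
const TIERS: string[] = [
  "Karpathy", "Altman", "Hassabis", "LeCun", "Hinton",
  "Torvalds", "Knuth", "Hopper", "Lovelace", "Turing",
];

function tierForLevel(level: number): string {
  if (!Number.isInteger(level) || level < 1 || level > 100) {
    throw new RangeError(`level must be an integer from 1 to 100, got ${level}`);
  }
  // Levels 1-10 map to index 0, 11-20 to index 1, and so on.
  return TIERS[Math.floor((level - 1) / 10)];
}

console.log(tierForLevel(10));  // Karpathy
console.log(tierForLevel(11));  // Altman
console.log(tierForLevel(100)); // Turing
```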

11 skill tracks

Not just coders.

PMs, writers, designers progress through the tracks that match how they actually work with AI — not only File Craft and Automation.

Agents · File Craft · Research · Automation · Project Management · Teaching · Writing · Analysis · Synthesis · Conversation · Creativity

Crafting

Ship artifacts your team actually uses.

Authoring a Skill, Rule, MCP, Prompt, Workflow, or Agent earns XP. Rarity climbs as peers adopt it: common → uncommon → rare → epic → legendary. Legendary = real evidence you shipped something the team rallied around.
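One way the rarity ladder could be derived from adoption, sketched under loud assumptions: rarity depends on the share of teammates who adopted the artifact, and the cutoffs below are invented for illustration. The page does not publish the real thresholds, and `rarityFor` is a hypothetical helper.

```typescript
type Rarity = "common" | "uncommon" | "rare" | "epic" | "legendary";

// Hypothetical cutoffs: minimum share of teammates who adopted the artifact.
const RARITY_FLOORS: Array<[number, Rarity]> = [
  [0.75, "legendary"],
  [0.5, "epic"],
  [0.25, "rare"],
  [0.1, "uncommon"],
];

function rarityFor(adopters: number, teamSize: number): Rarity {
  if (teamSize <= 0) return "common";
  const share = adopters / teamSize;
  // Floors are listed highest first, so the first match is the best rarity earned.
  for (const [floor, rarity] of RARITY_FLOORS) {
    if (share >= floor) return rarity;
  }
  return "common";
}

console.log(rarityFor(0, 10)); // common
console.log(rarityFor(8, 10)); // legendary
```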

Privacy by default

Built for engineering leaders who have to defend the tool in the all-hands.

Privacy is a design constraint, not a feature toggle. These four pillars are non-negotiable and shipped from day one.

No prompt content stored

We record metadata — length, token counts, acceptance, tool name. Never the prompt text or the output. Optional premium tier adds customer-side encryption for teams that want richer signals.
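A sketch of what "metadata only" can mean in practice: the text is reduced to lengths and counts before anything is stored. The field names and the `toStoredEvent` helper are assumptions for illustration, not ScoreFlow's actual schema.

```typescript
type Modality = "autocomplete" | "chat" | "agent";

// Hypothetical shape of an event as the tool sees it locally.
interface RawToolEvent {
  toolName: string;
  modality: Modality;
  promptText: string;
  outputText: string;
  tokenCount: number;
  accepted: boolean;
  timestamp: string; // ISO 8601
}

// Hypothetical stored shape: no field can hold prompt or output text.
interface StoredEvent {
  toolName: string;
  modality: Modality;
  promptLength: number;
  outputLength: number;
  tokenCount: number;
  accepted: boolean;
  timestamp: string;
}

function toStoredEvent(raw: RawToolEvent): StoredEvent {
  // Build the record field by field so no text property is copied across.
  return {
    toolName: raw.toolName,
    modality: raw.modality,
    promptLength: raw.promptText.length,
    outputLength: raw.outputText.length,
    tokenCount: raw.tokenCount,
    accepted: raw.accepted,
    timestamp: raw.timestamp,
  };
}
```

Constructing the stored record explicitly, instead of spreading the raw event and deleting fields, means a newly added text field cannot leak through by default.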

Team aggregates by default

Every manager-facing view is team-level. Individual rankings require explicit opt-in from the user being shown. Enterprise mode enforces team-only with a configurable k-anonymity floor (default k=5).
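The k-anonymity floor can be sketched as a filter over aggregates: any group with fewer than k contributors is redacted before it is shown. The types and the `applyKFloor` helper are hypothetical; only the default of k=5 comes from this page.

```typescript
interface GroupAggregate {
  group: string;   // hypothetical grouping key, e.g. a sub-team name
  members: number; // distinct people contributing to the aggregate
  score: number;
}

type VisibleAggregate =
  | { group: string; redacted: false; members: number; score: number }
  | { group: string; redacted: true };

function applyKFloor(rows: GroupAggregate[], k = 5): VisibleAggregate[] {
  return rows.map((row): VisibleAggregate =>
    row.members >= k
      ? { group: row.group, redacted: false, members: row.members, score: row.score }
      // Below the floor, drop both the score and the head count.
      : { group: row.group, redacted: true },
  );
}

const view = applyKFloor([
  { group: "platform", members: 4, score: 70 },
  { group: "payments", members: 5, score: 80 },
]);
console.log(view[0].redacted); // true: only 4 contributors
console.log(view[1].redacted); // false: meets the k=5 floor
```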

EU data residency

All stored events + scores live on Hetzner Frankfurt. Identity (WorkOS), payments (Lemon Squeezy), and email (Resend EU) run on edge providers — never the hot path of ingestion or storage.

Works-council mode

Enterprise contracts include a works-council profile: redaction below k=5, DPIA templates, audit logs on every individual-level view, delete-on-offboarding with retention policies per regulation.

Pricing

Free for 5. Lifetime for 50. Custom beyond.

Non-SaaS pricing because the core value doesn't change week to week. If we raise it, you keep yours. If we get acquired, you still own your license.

Community

$0

forever

For small crews dogfooding ScoreFlow together.

  • Up to 5 seats
  • All scoring + gamification
  • MCP + CLI + weekly email digest
  • Global leaderboard (on by default, opt out anytime)
  • Team public page on scoreflow.dev/t/<slug>
Start free

Recommended

Team Pro

$149

one-time license

For teams 6–50. Not a subscription. Pay once, own it.

  • Up to 50 seats
  • Everything in Community
  • Anthropic / Cursor / Copilot admin API integrations
  • Private team page + custom domain
  • Priority support
  • Works with your existing SSO (Google/GitHub)
Buy lifetime license

Enterprise

Custom

annual

For 50+ seats under a works-council-friendly policy.

  • WorkOS SSO + SCIM provisioning
  • Team-aggregate-only mode with k-anonymity floor
  • Works-council profile + DPIA templates
  • Self-hosted option (Hetzner or your infra)
  • Dedicated onboarding + SLA
Talk to us

FAQ

Questions engineering leaders actually ask.

If there's something not covered here, email diego@infinitelabs.co — every question so far has shaped the product.

Won't this just Goodhart itself? Devs will game whatever you score.

Every component is banded, not monotone — spamming accepts, rubber-stamping, token-bursting, or artificial breadth all collapse the score. The anti-gaming flags are published in packages/scoring/src/anti-gaming.ts so the mechanism is auditable.

Do you store our prompts or generated code?

No. We store metadata only: tool name, length, accepted/rejected, timestamps, modality. Never prompt or output text. An opt-in premium tier can ingest richer signals under a customer-side encryption model; it stays off by default.

How is individual vs. team visibility handled?

Community teams are opted in by default: each user's own level is publicly visible and they participate in the global leaderboard unless they turn it off. Enterprise mode defaults to team-aggregate-only with a k-anonymity floor (default k=5); any view below that floor is redacted automatically.

What do you need to get started?

A GitHub account and permission to install the ScoreFlow GitHub App on the repos you want scored. For tools like Claude Code / Cursor / Copilot we can attribute via OTEL traces or admin API — whichever your stack supports.

Is this really not a subscription?

Team Pro is a one-time $149 lifetime license for up to 50 seats. Enterprise is annual with a custom quote because of the SSO/SCIM/works-council work. Community stays free forever for teams of up to 5.

Where does the data live?

Hetzner Frankfurt (EU). Identity via WorkOS, payments via Lemon Squeezy (MoR, handles EU VAT), email via Resend EU — those are edge services, never hot-path ingestion or storage.

Stop counting lines. Start measuring craft.

Create a team in 30 seconds, invite up to five peers, wire a single hook into Claude Code, and get your first score this week.

Start free, no credit card
Talk to the founder