Architecture and principles are necessary, but they're not sufficient.
If you're introducing AI coding support to a team, you need to answer: Is this actually helping, or just creating noise?
To answer that, you need metrics that reflect flow, not just usage.
Why Traditional Metrics Fail
Lines of code generated: meaningless. More code isn't better code.
Daily active users: tells you who clicked something, not whether it improved their work.
Suggestion acceptance rate: can be high even if the tool destroys flow (for example, a developer accepts a suggestion, then immediately reverts it).
Token throughput: measures AI speed, not human productivity.
None of these measure the context tax.
Four Metrics That Actually Matter
These metrics directly reflect flow and context friction:
1. Time To Commit After AI (TTCAA)
Definition: Time between first AI interaction for a task and the next commit/PR related to that task.
What it shows:
- When AI interactions produce useful changes quickly, TTCAA is low
- When AI interactions lead to long ping-pong cycles, TTCAA is high
How to measure:
- Log each AI invocation with timestamp
- Track commits with timestamps
- Calculate:
commit_time - first_ai_invocation_time (for the same task)
- Average over many tasks
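The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `(task_id, timestamp)` log format, the `ttcaa_minutes` function name, and the sample data are all assumptions, and linking AI invocations to commits through a shared task id is only one possible approach (a branch name or ticket reference could serve the same purpose).

```python
from datetime import datetime
from statistics import mean


def ttcaa_minutes(ai_invocations, commits):
    """Return {task_id: minutes from the first AI invocation to the next commit}.

    Both arguments are lists of (task_id, datetime) pairs. The task_id
    linkage is an assumption made for this sketch.
    """
    # Earliest AI invocation per task.
    first_ai = {}
    for task_id, ts in ai_invocations:
        if task_id not in first_ai or ts < first_ai[task_id]:
            first_ai[task_id] = ts

    result = {}
    for task_id, start in first_ai.items():
        # Only commits made at or after the first AI interaction count.
        later = [ts for tid, ts in commits if tid == task_id and ts >= start]
        if later:
            result[task_id] = (min(later) - start).total_seconds() / 60
    return result


# Hypothetical sample logs.
ai_log = [
    ("TASK-1", datetime(2024, 1, 15, 10, 0)),
    ("TASK-1", datetime(2024, 1, 15, 10, 5)),
    ("TASK-2", datetime(2024, 1, 15, 11, 0)),
]
commit_log = [
    ("TASK-1", datetime(2024, 1, 15, 10, 15)),
    ("TASK-2", datetime(2024, 1, 15, 12, 10)),
]

per_task = ttcaa_minutes(ai_log, commit_log)  # TASK-1: 15.0, TASK-2: 70.0
average_ttcaa = mean(per_task.values())       # 42.5
```

Tasks that never reach a commit are left out of the result here; whether to count them separately is a design choice the sketch does not settle.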
Typical values:
- Excellent: <10 minutes
- Good: 10-20 minutes
- Mediocre: 20-40 minutes
- Poor: >40 minutes (suggests lots of back-and-forth)
Example:
- Sarah's TTCAA: 70 minutes (many fruitless AI interactions before finally getting help from Miguel)
- Miguel's TTCAA: 15 minutes (one AI interaction → quick fix → commit)
2. Flow Session Length
Definition: Average uninterrupted period a developer spends in focused work inside their main tools before a context-breaking action.
What it shows:
- Longer sessions = deeper focus
- Frequent breaks = lots of task switching and context reloading
What counts as "breaking":
- Alt-tabbing to browser/chat
- Switching to email/Slack
- Long idle periods (>2 minutes)
What doesn't count:
- Switching between editor and terminal
- Running tests
- Reading other files in the project
How to measure:
- IDE plugins can track focus time
- Window management tools can log active window
- Calculate streaks of continuous focus
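As a rough sketch of the streak calculation, assume a logger that records one `(timestamp, kind)` event each time the active window changes. The event format, the `kind` labels, and the `flow_sessions` function are hypothetical. Idle detection is simplified: the sketch assumes the logger emits an `"idle"` event once the idle threshold has passed.

```python
from datetime import datetime, timedelta

# Assumed labels; a real logger would map window titles or process names to these.
FOCUS_KINDS = {"editor", "terminal"}


def flow_sessions(events, end_time):
    """Return the lengths (timedelta) of uninterrupted focus streaks.

    `events` is a time-sorted list of (datetime, kind) pairs. Any kind
    outside FOCUS_KINDS (browser, chat, email, idle, ...) breaks the streak.
    """
    sessions = []
    streak_start = None
    for ts, kind in events:
        if kind in FOCUS_KINDS:
            if streak_start is None:
                streak_start = ts  # a new streak begins
        elif streak_start is not None:
            sessions.append(ts - streak_start)  # a context break ends the streak
            streak_start = None
    if streak_start is not None:
        sessions.append(end_time - streak_start)  # streak still open at end of log
    return sessions


def at(hour, minute):
    return datetime(2024, 1, 15, hour, minute)


# Hypothetical sample log.
events = [
    (at(9, 0), "editor"),
    (at(9, 10), "terminal"),  # editor <-> terminal does not break flow
    (at(9, 12), "editor"),
    (at(9, 30), "browser"),   # context break
    (at(9, 35), "editor"),
    (at(9, 45), "chat"),      # context break
]

sessions = flow_sessions(events, end_time=at(9, 50))
average = sum(sessions, timedelta()) / len(sessions)  # 30 min and 10 min -> 20 min
```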
Typical values:
- Excellent: 25-45 minutes (before natural break)
- Good: 15-25 minutes
- Mediocre: 10-15 minutes
- Poor: <10 minutes (constant interruption)
3. Reorientation Time
Definition: Time from returning to the editor after an AI interaction to making the next meaningful edit.
What it shows:
- Low reorientation = tool kept you in context
- High reorientation = you're rebuilding your mental map
What counts as a "meaningful edit":
- Adding/changing code (not just formatting)
- Running a test
- Making a commit
What you're measuring:
- The scroll-and-remember time
- "Wait, what was I doing?"
- Re-reading surrounding code to rebuild understanding
How to measure:
- Log timestamp when AI response completes
- Log timestamp of next meaningful edit
- Calculate:
edit_time - ai_response_time
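A minimal sketch of this calculation, assuming a single time-sorted event stream per developer. The event names (`"ai_response"`, `"edit"`, `"test_run"`, `"commit"`, `"format"`, `"scroll"`) and the `reorientation_seconds` function are hypothetical. If several AI responses arrive before any meaningful edit, the sketch measures from the most recent one, which is a simplifying choice.

```python
from datetime import datetime

# Event kinds treated as a "meaningful edit" (assumed labels).
MEANINGFUL = {"edit", "test_run", "commit"}


def reorientation_seconds(events):
    """Return seconds from each AI response to the next meaningful edit.

    `events` is a time-sorted list of (datetime, kind) pairs.
    """
    results = []
    pending = None  # timestamp of the latest AI response still awaiting an edit
    for ts, kind in events:
        if kind == "ai_response":
            pending = ts
        elif kind in MEANINGFUL and pending is not None:
            results.append((ts - pending).total_seconds())
            pending = None
    return results


# Hypothetical sample log.
events = [
    (datetime(2024, 1, 15, 10, 0, 0), "ai_response"),
    (datetime(2024, 1, 15, 10, 0, 5), "scroll"),   # ignored
    (datetime(2024, 1, 15, 10, 0, 10), "format"),  # formatting is not meaningful
    (datetime(2024, 1, 15, 10, 0, 20), "edit"),
    (datetime(2024, 1, 15, 10, 10, 0), "ai_response"),
    (datetime(2024, 1, 15, 10, 15, 0), "edit"),
]

times = reorientation_seconds(events)  # [20.0, 300.0]
```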
Typical values:
- Excellent: <30 seconds
- Good: 30-90 seconds
- Mediocre: 90-180 seconds
- Poor: >3 minutes
Sarah vs Miguel:
- Sarah's average reorientation: ~5 minutes per interaction
- Miguel's average reorientation: ~20 seconds
This single metric captures the essence of context tax.
4. Context Provision Ratio
Definition: Share of the total context in an AI interaction that the tool gathered automatically, as opposed to context the developer provided manually.
What it shows:
- High ratio = tool does the work
- Low ratio = developer is the serialization layer
How to measure:
For each AI interaction, count:
- Auto context: lines of code/config/logs/errors the tool gathered
- Manual context: lines the developer copied/pasted or typed in explanation
Context Provision Ratio = Auto Context / (Auto Context + Manual Context)
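The formula translates directly into code. The function name and the line counts below are illustrative assumptions. Returning `None` for an interaction with no context at all is one way to avoid a division by zero, not something the metric itself prescribes.

```python
def context_provision_ratio(auto_lines, manual_lines):
    """Auto Context / (Auto Context + Manual Context), or None if both are zero."""
    total = auto_lines + manual_lines
    if total == 0:
        return None
    return auto_lines / total


# Tool gathered 165 lines, developer typed 8.
high = context_provision_ratio(auto_lines=165, manual_lines=8)
# Tool gathered 10 lines, developer pasted 90.
low = context_provision_ratio(auto_lines=10, manual_lines=90)
```

With these sample counts, `high` rounds to 0.95 and `low` is 0.1.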
Typical values:
- Excellent: >0.9 (tool gathers 90%+ of context)
- Good: 0.7-0.9
- Mediocre: 0.4-0.7
- Poor: <0.4 (developer doing most of the work)
Example:
- Sarah's tool: ~0.1 (she manually provided almost everything)
- Miguel's tool: ~0.95 (tool auto-gathered test, error, logs, config, diff)
Implementing Metrics: Practical Guide
For Engineering Leaders:
Start simple. Pick one metric based on what you can instrument:
Easiest to start: Reorientation Time
- Requires: IDE plugin or tool with logging
- Effort: Low
- Value: High (directly measures context tax)
Medium difficulty: TTCAA
- Requires: AI tool logging + git commit tracking
- Effort: Medium
- Value: High (measures end-to-end effectiveness)
Measure before and after:
- Baseline period without AI
- 2 weeks with the AI tool
- Compare the two periods
Look for:
- Is TTCAA better than no-AI baseline?
- Is reorientation time reasonable?
- Do developers feel less fragmented?
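One simple way to run the comparison is to look at the percent change in the median of a metric between the two periods. This is a sketch under assumptions: the `percent_change` helper is hypothetical, the sample values are invented for illustration, and using the median (rather than the mean) is a choice made here to limit the effect of outliers.

```python
from statistics import median


def percent_change(baseline, with_ai):
    """Percent change in the median; positive means the metric went up."""
    base = median(baseline)
    return (median(with_ai) - base) / base * 100


# Hypothetical flow session lengths in minutes for each period.
baseline_sessions = [12, 15, 10, 18, 14]
ai_sessions = [20, 22, 18, 25, 21]

change = percent_change(baseline_sessions, ai_sessions)  # 50.0
```

For metrics where lower is better, such as reorientation time, a negative change is the improvement.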
For Tool Builders:
Instrument everything from day one:
// Example telemetry schema
{
  event: "ai_invocation",
  timestamp: "2024-01-15T10:23:00Z",
  developer_id: "hashed_id",
  context_auto: {
    files: 3,
    lines_code: 120,
    lines_logs: 45,
    config_files: 2
  },
  context_manual: {
    lines_typed: 8 // developer's question
  },
  response_time_ms: 4200,
  applied: true,
  edit_after_response_seconds: 18
}
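For illustration, the same record could be modeled with Python dataclasses and serialized as one JSON line per event. The class names are hypothetical. The field names and sample values mirror the schema above. How events are shipped to storage is left open.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class AutoContext:
    files: int
    lines_code: int
    lines_logs: int
    config_files: int


@dataclass
class ManualContext:
    lines_typed: int  # e.g. the developer's question


@dataclass
class AiInvocation:
    timestamp: str
    developer_id: str  # already hashed before it reaches telemetry
    context_auto: AutoContext
    context_manual: ManualContext
    response_time_ms: int
    applied: bool
    edit_after_response_seconds: int
    event: str = "ai_invocation"


record = AiInvocation(
    timestamp="2024-01-15T10:23:00Z",
    developer_id="hashed_id",
    context_auto=AutoContext(files=3, lines_code=120, lines_logs=45, config_files=2),
    context_manual=ManualContext(lines_typed=8),
    response_time_ms=4200,
    applied=True,
    edit_after_response_seconds=18,
)

# asdict() converts nested dataclasses to plain dicts, ready for JSON.
line = json.dumps(asdict(record), sort_keys=True)
```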
Track:
- When AI is invoked
- What context was gathered (auto vs manual)
- How long until response
- Whether response was applied
- Time to next edit
- Time to next commit
Dashboard views:
- TTCAA percentile distribution
- Reorientation time trend over time
- Context provision ratio by project
- Flow session length correlation with AI usage
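As one example of what a dashboard view might compute, the percentile distribution of TTCAA can be derived with the standard library. The `ttcaa_percentiles` helper and the choice of p50/p90/p95 are assumptions, and the input below is synthetic data chosen to make the output easy to check.

```python
from statistics import quantiles


def ttcaa_percentiles(values_minutes):
    """Return selected percentiles of a list of TTCAA values (needs 2+ values)."""
    # n=100 yields 99 cut points; index k-1 is the k-th percentile.
    cuts = quantiles(values_minutes, n=100, method="inclusive")
    return {"p50": cuts[49], "p90": cuts[89], "p95": cuts[94]}


# Synthetic sample: TTCAA values of 1, 2, ..., 101 minutes.
sample = list(range(1, 102))
summary = ttcaa_percentiles(sample)  # {'p50': 51.0, 'p90': 91.0, 'p95': 96.0}
```

Percentiles are often more informative than an average here, because a few very long tasks can dominate the mean.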
What Good Looks Like
After implementing context-aware AI, you should see:
- TTCAA decreases by 40-60% compared to chat-first tools
- Flow sessions lengthen by 30-50%
- Reorientation time drops below 1 minute on average
- Context provision ratio >0.85
If you're not seeing these improvements, the tool isn't respecting flow, no matter how good the model is.
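A quick check against the targets above could look like the following sketch. The `meets_targets` function, the dictionary keys, and the sample numbers are all hypothetical. The `before` snapshot stands in for whatever comparison point you use (for example, a chat-first tool), and only the lower bound of each range is tested.

```python
def meets_targets(before, after):
    """Compare two metric snapshots against the lower bounds of the targets."""
    ttcaa_drop = (before["ttcaa_min"] - after["ttcaa_min"]) / before["ttcaa_min"]
    flow_gain = (after["flow_min"] - before["flow_min"]) / before["flow_min"]
    return {
        "ttcaa": ttcaa_drop >= 0.40,                     # at least a 40% decrease
        "flow": flow_gain >= 0.30,                       # at least a 30% increase
        "reorientation": after["reorientation_s"] < 60,  # under 1 minute on average
        "context_ratio": after["context_ratio"] > 0.85,
    }


# Hypothetical averages for each period.
before = {"ttcaa_min": 40, "flow_min": 12, "reorientation_s": 240, "context_ratio": 0.2}
after = {"ttcaa_min": 20, "flow_min": 18, "reorientation_s": 45, "context_ratio": 0.9}

report = meets_targets(before, after)
all_met = all(report.values())
```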