The Context Tax - Arif Dogan

Four metrics that actually reflect flow and context friction: TTCAA, Flow Session Length, Reorientation Time, and Context Provision Ratio.

Architecture and principles are necessary, but they're not sufficient.

If you're introducing AI coding support to a team, you need to answer: Is this actually helping, or just creating noise?

To answer that, you need metrics that reflect flow, not just usage.

Why Traditional Metrics Fail

Lines of code generated: meaningless. More code isn't better code.

Daily active users: tells you who clicked something, not whether it improved their work.

Suggestion acceptance rate: can be high even if the tool destroys flow (a developer accepts a suggestion, then immediately reverts it).

Token throughput: measures AI speed, not human productivity.

None of these measure the context tax.

Four Metrics That Actually Matter

These metrics directly reflect flow and context friction:

1. Time To Commit After AI (TTCAA)

Definition: Time between first AI interaction for a task and the next commit/PR related to that task.

What it shows:

When AI interactions produce useful changes quickly, TTCAA is low
When AI interactions lead to long ping-pong cycles, TTCAA is high

How to measure:

Log each AI invocation with timestamp
Track commits with timestamps
Calculate: commit_time - first_ai_invocation_time for the same task
Average over many tasks
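The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the task IDs, the log format, and the rule of taking the earliest commit after the first AI invocation are all assumptions, and how you link commits and invocations to a task depends on your own tooling.

```python
from datetime import datetime, timedelta
from statistics import mean


def ttcaa_by_task(ai_invocations, commits):
    """Map task_id -> minutes from first AI invocation to the next commit.

    Both arguments are lists of (task_id, datetime) pairs. How an event is
    linked to a task is an assumption of this sketch.
    """
    first_ai = {}
    for task_id, ts in ai_invocations:
        if task_id not in first_ai or ts < first_ai[task_id]:
            first_ai[task_id] = ts

    result = {}
    for task_id, ts in commits:
        start = first_ai.get(task_id)
        if start is None or ts < start:
            continue  # no AI use for this task, or the commit came before it
        minutes = (ts - start).total_seconds() / 60
        if task_id not in result or minutes < result[task_id]:
            result[task_id] = minutes  # keep the earliest qualifying commit
    return result


# Hypothetical logs for two tasks
t0 = datetime(2024, 1, 15, 10, 0)
ai_log = [("TASK-1", t0), ("TASK-1", t0 + timedelta(minutes=5)), ("TASK-2", t0)]
commit_log = [
    ("TASK-1", t0 + timedelta(minutes=15)),
    ("TASK-1", t0 + timedelta(minutes=40)),
    ("TASK-2", t0 + timedelta(minutes=70)),
]
per_task = ttcaa_by_task(ai_log, commit_log)
average_ttcaa = mean(list(per_task.values()))
```

Keeping the per-task values rather than only the average makes it easier to spot outliers that a single mean would hide.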

Typical values:

Excellent: <10 minutes
Good: 10-20 minutes
Mediocre: 20-40 minutes
Poor: >40 minutes (suggests lots of back-and-forth)

Example:

Sarah's TTCAA: 70 minutes (many fruitless AI interactions before finally getting help from Miguel)
Miguel's TTCAA: 15 minutes (one AI interaction → quick fix → commit)

2. Flow Session Length

Definition: Average uninterrupted period a developer spends in focused work inside their main tools before a context-breaking action.

What it shows:

Longer sessions = deeper focus
Frequent breaks = lots of task switching and context reloading

What counts as "breaking":

Alt-tabbing to browser/chat
Switching to email/Slack
Long idle periods (>2 minutes)

What doesn't count:

Switching between editor and terminal
Running tests
Reading other files in the project

How to measure:

IDE plugins can track focus time
Window management tools can log active window
Calculate streaks of continuous focus
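As a rough sketch of the streak calculation, the snippet below walks a chronological list of window-focus intervals and closes a session whenever a context-breaking app gets focus or an idle gap exceeds two minutes. The interval format, the app names, and the set of "main tools" are assumptions for illustration; a real window logger will emit something different.

```python
MAIN_TOOLS = {"editor", "terminal"}  # assumed labels for non-breaking apps
IDLE_LIMIT_S = 120                   # idle periods over 2 minutes break flow


def flow_sessions(intervals):
    """Return flow session lengths in seconds.

    intervals: chronological (start_s, end_s, app) focus intervals.
    """
    sessions = []
    start_s = end_s = None  # bounds of the session currently being built
    for begin, finish, app in intervals:
        if app not in MAIN_TOOLS:
            # A context-breaking app took focus: close any open session
            if start_s is not None:
                sessions.append(end_s - start_s)
                start_s = end_s = None
            continue
        if start_s is not None and begin - end_s > IDLE_LIMIT_S:
            # Long idle gap between two main-tool intervals
            sessions.append(end_s - start_s)
            start_s = None
        if start_s is None:
            start_s = begin
        end_s = finish
    if start_s is not None:
        sessions.append(end_s - start_s)
    return sessions


# Hypothetical focus log, times in seconds since the start of the day
focus_log = [
    (0, 600, "editor"),
    (600, 900, "terminal"),
    (900, 960, "browser"),
    (960, 1500, "editor"),
    (1800, 2100, "editor"),  # preceded by a 5-minute idle gap
]
session_lengths = flow_sessions(focus_log)
average_session_s = sum(session_lengths) / len(session_lengths)
```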

Typical values:

Excellent: 25-45 minutes (before natural break)
Good: 15-25 minutes
Mediocre: 10-15 minutes
Poor: <10 minutes (constant interruption)

3. Reorientation Time

Definition: Time from returning to the editor after an AI interaction to making the next meaningful edit.

What it shows:

Low reorientation = tool kept you in context
High reorientation = you're rebuilding your mental map

What counts as "meaningful edit":

Adding/changing code (not just formatting)
Running a test
Making a commit

What you're measuring:

The scroll-and-remember time
"Wait, what was I doing?"
Re-reading surrounding code to rebuild understanding

How to measure:

Log timestamp when AI response completes
Log timestamp of next actual edit
Calculate: edit_time - ai_response_time
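A minimal sketch of this calculation over an event log follows. The event names are hypothetical, and one design choice is assumed: if several AI responses arrive before any meaningful edit, the clock restarts at the latest response.

```python
# Assumed event kinds; "format" stands for formatting-only changes
MEANINGFUL = {"edit", "test_run", "commit"}


def reorientation_times(events):
    """Return seconds from each AI response to the next meaningful edit.

    events: chronological (timestamp_s, kind) pairs.
    """
    times = []
    pending = None  # timestamp of the AI response awaiting a follow-up edit
    for ts, kind in events:
        if kind == "ai_response":
            pending = ts  # restart the clock at the latest response
        elif kind in MEANINGFUL and pending is not None:
            times.append(ts - pending)
            pending = None
    return times


# Hypothetical event log, timestamps in seconds
event_log = [
    (0, "ai_response"),
    (5, "format"),       # formatting only, not a meaningful edit
    (20, "edit"),
    (50, "edit"),        # no AI response pending, so not counted
    (100, "ai_response"),
    (130, "ai_response"),
    (430, "test_run"),
]
reorientation = reorientation_times(event_log)
```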

Typical values:

Excellent: <30 seconds
Good: 30-90 seconds
Mediocre: 90-180 seconds
Poor: >180 seconds (3 minutes)

Sarah vs Miguel:

Sarah's average reorientation: ~5 minutes per interaction
Miguel's average reorientation: ~20 seconds

This single metric captures the essence of context tax.

4. Context Provision Ratio

Definition: Share of the total context for an AI interaction that the tool gathers automatically, as opposed to context the developer provides manually.

What it shows:

High ratio = tool does the work
Low ratio = developer is the serialization layer

How to measure:

For each AI interaction, count:

Auto context: lines of code/config/logs/errors the tool gathered
Manual context: lines the developer copied/pasted or typed in explanation

Context Provision Ratio = Auto Context / (Auto Context + Manual Context)

Typical values:

Excellent: >0.9 (tool gathers 90%+ of context)
Good: 0.7-0.9
Mediocre: 0.4-0.7
Poor: <0.4 (developer doing most of the work)
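The formula and bands above translate directly into code. In this sketch the function names are hypothetical, and the handling of boundary values (for example, whether exactly 0.9 counts as Good or Excellent) is an assumption, since the bands do not say.

```python
def context_provision_ratio(auto_lines, manual_lines):
    """Auto Context / (Auto Context + Manual Context), or None if no context."""
    total = auto_lines + manual_lines
    if total == 0:
        return None
    return auto_lines / total


def rate_ratio(ratio):
    """Map a ratio to the bands above (boundary handling is an assumption)."""
    if ratio > 0.9:
        return "Excellent"
    if ratio >= 0.7:
        return "Good"
    if ratio >= 0.4:
        return "Mediocre"
    return "Poor"


# Illustrative interaction: 165 auto-gathered lines (120 code + 45 logs)
# and 8 lines typed by the developer
ratio = context_provision_ratio(120 + 45, 8)
band = rate_ratio(ratio)
```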

Example:

Sarah's tool: ~0.1 (she manually provided almost everything)
Miguel's tool: ~0.95 (tool auto-gathered test, error, logs, config, diff)

Implementing Metrics: Practical Guide

For Engineering Leaders:

Start simple. Pick one metric based on what you can instrument:

Easiest to start: Reorientation Time

Requires: IDE plugin or tool with logging
Effort: Low
Value: High (directly measures context tax)

Medium difficulty: TTCAA

Requires: AI tool logging + git commit tracking
Effort: Medium
Value: High (measures end-to-end effectiveness)

Measure in two phases:

Baseline: 2 weeks without AI
Trial: 2 weeks with the AI tool
Compare the two periods

Look for:

Is TTCAA better than no-AI baseline?
Is reorientation time reasonable?
Do developers feel less fragmented?
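For the quantitative checks, one simple way to compare the baseline and trial periods is the percent change in the median. The sample values below are made up, and the choice of the median over the mean is an assumption (it is less sensitive to a few very long tasks). For a no-AI baseline, this sketch also assumes you measure time from task start to commit.

```python
from statistics import median


def percent_change(baseline, trial):
    """Percent change in the median from baseline to trial (negative = lower)."""
    base = median(baseline)
    return (median(trial) - base) / base * 100


# Hypothetical time-to-commit samples in minutes
baseline_minutes = [30, 40, 50]
trial_minutes = [15, 20, 25]
change = percent_change(baseline_minutes, trial_minutes)
```

With only two weeks of data per phase, treat the result as a rough signal and pair it with the qualitative question about fragmentation.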

For Tool Builders:

Instrument everything from day one:

// Example telemetry schema
{
  event: "ai_invocation",
  timestamp: "2024-01-15T10:23:00Z",
  developer_id: "hashed_id",
  context_auto: {
    files: 3,
    lines_code: 120,
    lines_logs: 45,
    config_files: 2
  },
  context_manual: {
    lines_typed: 8  // developer's question
  },
  response_time_ms: 4200,
  applied: true,
  edit_after_response_seconds: 18
}

Track:

When AI is invoked
What context was gathered (auto vs manual)
How long until response
Whether response was applied
Time to next edit
Time to next commit

Dashboard views:

TTCAA percentile distribution
Reorientation time trend over time
Context provision ratio by project
Flow session length correlation with AI usage
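For the TTCAA percentile view, Python's standard library is enough for a first pass. The sample values are hypothetical, and which percentiles to show (here the 50th and 90th) is an assumption.

```python
from statistics import quantiles

# Hypothetical per-task TTCAA values in minutes
ttcaa_minutes = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 70]

# n=10 yields nine cut points: the 10th, 20th, ..., 90th percentiles
deciles = quantiles(ttcaa_minutes, n=10, method="inclusive")
p50 = deciles[4]  # median
p90 = deciles[8]  # 90th percentile, where long back-and-forth cycles show up
```

Showing a high percentile next to the median surfaces the slow tail that an average would smooth over.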

What Good Looks Like

After implementing context-aware AI, you should see:

TTCAA decreases by 40-60% compared to chat-first tools
Flow sessions lengthen by 30-50%
Reorientation time drops below 1 minute on average
Context provision ratio >0.85

If you're not seeing these improvements, the tool isn't respecting flow—no matter how good the model is.
