i built a cad editor in the browser, then taught an llm to use it

parsing autocad dwg, reconstructing walls and rooms from anonymous line segments, building a canvas cad editor from scratch, and handing the whole thing to claude as tools it can call

the demo moment i did not plan: i asked my app "how many doors and windows are there?" and the ai answered with the counts, then added, unprompted:

note D3 is only 300 mm wide, likely a mis-detected door. want me to check it?

it was right. my extraction pipeline had turned a piece of geometry into a 30 cm door. no human had noticed. the model read the quantity take-off, saw a door narrower than a shoebox, and flagged it.

that little moment is the payoff of a longer story: parsing one of the most hostile file formats in existence (autocad dwg), reconstructing real building models from thousands of anonymous line segments, building a 2d cad editor from scratch in canvas, and then handing the whole thing to claude as a set of tools it can call.

this is how i built it, including the parts that went wrong.

what the thing does

          upload                durable etl job                     viewer + editor
 browser ────────▶ fastapi ───────────────────▶ dwg → dxf → model ────▶ react + three.js
 (react)           + dbos        (libredwg + ezdxf, by layer)           2d cad editor
                                       │                                    ▲
                                       ▼                                    │
                                 entity rows in db  ◀──── ai copilot ───────┘
                                 (walls, doors, rooms)     (claude + tools)

you drop a dwg floor plan in the browser. a durable background job converts it, extracts the geometry, and reconstructs a structured building model: walls as centerlines, doors and windows placed on their host walls, rooms as polygons with names and areas. you get a 3d viewer, a full 2d cad editor (snapping, dimensions, undo, print to scale), quantity take-offs, and export back to dwg.

and then there is a chat panel where an ai agent can read the model and edit it. for real. the edits land in the database, show up live in 2d and 3d, and sit in your undo history like any manual edit.

the important thing to understand before any of that: a dwg file does not contain walls. it contains lines. everything interesting in this project is the distance between those two sentences.

part 1: dwg is a format that hates you

dwg is a closed binary format with about 30 years of versions. there is no official spec. the open source world has exactly one serious answer: libredwg, which can read basically everything and write reliably to one version (r2000).

two decisions saved me weeks here:

run libredwg as a subprocess, not a library. it is gpl, my code is not, and a subprocess boundary keeps the licenses clean. it also means a segfault in a 30-year-old format parser kills a child process, not my server. the converter turns dwg into dxf, which is the same data model as text, and from there ezdxf (a superb python library) takes over.

never trust the file. my favorite example: dwg files carry a $INSUNITS header that declares the drawing units. it lies constantly. i had a file that declared "metres" while every coordinate was clearly millimetres. trust it and your apartment renders a thousand times too big.

the fix is embarrassingly simple: ignore the header, look at the numbers.

def infer_metres_per_unit(spans: list[float], kind: str) -> float:
    # a building floor plan is 5 to 100 metres across. if the raw
    # coordinate span is ~8000, the drawing is in millimetres,
    # whatever the header claims.
    span = max(spans)
    for scale in (1.0, 0.001, 0.01, 0.0254, 0.3048):
        metres = span * scale
        if plausible_extent(metres, kind):
            return scale
    return 1.0
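plausible_extent is not shown above, so here is a minimal sketch of how the pieces could fit together. the building range comes from the comment in the snippet; the site-plan range and the _PLAUSIBLE_M table are assumptions made up for illustration.

```python
# hypothetical plausibility ranges in metres. only the building range
# (5 to 100 m) is taken from the text; the site range is an assumption.
_PLAUSIBLE_M = {
    "building": (5.0, 100.0),
    "site": (20.0, 5000.0),
}


def plausible_extent(metres: float, kind: str) -> bool:
    """true if a drawing of this kind could reasonably be this wide."""
    lo, hi = _PLAUSIBLE_M.get(kind, _PLAUSIBLE_M["building"])
    return lo <= metres <= hi


def infer_metres_per_unit(spans: list[float], kind: str) -> float:
    span = max(spans)
    # candidates: metres, millimetres, centimetres, inches, feet.
    # the first plausible candidate wins, so the order is the tie-break.
    for scale in (1.0, 0.001, 0.01, 0.0254, 0.3048):
        if plausible_extent(span * scale, kind):
            return scale
    return 1.0


print(infer_metres_per_unit([8000.0, 6000.0], "building"))
```

a span of 8000 fails as metres and passes as millimetres, so the function returns 0.001 regardless of what the header says. a span of 50 is ambiguous (50 m, or about 15 m if the units are feet), and the candidate order resolves it in favour of metres.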

same paranoia everywhere else. real files come structurally broken: truncated sections, invalid handles, entities the strict parser refuses. when ezdxf.recover gives up, i do not: a dxf is at heart a stream of (group code, value) pairs, so i drop to a raw tag scanner that walks the text and pulls out every LINE, LWPOLYLINE, CIRCLE and TEXT it can find, geometry only, no object model. you lose elegance, you keep the floor plan. the converter's own output gets the same treatment: it emits latin-1 diagnostics into what i had assumed was a utf-8 pipe, and that crashed the job on 7 of my first 35 test files. the fix is to capture raw bytes and decode them with errors="replace".
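to make the tag-scanner idea concrete, here is a minimal sketch that handles LINE entities only. it relies on the standard dxf group codes (0 for the entity type, 8 for the layer, 10/20 for the start point, 11/21 for the end point); the function names are hypothetical, and it assumes the code/value lines are still aligned in pairs, which a badly truncated file may violate.

```python
def _flush(entity, out):
    """append a finished LINE to out, silently skipping broken ones."""
    if entity is None:
        return
    try:
        out.append((
            entity.get("8", "0"),
            float(entity["10"]), float(entity["20"]),
            float(entity["11"]), float(entity["21"]),
        ))
    except (KeyError, ValueError):
        pass  # incomplete or malformed entity: drop it, keep scanning


def scan_lines(raw: bytes):
    """return (layer, x1, y1, x2, y2) for every readable LINE entity."""
    # decode defensively: a stray latin-1 byte must not abort the scan
    text = raw.decode("utf-8", errors="replace")
    rows = [ln.strip() for ln in text.splitlines()]
    segments = []
    current = None  # group code -> value for the LINE being read
    for code, value in zip(rows[0::2], rows[1::2]):
        if code == "0":  # a new entity (or section marker) starts
            _flush(current, segments)
            current = {} if value == "LINE" else None
        elif current is not None and code in ("8", "10", "20", "11", "21"):
            current.setdefault(code, value)
    _flush(current, segments)
    return segments
```

a LINE with a non-numeric coordinate is skipped and the scan carries on with the next entity, which is the whole point: partial geometry beats a traceback.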

to prove the pipeline was solid i collected 231 real dwg files from the internet: architectural, structural, mechanical, electrical, every version from r1.4 to 2025. a smoke test harness runs each file through the full pipeline in an isolated subprocess with a timeout, so one hang cannot take down the run.

result after fixing everything the corpus surfaced: 231 files, 0 crashes. graceful "i could not reconstruct much from this" is allowed. a traceback is not.

part 2: from thousands of line segments to actual walls

extraction gives you a soup. blocks (reusable symbols like door leafs and window frames) get exploded recursively into their primitive entities, arcs get sampled into chords, polylines get split into edges. what comes out is tens of thousands of bare segments, each carrying exactly one piece of metadata: its layer name.

layer names are the only semantics a dwg has, and they are a free-text field filled by humans. german architects write WAND or MAUER, english ones write A-WALL or WALLS, and everyone abbreviates differently. so classification is humble substring matching over a multilingual hint list:

_WALL_PATS = ("wall", "wand", "mauer", "gebäude", "gebaeude")
_DOOR_PATS = ("door", "tür", "tuer", "porte")
_WIN_PATS  = ("window", "fenster", "wind")

the same trick, with a different hint list (flurstück, parcel, gelände, straße...), decides whether the whole drawing is a building floor plan or a site plan, because the two need completely different reconstruction. and when a file has no recognizable wall layer at all (everything dumped on layer 0, a classic), the pipeline falls back to "every segment that is not obviously a door, window, text or dimension" and attaches a warning saying results are approximate. honesty over confidence.
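a minimal sketch of that substring classifier, reusing the three pattern tuples from above. the function name and the order of the checks are assumptions: this version tests door and window hints before wall hints, so a layer like A-WALL-DOOR counts as a door layer.

```python
_WALL_PATS = ("wall", "wand", "mauer", "gebäude", "gebaeude")
_DOOR_PATS = ("door", "tür", "tuer", "porte")
_WIN_PATS = ("window", "fenster", "wind")


def classify_layer(name: str) -> str:
    """map a free-text layer name to 'door', 'window', 'wall' or 'other'."""
    low = name.casefold()  # case-insensitive, also for umlauts
    # openings first (an assumption): "a-wall-door" should be a door layer
    for kind, pats in (("door", _DOOR_PATS),
                       ("window", _WIN_PATS),
                       ("wall", _WALL_PATS)):
        if any(p in low for p in pats):
            return kind
    return "other"


for layer in ("A-WALL", "TÜREN", "Fenster_EG", "0"):
    print(layer, "->", classify_layer(layer))
```

anything that comes back as "other" is what the layer-0 fallback has to deal with.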

now the real problem. on the wall layer, a wall is never one line. it is drawn as parallel pairs, and each line is broken into fragments wherever a door or window interrupts it, often drawn in opposite directions by different commands years apart:

 what the dwg contains on the wall layer:

   ──────▶      ◀──────────        ──▶        three fragments, mixed directions

 what the model needs:

   ══════════════════════════════════         one wall centerline
         └─ 900mm gap ─┘   └─ 1200mm gap ─┘   with its gaps remembered

merging those fragments is a grouping problem: which segments lie on the same infinite line? i hash each segment by its quantized angle (4 degree buckets, modulo 180 so direction does not matter) plus its perpendicular offset from the origin (50 mm buckets). segments that share a hash are collinear neighbours.

the bug that taught me the most lived right here. the offset is computed along the line's normal vector, and a segment drawn right-to-left has its normal pointing the opposite way from its left-to-right twin, which flips the sign of the offset, which puts the two halves of the same wall into different hash buckets. the fix is one canonicalization:

# flip the normal into one half-plane so segments drawn in opposite
# directions on the SAME line (two halves of a wall split by a door,
# extremely common in real dwgs) hash together
if ny < 0 or (ny == 0 and nx < 0):
    nx, ny = -nx, -ny
offset = round((a[0] * nx + a[1] * ny) / 50.0)

before that fix, half my test corpus reconstructed twice as many walls as the buildings had.
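for context, here is a sketch of the whole hash key around that canonicalization, using the bucket sizes quoted above (4 degrees, 50 mm). the names line_key and group_collinear are hypothetical, and the bucketing details are an approximation of the idea rather than the project's exact code.

```python
import math
from collections import defaultdict


def line_key(a, b, angle_bucket_deg=4.0, offset_bucket_mm=50.0):
    """hash a segment by the infinite line it lies on, ignoring direction."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("zero-length segment has no direction")
    nx, ny = -dy / length, dx / length  # unit normal
    # flip the normal into one half-plane so both drawing directions agree
    if ny < 0 or (ny == 0 and nx < 0):
        nx, ny = -nx, -ny
    angle = math.degrees(math.atan2(dy, dx)) % 180.0  # direction-free angle
    n_buckets = round(180.0 / angle_bucket_deg)
    angle_bucket = round(angle / angle_bucket_deg) % n_buckets
    offset = round((a[0] * nx + a[1] * ny) / offset_bucket_mm)
    return angle_bucket, offset


def group_collinear(segments):
    """bucket (a, b) segments by line_key."""
    groups = defaultdict(list)
    for a, b in segments:
        groups[line_key(a, b)].append((a, b))
    return dict(groups)


left_half = ((0, 1000), (3000, 1000))      # drawn left to right
right_half = ((5000, 1000), (4000, 1000))  # same wall, drawn right to left
print(line_key(*left_half) == line_key(*right_half))
```

without the flip, the second segment's offset comes out as -20 instead of 20 and the two halves of the wall never meet.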

within each group the merge becomes one-dimensional: project every fragment onto the group's direction, sort the intervals, and sweep. touching or overlapping intervals fuse. gaps up to 2600 mm (a generous door width) get bridged and recorded, because a door-sized hole in a wall line is not noise, it is a door. gaps bigger than that split the wall in two. the recorded gaps come out the other side as opening candidates for free.
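the sweep itself is short. a sketch, assuming each fragment has already been projected to a (start, end) interval in millimetres along the group's direction; merge_intervals and the dict layout are made up for the example.

```python
def merge_intervals(intervals, max_gap_mm=2600.0):
    """fuse 1d intervals into walls, bridging and recording small gaps."""
    walls = []
    # normalize so start <= end, then sweep left to right
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        if walls and start - walls[-1]["end"] <= max_gap_mm:
            wall = walls[-1]
            if start > wall["end"]:
                # a bridged hole: remember it as an opening candidate
                wall["gaps"].append((wall["end"], start))
            wall["end"] = max(wall["end"], end)
        else:
            # first fragment, or a gap too wide to be a door: new wall
            walls.append({"start": start, "end": end, "gaps": []})
    return walls


fragments = [(0, 2000), (5000, 2900), (6200, 8000), (12000, 15000)]
for wall in merge_intervals(fragments):
    print(wall)
```

the first three fragments fuse into one wall from 0 to 8000 with gaps of 900 mm and 1200 mm recorded; the 4000 mm hole before the last fragment is too wide, so that one becomes a second wall.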

with centerlines in hand, classification is geometric. a wall whose midpoint hugs the bounding box perimeter is external (230 mm brick by default), everything else is a partition (100 mm block). testing the midpoint rather than the endpoints matters: an interior wall whose ends touch the facade would otherwise get promoted to external.

doors and windows then snap to their hosts: take each segment from a door or window layer, find the nearest wall within a 500 mm perpendicular tolerance, and project the midpoint onto the wall's axis. that projection is the door's position, stored as one number, center_mm along the wall. openings with the same type and size (rounded to 50 mm) share a schedule mark, which is how D1, D2, W1 get assigned exactly the way a human drafter would.
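the hosting step is a dot product and a cross product. a sketch under assumptions: walls arrive as a dict of id to endpoint pairs, the function name is hypothetical, and openings whose projection falls beyond the wall's ends are simply rejected.

```python
import math


def host_opening(seg_mid, walls, tol_mm=500.0):
    """find the nearest wall for an opening midpoint.

    walls maps wall id -> ((x1, y1), (x2, y2)).
    returns (wall_id, center_mm) or None if nothing is within tolerance.
    """
    best = None
    for wall_id, (a, b) in walls.items():
        dx, dy = b[0] - a[0], b[1] - a[1]
        length = math.hypot(dx, dy)
        if length == 0:
            continue
        ux, uy = dx / length, dy / length          # unit vector along the wall
        px, py = seg_mid[0] - a[0], seg_mid[1] - a[1]
        along = px * ux + py * uy                  # position along the axis
        across = abs(px * uy - py * ux)            # perpendicular distance
        if 0 <= along <= length and across <= tol_mm:
            if best is None or across < best[0]:
                best = (across, wall_id, along)
    return None if best is None else (best[1], best[2])


walls = {"W1": ((0, 0), (5000, 0)), "W2": ((0, 3000), (5000, 3000))}
print(host_opening((1200, 80), walls))
```

a door symbol drawn 80 mm off the axis of W1 lands on W1 at center_mm 1200, and that single number is all the row needs to store.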

every dimension nobody drew (storey height 3000, door height 2100, window sill 900) is a stated construction default, appended to the model's warnings so the ui can show "assumed" instead of pretending the file said so.

part 3: rooms are faces of a planar graph

rooms are the part that feels like magic and is actually graph theory.

once you have wall centerlines, a floor plan is a planar subdivision: the lines carve the plane into faces, and the bounded faces are the rooms. extracting them is a classic computational geometry exercise:

 1. split every segment at every         2. walk the half-edges, always
    intersection, snap nodes                taking the sharpest clockwise
    to a 60mm grid                          turn at each node

    ┌──────┬────────┐                       ┌──────┬────────┐
    │      │        │                       │  R1  │   R2   │
    ├──────┴───┬────┤          ──▶          ├──────┴───┬────┤
    │          │    │                       │    R3    │ R4 │
    └──────────┴────┘                       └──────────┴────┘

 3. every minimal face gets traced exactly once.
    one face is the unbounded outside: drop it.

step 2 is the elegant bit. from every directed edge, keep turning as sharply clockwise as possible and you trace exactly one minimal face, and every interior face of the plan gets traced exactly once. no recursion, no flood fill, just angles.

step 3 hid my favorite geometry bug. you get all the rooms plus one extra polygon: the outer boundary of the whole building, traced from the outside. my first heuristic dropped any face bigger than some fraction of the bounding box. worked on every rectangular test building, then an l-shaped floor plan arrived. the outer face of an l-shape is nowhere near its bounding box area, so it slipped under the threshold and appeared in the ui as a giant phantom room covering the entire footprint. the correct rule is topological, not metric: in a connected arrangement the unbounded face encloses everything else, so it is always the single largest face. drop the max, keep the rest. no threshold to tune, no shape that breaks it.
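steps 2 and 3 fit in a few dozen lines. this is a sketch, not the project's code: it assumes the edges are already split at intersections, snapped, and deduplicated, so every node is an exact (x, y) tuple. at each node it takes the neighbour that follows the incoming edge in clockwise order, which keeps the face being traced on the left; the mirrored convention works just as well. the function names are hypothetical.

```python
import math
from collections import defaultdict


def trace_faces(edges):
    """return every face of a planar graph as a list of nodes."""
    nbrs = defaultdict(list)
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    for u in nbrs:  # sort neighbours counter-clockwise by angle
        nbrs[u].sort(key=lambda w: math.atan2(w[1] - u[1], w[0] - u[0]))
    visited, faces = set(), []
    for u in nbrs:
        for v in nbrs[u]:
            a, b, face = u, v, []
            while (a, b) not in visited:  # each directed edge is used once
                visited.add((a, b))
                face.append(a)
                ring = nbrs[b]
                # the neighbour just before the way back is the next
                # clockwise one: the tightest turn around this face
                a, b = b, ring[ring.index(a) - 1]
            if face:
                faces.append(face)
    return faces


def shoelace(poly):
    """signed polygon area; positive for counter-clockwise loops."""
    return 0.5 * sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]))


def room_faces(edges):
    """all faces except the largest one, which is the outside."""
    faces = trace_faces(edges)
    if not faces:
        return []
    outer = max(range(len(faces)), key=lambda i: abs(shoelace(faces[i])))
    return [f for i, f in enumerate(faces) if i != outer]


# a 2 x 1 rectangle with one partition in the middle: two rooms
edges = [((0, 0), (1, 0)), ((1, 0), (2, 0)), ((0, 1), (1, 1)), ((1, 1), (2, 1)),
         ((0, 0), (0, 1)), ((1, 0), (1, 1)), ((2, 0), (2, 1))]
print(len(trace_faces(edges)), len(room_faces(edges)))
```

the walk finds three faces (two squares and the outline), and dropping the one with the largest absolute area leaves the two rooms.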

naming the survivors is almost an afterthought: floor plans carry text labels ("kitchen", "büro 2.13"), so run point-in-polygon of each label against each room face and the names fall into place. rooms without a label become "room 7", which is at least honest.
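the naming pass can be sketched with a textbook ray-casting test. the function names and the (text, point) label shape are assumptions for the example.

```python
def point_in_polygon(pt, poly):
    """ray casting: count crossings of a ray going right from pt."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # this edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def name_rooms(faces, labels):
    """labels are (text, (x, y)) pairs; unlabelled rooms get a number."""
    names = []
    for i, face in enumerate(faces, start=1):
        hit = next((text for text, pt in labels
                    if point_in_polygon(pt, face)), None)
        names.append(hit or f"room {i}")
    return names


faces = [[(0, 0), (4, 0), (4, 3), (0, 3)], [(4, 0), (7, 0), (7, 3), (4, 3)]]
print(name_rooms(faces, [("kitchen", (2, 1.5))]))
```

the first face picks up "kitchen" and the second falls back to "room 2".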

it is worth pausing on what happened across these two parts. the input was an unordered pile of anonymous line segments. the output is: 7 walls with types and materials, 6 doors and 3 windows that know which wall they sit on and where, 4 rooms with names, areas and perimeters, and a structural grid derived from the dominant wall axes. nothing in the file said any of that. it was all latent in the geometry.

part 4: the model is rows, not a blob

the easy design would be: etl emits a json file, viewer reads the json file. done.

i did not do that, and it turned out to be the single most important decision in the project. the extraction writes the model into the database as entity rows:

walls        (id, x1, y1, x2, y2, thickness_mm, height_mm, type, material)
openings     (id, mark, type, wall_id, center_mm, width_mm, height_mm, sill_mm)
rooms        (id, name, polygon, floor_finish, wall_finish)
dimensions   (id, x1, y1, x2, y2, offset_mm)
labels       (id, target_kind, target_id, text, tx, ty)

one endpoint assembles the rows into the model json the viewers consume. another validates an edited model against pydantic schemas and writes it back to the rows.

notice what that buys you. editing is just row updates. dwg export is just reading rows and emitting dxf entities (real DIMENSION and TEXT entities, mitered wall outlines). and, foreshadowing, an ai tool that edits the model is just another caller of the same validated save path.

a door is not a hole in a picture. it is a row that knows which wall it lives on and how far along it sits. everything downstream falls out of that.

part 5: a cad editor is mostly three problems

i researched the buy option first. commercial browser cad sdks either cannot actually edit dwg (they are viewers with markup) or start around $7,500 a year for a general-dwg scope i did not need. i only needed to edit my reconstructed model. so i built it on a raw canvas 2d context.

a cad editor that feels professional is mostly three problems: rendering speed, snapping, and undo.

rendering. my first version redrew everything on every mousemove. on a 963-wall plan that cost 1154 ms per frame, which is less than one frame per second. the fix is the oldest trick in graphics: split the scene by how often it changes.

 layer 3: overlay canvas    cursor, snap markers, drag previews   every mousemove
 ─────────────────────────────────────────────────────────────────────────────────
 layer 2: model cache       walls, rooms, doors, dimensions       on commit
 ─────────────────────────────────────────────────────────────────────────────────
 layer 1: base cache        grid + original dwg linework          on pan/zoom

layers 1 and 2 are offscreen canvases, redrawn only when their content changes. mousemove touches only the overlay. walls get batched into two Path2D objects (one per wall type) so the browser does two fill calls instead of a thousand.

1154 ms became 82 ms under a software rasterizer, and comfortable 60 fps on a real gpu. no webgl, no framework, just not repainting things that did not change.

snapping. autocad muscle memory is real: endpoints beat intersections beat midpoints beat grid. i keep two r-tree spatial indexes (rbush), one for segments and one for points, query a small box around the cursor, and pick the winner by priority:

endpoint(10) > intersection(9) > midpoint(8) > perpendicular(7) > on-line(5) > grid(2)
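the selection rule is easy to state in code. the editor itself is typescript on top of rbush; this is a language-neutral sketch in python that skips the spatial index and assumes the candidates were already collected. breaking ties by distance is an assumption, not something the priority list above spells out.

```python
import math

SNAP_PRIORITY = {
    "endpoint": 10, "intersection": 9, "midpoint": 8,
    "perpendicular": 7, "on-line": 5, "grid": 2,
}


def pick_snap(cursor, candidates, radius):
    """candidates are (kind, (x, y)) pairs, e.g. from a spatial index query."""
    in_range = [(kind, pt) for kind, pt in candidates
                if math.dist(cursor, pt) <= radius]
    if not in_range:
        return None
    # highest priority wins; among equals, the nearest one
    return max(in_range,
               key=lambda c: (SNAP_PRIORITY[c[0]], -math.dist(cursor, c[1])))


candidates = [("grid", (100, 100)), ("midpoint", (103, 100)),
              ("endpoint", (108, 100)), ("endpoint", (500, 500))]
print(pick_snap((100, 100), candidates, radius=10))
```

the grid point sits exactly under the cursor, but the endpoint 8 units away still wins because priority is compared before distance.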

one detail that separates "toy" from "correct": a strong object snap overrides ortho lock. if you are drawing orthogonally but hover an endpoint that is 2 degrees off axis, real cad snaps to the endpoint. get that wrong and anyone who has used autocad feels it instantly.

undo. i used snapshots, not command objects. before any transaction the store clones the model with structuredClone (about 200 kb for 1000 walls) and keeps the last 80 snapshots. drags mutate the live model freely so every side panel updates in real time, and pressing escape restores the snapshot taken at the start of the drag.

execute(mutate: (m: BuildingModel) => void) {
  this.begin();          // clone the model
  mutate(this.model);    // do anything
  this.commit();         // snapshot -> undo stack, autosave kicks in
}

boring, memory-hungry, and completely immune to the classic command-pattern bug where one action forgets to implement its inverse. for models of this size it is the right trade.
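to show the full lifecycle including cancel and undo, here is a python stand-in for the typescript store, with copy.deepcopy playing the role of structuredClone. the class and its method names beyond begin, commit and execute are assumptions.

```python
import copy


class SnapshotStore:
    def __init__(self, model, max_depth=80):
        self.model = model
        self.undo_stack = []
        self.max_depth = max_depth
        self._pending = None

    def begin(self):
        # clone the whole model before anything touches it
        self._pending = copy.deepcopy(self.model)

    def commit(self):
        # the pre-edit snapshot becomes one undo entry
        self.undo_stack.append(self._pending)
        del self.undo_stack[:-self.max_depth]  # keep only the newest entries
        self._pending = None

    def cancel(self):
        # escape during a drag: throw the live edits away
        self.model = self._pending
        self._pending = None

    def execute(self, mutate):
        self.begin()
        mutate(self.model)
        self.commit()

    def undo(self):
        if self.undo_stack:
            self.model = self.undo_stack.pop()


store = SnapshotStore({"walls": []})
store.execute(lambda m: m["walls"].append({"id": "W1"}))
store.undo()
print(store.model)
```

no action has to know how to reverse itself; undo is always "put the previous snapshot back".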

saving is a debounced single-flight put of the whole model, with an abort controller so a newer save supersedes an in-flight older one. writes cannot land out of order.

part 6: give the model tools, not a text box

then my company announced an ai hackathon, and i had claude opus 4.8 available through an azure ai foundry endpoint (foundry serves the native anthropic api, so the official sdk works as-is, just point base_url at it).

the tempting demo is rag: "chat with your floor plan." i wanted the other thing. an agent that does work: place the door, label the rooms, run the take-off, fix the storey height.

the architecture is a tool-use loop:

 user: "add a 1500mm window to the longest external wall"
   │
   ▼
 claude ── tool: list_elements(kind=walls) ──▶ backend runs it ──▶ json result
   │                                                                  │
   ◀──────────────────────────────────────────────────────────────────┘
   │   "W1 and W3 are both 10m. W1 already has a door and a
   │    window, so W3 it is."
   ▼
 claude ── tool: add_openings([{wallId: "W3", type: "window",
   │                            width_mm: 1500}])
   ▼
 backend: mutate model -> validate -> save rows -> stream result
   │
   ▼
 claude: "done. added window W3 (1500x1200, sill 900) centred on
          wall W3, with a leader label."

thirteen tools total. three read (summary, element listing, quantity take-off), ten write (walls, openings, rooms, annotations, storey, rename, delete with cascade).

three rules made this safe instead of terrifying:

1. every mutation goes through the same validated path the ui uses.

def _save(project_id: str, model: dict) -> None:
    try:
        save_edited_model(project_id, model)   # pydantic-validated, same as PUT /model
    except Exception as exc:
        raise ToolError(f"edit rejected by model validation: {exc}") from exc

the agent physically cannot persist a model the editor and the dwg exporter would not accept. i did not write a second, special, "for the ai" write path. that is the whole trick, and it only works because part 4 made the model rows with a single schema-checked door in front of them.

2. tool errors are recoverable, not fatal. a ToolError ("no wall with id W99", "a 5000 mm door does not fit on a 3200 mm wall") is fed back to the model as an error result. the model reads it, adjusts, and tries something else. the stream never dies because the agent guessed wrong once. watching it recover mid-conversation is half the magic of the demo.

3. domain guardrails live in the tools, not the prompt. default door sizes, auto-assigned marks, fit checks, cascade deletes (kill a wall, its doors and labels go with it). prompts drift. code does not.
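rule 2 comes down to one try/except around the tool dispatch. a sketch: the tool_result block with is_error follows the anthropic messages api, while run_tool, the handlers dict and the delete_wall stub are hypothetical names for illustration.

```python
import json


class ToolError(Exception):
    """a tool failure the model can read and recover from."""


def run_tool(tool_use_id, name, args, handlers):
    """execute one tool call and always return a tool_result block."""
    try:
        handler = handlers.get(name)
        if handler is None:
            raise ToolError(f"unknown tool: {name}")
        content, is_error = json.dumps(handler(**args)), False
    except ToolError as exc:
        # not fatal: the message goes back to the model as an error result
        content, is_error = str(exc), True
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": content,
        "is_error": is_error,
    }


def delete_wall(wall_id):  # stub standing in for a real write tool
    raise ToolError(f"no wall with id {wall_id}")


print(run_tool("toolu_1", "delete_wall", {"wall_id": "W99"},
               {"delete_wall": delete_wall}))
```

only ToolError is caught on purpose. any other exception is a bug in the tool layer and should surface as one instead of being fed to the model as if it had made a mistake.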

the loop itself is about 60 lines: stream a response, forward text deltas to the browser as server-sent events, and when the model stops with tool_use, execute the tools, append the results, and continue, up to 12 rounds.

the part nobody talks about: ai edits and your undo stack

here is the ux problem that makes or breaks a copilot in an editor. the ai edits the database server-side, but the user is holding a live client-side editing session with its own undo history. if those drift apart you get the worst bug class in collaborative software.

my solution has two halves.

flush before the agent reads. the editor autosaves on a 900 ms debounce, so when you send a chat message there may be an unsaved wall on your screen. before every agent turn the frontend cancels the pending timer and force-saves. the agent always sees your latest state, and, just as important, a stale queued autosave can no longer fire mid-run and silently overwrite what the agent wrote.

fold the result into undo as one step. when the agent reports that it changed something, the frontend refetches the model and applies it through the normal store transaction:

const applyServerModel = async () => {
  const res = await api.getResult(project.id);
  const fresh = structuredClone(res.model);
  store.execute((m) => {
    Object.assign(m, fresh);   // one commit -> one undo entry
  });
};

whatever the agent did, five tool calls, twenty labels, becomes exactly one entry in the undo stack. ctrl+z reverts the entire ai edit, and the autosave writes the revert back to the database. i have an end-to-end browser test that asserts the full round trip: agent adds a door, db count goes up by one, ctrl+z, db count comes back down.

an ai copilot without undo is a liability. with undo it is just a very fast colleague.

two bugs worth telling

the invisible stream. my first browser test passed the "request completed" check but showed an empty chat bubble. the backend streamed sse frames, curl showed them fine, the browser parsed nothing. cause: my parser split frames on \n\n, but the sse library (sse-starlette) terminates lines with \r\n. curl's terminal output hides the difference completely. the fix is one regex:

// frames end with a blank line; the server uses \r\n line endings
let m: RegExpMatchArray | null;
while ((m = buf.match(/\r?\n\r?\n/))) {
  const end = m.index ?? 0;   // index is always set for a non-global match
  const frame = buf.slice(0, end);
  buf = buf.slice(end + m[0].length);
  // parse "event:" and "data:" lines with frame.split(/\r?\n/)
}

if you ever hand-parse sse from a fetch body: it is always the line endings.

the minus one millimetre door. i wrote a fuzz harness that reconstructs models from the whole dwg corpus and fires every tool at them with valid, edge-case, and garbage inputs, asserting after every single mutation that the model still validates and still exports to dxf. it found that add_openings with width_mm: -1 sailed straight through my fit check:

if width + 100 > length:   # -1 + 100 = 99, "fits" on any wall
    raise ToolError(...)

a door with negative width, persisted to the database. would claude ever send that? probably not, the tool schema declares minimums. but "probably not" is not an invariant, and the boundary has to hold on its own. range checks went in, and the fuzz harness lives in the repo and runs against the corpus like the etl smoke test does.
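a sketch of what the hardened check could look like. the 100 mm margin is taken from the snippet above; the function name, the messages and the extra finiteness test are assumptions.

```python
import math


class ToolError(Exception):
    """a tool failure the model can read and recover from."""


def check_opening_fits(width_mm: float, wall_length_mm: float) -> None:
    # range check first: the fit check below is meaningless for bad widths.
    # nan needs its own test because every comparison with nan is false.
    if not math.isfinite(width_mm) or width_mm <= 0:
        raise ToolError(f"width_mm must be a positive number, got {width_mm}")
    if width_mm + 100 > wall_length_mm:
        raise ToolError(
            f"a {width_mm:g} mm opening does not fit on a "
            f"{wall_length_mm:g} mm wall"
        )


for width in (900, -1, 5000):
    try:
        check_opening_fits(width, 3200)
        print(width, "ok")
    except ToolError as exc:
        print(width, "rejected:", exc)
```

nan is worth the extra line: nan + 100 > length evaluates to false, so a nan width would pass the fit check exactly the way -1 did.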

fuzz your tool layer. the llm is a fuzzer with excellent grammar, so meet it with one that has none.

what i learned

the hard part was never the ai. it was turning anonymous line segments into walls, doors and rooms. i spent 90% of the time on extraction, geometry and the editor, and the ai feature took days, not weeks, precisely because that foundation existed.
never trust file headers. measure the geometry. the units, the layers, the structure: verify everything against what is actually drawn.
geometry bugs hide in the shapes you did not test. rectangular buildings passed for weeks; one l-shaped plan broke the room detection. topological rules (drop the largest face) beat metric thresholds (drop faces over x%) every time.
one validated write path. the ui, the import pipeline, and the ai all write through the same schema-checked door. every safety property you enforce there is enforced everywhere, forever.
let the agent fail cheaply. recoverable tool errors turn "the ai crashed" into "the ai corrected itself", which users read as intelligence.
the ai needs an undo story. refetch and fold into one transaction is simple and it works. ship nothing agentic without it.
smoke test against reality. 231 hostile files taught me more than any spec. the corpus, not my imagination, decided what "robust" means.

the stack, for the curious: fastapi, dbos durable workflows, libredwg + ezdxf, sqlmodel on sqlite (postgres-ready), react + vite + typescript, three.js for 3d, raw canvas 2d for the editor, rbush for spatial indexing, and the anthropic python sdk pointed at claude opus 4.8 on azure ai foundry.

the whole thing runs on one small box. the most satisfying part is still that first conversation: asking a floor plan a question, getting a correct answer, and watching a door it drew for you appear in the drawing, with its swing arc, its mark tag, and a ctrl+z that takes it away again.