Cursor regression loop fix — 11 bugs/week to 1 and shipped to 500 clinicians
Cursor regression loop fix for Charthealth, a four-person healthtech team building chart-review tooling for outpatient clinicians. Every Cursor prompt broke something else, architectural memory dropped past file seven, 312 Vitest tests were green 40% of the time by coincidence, and PHI was logging to Sentry unredacted. The Break-the-Loop Refactor codified the architecture in ESLint + TypeScript rules, rewrote 88 deterministic tests, fixed the PHI handling with a Sentry beforeSend hook, and moved audit logging to a Postgres trigger. User-reported bugs: 11/week → 1/week. Shipped to 500 clinicians across nine clinics inside the next month.
- User-reported bugs per week: 1
- Test suite size / reliability: 88 tests, 100% green CI
- PHI fields in Sentry events: 0
- Audit log coverage: 4 of 4 (DB trigger)
About the healthtech (outpatient clinical workflow) client
Charthealth (name changed) is a healthtech team building outpatient clinical-workflow software. At the start of the engagement it was pre-revenue with 3 pilot clinics signed; it has since grown to 500 clinicians across 9 clinics. The team built the product with Cursor and shipped it to pilot users before discovering that the generated scaffolding masked a set of production-grade failures. The engagement that followed was scoped as a Break-the-Loop Refactor ($3,999 fixed fee).
Audit findings on day zero
What the first production-readiness pass uncovered before a single line of code was changed. Each finding is a specific Cursor failure mode we’ve seen repeat across engagements.
- F01
The fix-one-break-another loop
The lead developer described it exactly as the Medium review does: "The filter worked, but the table stopped loading. I asked it to fix the table, and the filter disappeared." Every agent-mode run destabilised something elsewhere. The team was shipping one step forward and one step sideways for six straight weeks. The frustration compounded because each individual prompt felt like progress — Cursor was visibly making changes and the changes typically did fix the requested bug — but the regression rate meant net velocity was effectively zero. Several team members independently considered quitting and reported the issue as the primary reason.
- F02
Architectural memory loss past file seven
Cursor's agent, when given a multi-file task, would forget the conventions established in the first few files by the time it reached the seventh. The auth middleware pattern set in file two was silently re-invented in file nine — with a subtly different permissions check.
- F03
Fragile test coverage (no real signal)
There were 312 Vitest tests. 47% were flaky, 29% asserted on implementation details that changed every prompt, and 11% were disabled outright. CI was green roughly 40% of the time by coincidence.
- F04
HIPAA-adjacent security gaps
PHI fields (patient name, date of birth, diagnosis code) were logged to Sentry without redaction. The audit log table existed but was only written from one of four write paths. A clinician's session token was being stored in localStorage instead of an httpOnly cookie.
- F05
No way to onboard a second engineer
The founder wanted to hire. Three candidates quit the take-home after reading the codebase. The files that were most Cursor-regenerated had the least consistent patterns.
Root cause of the Cursor failure mode
Cursor's agent mode is a powerful local optimiser with no global memory. Given a file, it will make the file better; given a codebase, it will make each file better in a slightly different direction. The causal chain: agent mode rewrites without a style/architecture contract → every rewrite introduces small drift → tests are also agent-written so they drift with the code instead of anchoring it → regressions become invisible until a user reports them → the team prompts Cursor to fix the regression, which drifts something else. Breaking the loop isn't about Cursor — it's about giving Cursor an anchor it can't rewrite: a codified architecture, a test suite that asserts behaviour not implementation, and prompt discipline that scopes changes. The healthtech context made the loop especially expensive — every regression in clinical software is potentially a patient-safety incident, so each bug had to be fully investigated, root-caused, and documented before it could be triaged, even when the actual user impact was cosmetic. The team was spending three days of investigation work per real bug, and Cursor was generating new bugs faster than they could be closed. The break-even arithmetic was no longer working in favour of agent mode.
How we fixed the Cursor regression loop
Each step below is one remediation workstream from the engagement.
- 01
Codified the architecture in a 3-page ARCHITECTURE.md and a set of ESLint + TypeScript rules that fail the build on violations. The conventions are now machine-enforceable, so Cursor either produces compliant code or produces code that won't merge.
- 02
Reorganised the codebase into feature slices (patients/, encounters/, audit/, auth/) with explicit boundaries. Each slice exposes a typed public API; cross-slice imports outside that API are ESLint errors.
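To make the boundary rule concrete, here is a minimal sketch of the check such a lint rule performs. The slice names come from the step above; the `src/` layout and the convention that each slice's public API lives at `src/<slice>/index.ts` are assumptions for illustration, and `sliceOf` and `isAllowedImport` are hypothetical helpers. In a real repository the same constraint would more likely be expressed through an ESLint rule such as `no-restricted-imports` than through hand-written code.

```typescript
// Slice names are from the write-up; the src/<slice>/index.ts public-API
// convention is an assumption made for this sketch.
const SLICES = ["patients", "encounters", "audit", "auth"] as const;
type Slice = (typeof SLICES)[number];

/** Returns the slice a project-relative path belongs to, or null if none. */
function sliceOf(path: string): Slice | null {
  const name = /^src\/([^/]+)\//.exec(path)?.[1];
  return SLICES.find((s) => s === name) ?? null;
}

/**
 * An import that reaches into a different slice is allowed only when it
 * targets that slice's public entry point.
 */
function isAllowedImport(fromFile: string, importedFile: string): boolean {
  const from = sliceOf(fromFile);
  const to = sliceOf(importedFile);
  // Imports within one slice, or into non-slice code, are unrestricted.
  if (to === null || from === to) return true;
  return /^src\/[^/]+\/index(\.ts)?$/.test(importedFile);
}

// encounters -> patients public API: allowed
console.log(isAllowedImport("src/encounters/list.ts", "src/patients/index.ts"));
// encounters -> patients internals: rejected
console.log(isAllowedImport("src/encounters/list.ts", "src/patients/internal/db.ts"));
```

Keeping the rule this mechanical is the point: an agent-generated import either passes the check or fails the build, with no room for interpretation.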
- 03
Deleted the 312-test suite and rewrote 88 tests from scratch — Vitest for pure logic, Playwright for the four clinical workflows that must never regress (chart open, note sign, prescription send, audit-export). All 88 are deterministic, all run in CI, all block merge on failure.
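One common way to get deterministic tests that assert behaviour instead of implementation is to inject anything non-deterministic, such as the clock, and assert only on the returned result. The sketch below is illustrative only: the 24-hour signing window, the `Note` shape, and `canSign` are hypothetical and are not taken from Charthealth's codebase.

```typescript
// Hypothetical domain rule, invented for illustration: a note can be signed
// only once, and only within 24 hours of the encounter.
type Clock = () => Date;

interface Note {
  encounterAt: Date;
  signedAt?: Date;
}

const SIGN_WINDOW_MS = 24 * 60 * 60 * 1000;

/** The clock is a parameter, so a test never depends on the real time. */
function canSign(note: Note, now: Clock): boolean {
  if (note.signedAt) return false;
  return now().getTime() - note.encounterAt.getTime() <= SIGN_WINDOW_MS;
}

// A fixed clock makes the outcome identical on every run.
const fixedNow: Clock = () => new Date("2024-01-02T09:00:00Z");

const recent: Note = { encounterAt: new Date("2024-01-01T10:00:00Z") }; // 23h earlier
const stale: Note = { encounterAt: new Date("2023-12-31T10:00:00Z") };  // 47h earlier

console.log(canSign(recent, fixedNow)); // inside the window
console.log(canSign(stale, fixedNow));  // outside the window
```

The assertions look only at the boolean result, so an agent can restructure the internals of `canSign` freely without the test changing, while a change in behaviour still fails it.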
- 04
Fixed the PHI logging: added a redaction layer in the Sentry beforeSend hook, scrubbed 6 months of historical events via the Sentry API, moved session tokens to httpOnly secure cookies with SameSite=Lax.
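A redaction layer of this kind can be sketched as a recursive scrub over the event payload. The key names below are assumptions derived from the three PHI fields named in the audit (patient name, date of birth, diagnosis code), and `EventLike` is a minimal stand-in for the SDK's event type, not Sentry's real type. In an application, `scrubEvent` would be adapted to the SDK's event type and returned from the `beforeSend` callback passed to `Sentry.init`.

```typescript
// Assumed key names for the PHI fields named in the audit; a real list
// would be derived from the application's own schema.
const PHI_KEYS = new Set([
  "patientname", "patient_name",
  "dob", "dateofbirth", "date_of_birth",
  "diagnosiscode", "diagnosis_code",
]);

/** Recursively replaces values stored under PHI keys in JSON-like data. */
function redactPhi(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redactPhi);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = PHI_KEYS.has(key.toLowerCase()) ? "[REDACTED]" : redactPhi(inner);
    }
    return out;
  }
  return value;
}

// Minimal stand-in for the parts of an error event scrubbed in this sketch.
interface EventLike {
  extra?: Record<string, unknown>;
  contexts?: Record<string, unknown>;
  [key: string]: unknown;
}

/** Returns a copy of the event with PHI keys scrubbed from extra/contexts. */
function scrubEvent(event: EventLike): EventLike {
  const scrubbed: EventLike = { ...event };
  if (event.extra) scrubbed.extra = redactPhi(event.extra) as Record<string, unknown>;
  if (event.contexts) scrubbed.contexts = redactPhi(event.contexts) as Record<string, unknown>;
  return scrubbed;
}

const sample = scrubEvent({
  message: "chart failed to load",
  extra: { patientName: "Jane Doe", visits: [{ dob: "1990-01-01", room: 4 }] },
});
console.log(JSON.stringify(sample));
```

Key-based scrubbing only catches PHI stored under known field names. PHI interpolated into free-text messages would need a separate pattern-based pass, which this sketch does not attempt.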
- 05
Made the audit log a Postgres trigger, not an application-layer call. Every insert/update/delete on a PHI table writes to audit_events automatically. Verified by a pgTAP test that attempts writes through all four paths.
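The trigger approach can be sketched as a small migration module. Only the `audit_events` table name comes from the step above; the column names, the `record_audit_event` function name, and the list of PHI tables are assumptions for illustration. `EXECUTE FUNCTION` requires PostgreSQL 11 or later.

```typescript
// Assumed PHI table list; the real set depends on the schema.
const PHI_TABLES = ["patients", "encounters"] as const;

// Trigger function: writes one audit row per changed row. The audit_events
// columns used here (table_name, operation, row_data) are assumptions.
const auditFunctionSql = `
CREATE OR REPLACE FUNCTION record_audit_event() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'DELETE' THEN
    INSERT INTO audit_events (table_name, operation, row_data)
    VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD));
    RETURN OLD;
  END IF;
  INSERT INTO audit_events (table_name, operation, row_data)
  VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(NEW));
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;`;

/** Builds the CREATE TRIGGER statement that attaches auditing to a table. */
function triggerSql(table: string): string {
  // Table names are spliced into SQL, so accept plain identifiers only.
  if (!/^[a-z_][a-z0-9_]*$/.test(table)) {
    throw new Error(`invalid table name: ${table}`);
  }
  return (
    `CREATE TRIGGER ${table}_audit ` +
    `AFTER INSERT OR UPDATE OR DELETE ON ${table} ` +
    `FOR EACH ROW EXECUTE FUNCTION record_audit_event();`
  );
}

// Full migration: the shared function first, then one trigger per PHI table.
const migrationSql = [auditFunctionSql, ...PHI_TABLES.map(triggerSql)].join("\n");
console.log(migrationSql);
```

Because the trigger fires inside the database, a write path that forgets to call application-level audit code is still recorded, which is what lets coverage go from one of four paths to all four.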
- 06
Wrote a 'Cursor playbook' for the team: scoped prompts (one file or one feature slice at a time), required test-first agent runs, and a PR template whose checklist is enforced by checks that fail when the architecture rules are violated. The team now uses Cursor without the regression tax.
- 07
Paired with the lead dev for a week on real tickets to lock the new workflow in. The next two engineering hires passed the take-home on the cleaned repo.
- 08
Wrote a HIPAA-aligned data-handling policy in plain language for the team's runbook: what counts as PHI, where PHI is allowed to live, what to do when PHI accidentally appears in logs (the answer is a documented redaction script that runs against historical Sentry events plus an updated beforeSend hook), and which pieces of Charthealth's stack require a Business Associate Agreement. The clinical lead now has a single document she can hand to a partner clinic's compliance officer.
- 09
Added an automated PR check that runs `tsc`, ESLint with the architecture rules, both Vitest and Playwright suites, and a custom script that fails if a PR touches a PHI table without also updating the audit log mapping. The check runs in under three minutes; the team's merge-velocity went up because reviewers stopped having to manually verify these things on every PR.
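The custom audit-mapping check can be sketched as a pure function over the list of files a PR changes. The migration path pattern, the PHI table names in it, and the `db/audit-mapping.json` location are all assumptions; the write-up does not say where these files live.

```typescript
// Assumed locations: PHI-table migrations under db/migrations/, and a single
// audit mapping file. Both are hypothetical.
const PHI_MIGRATION_PATTERN = /^db\/migrations\/.*(patients|encounters).*\.sql$/;
const AUDIT_MAPPING_FILE = "db/audit-mapping.json";

interface CheckResult {
  ok: boolean;
  offending: string[]; // PHI-table changes not accompanied by a mapping update
}

/** Fails when a PR touches a PHI table without updating the audit mapping. */
function checkAuditMapping(changedFiles: string[]): CheckResult {
  const phiChanges = changedFiles.filter((f) => PHI_MIGRATION_PATTERN.test(f));
  const mappingUpdated = changedFiles.includes(AUDIT_MAPPING_FILE);
  const offending = mappingUpdated ? [] : phiChanges;
  return { ok: offending.length === 0, offending };
}

const result = checkAuditMapping([
  "db/migrations/0042_patients_add_allergy.sql",
  "src/patients/index.ts",
]);
console.log(result.ok, result.offending);
```

In CI the file list could come from `git diff --name-only` against the base branch, with the script exiting non-zero when `ok` is false so the PR check fails.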
“Cursor was giving us speed we couldn't cash in. Every fix undid something. Afterbuild Labs didn't tell us to stop using Cursor — they gave us rails so Cursor couldn't drift the codebase underneath us. We shipped to five hundred clinicians the month after.”
Outcome after the rescue
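The figures below compare the day-zero audit with the state at the end of the engagement. Audit-log coverage was verified with the pgTAP test and test reliability comes from CI results; the 'before' bug count is the founder's recollection rather than a measured baseline.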
| Metric | Before | After |
|---|---|---|
| User-reported bugs per week | 11 | 1 |
| Test suite size / reliability | 312 tests, ~40% green CI | 88 tests, 100% green CI |
| PHI fields in Sentry events | 4 (names, DOB, dx codes) | 0 |
| Audit log coverage | 1 of 4 write paths | 4 of 4 (DB trigger) |
| Clinicians using the app | 12 (pilot) | 500 across 9 clinics |
| Take-home completion rate (new hires) | 0 of 3 | 2 of 2 |
| Avg time to fix a real bug | ~2 days (loop) | ~3 hours |
“We'd write the Cursor playbook first, not last. The team was still prompting in old habits through week one and generated two more regressions we then had to unwind. The lesson generalises: in a regression-loop rescue, behavior change has to come before architecture change, otherwise the architecture you just built starts drifting on day two.”
- We'd involve a HIPAA compliance reviewer earlier. We caught the PHI-in-Sentry issue ourselves but we'd have preferred an external second set of eyes before shipping to nine clinics. A formal review on day three would have surfaced two additional minor issues (a cookie attribute and an audit-log retention setting) that we instead caught on day twelve.
- We'd measure the bug-per-week baseline for two weeks before touching code. We have before/after numbers but the 'before' is the founder's recollection; a cleaner measurement would have been stronger proof. For the next regression-loop engagement we now require a baseline-measurement period before the project clock starts.
- We'd ship the architecture documentation as a Loom walkthrough alongside the Markdown file. The team reads docs unevenly; a 12-minute video tour of the new feature-slice boundaries would have onboarded the contractor and the two new hires faster than the written doc alone managed to.
How to replicate this Cursor rescue
The same engagement path runs across every healthtech (outpatient clinical workflow) rescue we take on. Start with the diagnostic, then route into the service tier that matches the breakage surface.
Related industry deep-dive
PHI-in-Sentry exposure, incomplete audit logs, and session-token hygiene are the recurring healthtech failure modes this regression-loop rescue surfaced. The vertical page walks the HIPAA-aligned production-readiness checklist we apply on every clinical workflow — from BAA scoping to audit-trail triggers and PHI-safe logging.