When Emergency Tools Fail During Emergencies
My first week at Citizen, I watched our emergency response system crash during an active crisis.
Not slow down. Not glitch. Crash.
Nearly 500,000 New Yorkers depended on Citizen for real-time safety alerts about incidents happening near them. Police shootings. Building fires. Active emergencies where getting accurate information quickly could help people make safer decisions.
And our internal tools—the backbone that turned police radio chatter into life-saving notifications—were collapsing under their own weight.
The Dangerous Workarounds
Here's how broken the system really was:
Operators couldn't see incidents on a map. They couldn't preview what notifications would look like before sending them to thousands of people. They couldn't even tell if they were about to create duplicate alerts for the same emergency.
So they developed workarounds. Dangerous ones.
When they needed to send an update about an incident, they would change the incident's title to the new message, trigger a notification, then quickly change the title back—praying they wouldn't lose track of which incident they were working on.
They were typing addresses by hand, hoping they got NYC locations right before alerting entire neighborhoods. They had no way to search through disconnected audio clips to find related incidents. No map. No preview. No control.
Just text fields feeding alerts to half a million people who trusted these notifications with their safety.
The operators—the people whose work made the entire business possible—had started to distrust their own software.
Rebuilding While the World Watches
When Citizen hired me as Head of Internal Tools, I faced a challenge that kept me up at night at first: How do you completely rebuild mission-critical systems that absolutely cannot go down?
The technical problems were brutal:
- Memory leaks crashed browsers multiple times during active 8-12 hour operator shifts (a sketch of the pattern follows this list)
- Shared global state for everything, intended to make the tools real-time, actually made reliable processing impossible
- A convoluted, disorganized Firebase database couldn't handle the write volume and complexity
- No fault tolerance—when something broke, everything broke
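To make that first bullet concrete: long-lived operator dashboards tend to leak when views subscribe to a feed and never unsubscribe. The TypeScript below is a hypothetical illustration of that pattern, not Citizen's actual code.

```typescript
// Hypothetical sketch of the leak pattern: a long-lived dashboard that
// re-subscribes on every incident selection but never unsubscribes.
type IncidentUpdate = { id: string; title: string };

class IncidentFeed {
  private listeners: Array<(u: IncidentUpdate) => void> = [];

  subscribe(fn: (u: IncidentUpdate) => void): () => void {
    this.listeners.push(fn);
    // Returning an unsubscribe function is the fix the leaky caller ignores.
    return () => {
      this.listeners = this.listeners.filter((l) => l !== fn);
    };
  }

  publish(update: IncidentUpdate): void {
    this.listeners.forEach((fn) => fn(update));
  }
}

const feed = new IncidentFeed();

// Leaky version: called every time an operator opens an incident. Over an
// 8-12 hour shift, thousands of stale closures (and whatever they capture)
// pile up until the browser tab runs out of memory.
function openIncidentLeaky(id: string): void {
  feed.subscribe((u) => {
    if (u.id === id) console.log(`Update for ${id}: ${u.title}`);
  });
}

// Fixed version: keep the unsubscribe handle and call it on teardown.
function openIncidentFixed(id: string): () => void {
  return feed.subscribe((u) => {
    if (u.id === id) console.log(`Update for ${id}: ${u.title}`);
  });
}
```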
But here's what made it really hard: I had to rebuild three interconnected systems simultaneously while operators used them to process up to 1,000 incidents daily across NYC. Emergency response doesn't pause for system upgrades.
Zero downtime wasn't a nice-to-have. It was mission-critical for an app trusted by nearly half a million people during their most stressful moments.
Trust First, Technology Second
I could have started with the technical architecture. Built better databases, fixed the memory leaks, implemented real-time sync.
Instead, I started with the human beings.
The operators felt like outsiders at their own company, even though the entire business depended on their work. Management had given them impossible workloads with broken tools and expected them to "make it work."
My approach: Shadow their actual shifts. Work overnight. Fight for them in executive meetings. Make them collaborators, not just users.
I started actively embedding myself in the operations center during real emergencies: shootings, major building fires, city-wide incidents. I needed to understand not just their workflow, but their psychology under pressure.
What I learned: Every interface decision could save or cost precious seconds when operators were processing urgent, traumatic situations under intense pressure.
From Text Fields to Mission Command
Armed with deep operational understanding, I rebuilt the incident system from the ground up:
The Map Revolution: Instead of typing addresses and hoping, operators now had Google geocoding with map preview. They could see exactly where incidents were happening and verify locations before alerting thousands. Operators could change a location by entering an address or by dragging the incident to a new spot on the map, then decide whether a notification needed to go out.
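For a sense of what the geocoding step involves, here is a minimal TypeScript sketch against Google's Geocoding API. The function name, API key handling, and the NYC bounds check are my own illustrative choices, not the production implementation.

```typescript
// Minimal sketch: resolve an operator-typed address to coordinates via
// Google's Geocoding API so the pin can be previewed on a map before any
// notification goes out. The NYC bounds check is illustrative.
interface GeocodeResult {
  formattedAddress: string;
  lat: number;
  lng: number;
}

const NYC_BOUNDS = { north: 40.92, south: 40.49, east: -73.70, west: -74.26 };

async function geocodeAddress(address: string, apiKey: string): Promise<GeocodeResult> {
  const url =
    "https://maps.googleapis.com/maps/api/geocode/json" +
    `?address=${encodeURIComponent(address)}&key=${apiKey}`;
  const res = await fetch(url);
  const body = await res.json();
  if (body.status !== "OK" || body.results.length === 0) {
    throw new Error(`Geocoding failed: ${body.status}`);
  }
  const top = body.results[0];
  const { lat, lng } = top.geometry.location;

  // Sanity check before alerting a neighborhood: is this actually in NYC?
  const inNyc =
    lat <= NYC_BOUNDS.north && lat >= NYC_BOUNDS.south &&
    lng >= NYC_BOUNDS.west && lng <= NYC_BOUNDS.east;
  if (!inNyc) {
    throw new Error(`Address resolved outside NYC: ${top.formatted_address}`);
  }

  return { formattedAddress: top.formatted_address, lat, lng };
}
```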
Smart Search: Fuzzy search across updates, police precincts, addresses, and severity levels. Audio clips came in fragmented, but now operators could quickly connect the pieces.
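Fuzzy search can be built many ways; given the later move to PostgreSQL, one plausible approach is the pg_trgm extension with trigram similarity. A sketch under that assumption, with table and column names that are mine rather than Citizen's:

```typescript
import { Pool } from "pg";

// Assumes: CREATE EXTENSION pg_trgm; plus a trigram index such as
// CREATE INDEX ON incidents USING gin (search_text gin_trgm_ops);
// Table and column names are illustrative, not Citizen's schema.
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

type IncidentHit = {
  id: string;
  title: string;
  precinct: string;
  severity: string;
  score: number;
};

async function searchIncidents(query: string): Promise<IncidentHit[]> {
  const { rows } = await pool.query<IncidentHit>(
    `SELECT id, title, precinct, severity,
            similarity(search_text, $1) AS score
       FROM incidents
      WHERE search_text % $1          -- trigram "is similar to" operator
      ORDER BY score DESC
      LIMIT 20`,
    [query]
  );
  return rows;
}

// e.g. searchIncidents("grnd centrl fire") can still surface
// an incident titled "Fire reported near Grand Central Terminal".
```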
Notification Control: Custom notification system with iPhone-accurate previews. Operators could finally see exactly what users would receive and adjust the message, radius, and severity before sending.
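Under the hood, a preview is essentially a dry run: render the exact payload a phone would display and estimate the blast radius before anything is sent. The types and checks below are hypothetical, sketched only to show the idea:

```typescript
// Hypothetical payload shape for an incident notification; field names
// are illustrative, not Citizen's actual API.
type Severity = "low" | "medium" | "high" | "critical";

interface NotificationDraft {
  incidentId: string;
  title: string;        // appears as the push title
  body: string;         // appears as the push body
  radiusMeters: number; // users within this radius of the incident get alerted
  severity: Severity;
}

interface NotificationPreview {
  rendered: { title: string; body: string };
  estimatedRecipients: number;
  warnings: string[];
}

// Dry run: show the push exactly as a phone would render it and surface
// anything an operator should double-check before hitting send.
function previewNotification(
  draft: NotificationDraft,
  recipientsInRadius: number
): NotificationPreview {
  const warnings: string[] = [];
  if (draft.title.length > 60) {
    warnings.push("Title will be truncated on lock screens");
  }
  if (draft.radiusMeters > 3000 && draft.severity !== "critical") {
    warnings.push("Large radius for a non-critical incident");
  }
  return {
    rendered: { title: draft.title, body: draft.body },
    estimatedRecipients: recipientsInRadius,
    warnings,
  };
}
```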
Operational Intelligence: Full audit trails tracking who created incidents, wrote updates, and made changes. Plus Time to Incident (TTI) tracking so management could see performance data instead of flying blind.
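Time to Incident is a simple metric once the audit trail exists: the gap between the first signal and the operator-created incident. A small sketch, with field names that are assumptions:

```typescript
// Illustrative audit-trail entry and a Time-to-Incident calculation.
// Field names are assumptions, not Citizen's schema.
interface AuditEntry {
  incidentId: string;
  actor: string; // operator who made the change
  action: "created" | "updated" | "notified";
  at: Date;
}

// TTI here: seconds from the first radio clip being flagged to the
// incident being created by an operator.
function timeToIncidentSeconds(firstClipAt: Date, trail: AuditEntry[]): number | null {
  const created = trail.find((e) => e.action === "created");
  if (!created) return null;
  return (created.at.getTime() - firstClipAt.getTime()) / 1000;
}
```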
Collaborative Features: Tweet integration (management's request), edit-without-notification for fixing mistakes, and dozens of workflow improvements that turned daily frustrations into smooth operations.
Technical Foundation: Migrated from Firebase to PostgreSQL with real-time WebSocket sync. Built fault-tolerant architecture with automatic failover. Achieved sub-200ms updates across all workstations with zero crashes even during city-wide emergencies.
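In rough strokes, the real-time layer fans out committed PostgreSQL changes to every connected workstation over WebSockets. A minimal sketch using the ws package; the message shape, port, and function names are illustrative, not the production architecture:

```typescript
import { WebSocketServer, WebSocket } from "ws";

// Minimal fan-out sketch: whenever an incident row is written, broadcast
// the change to every connected operator workstation. A real system would
// also handle auth, reconnection, and failover.
interface IncidentEvent {
  type: "incident.created" | "incident.updated";
  incidentId: string;
  payload: unknown;
  at: string; // ISO timestamp, used client-side to drop stale updates
}

const wss = new WebSocketServer({ port: 8080 });

function broadcast(event: IncidentEvent): void {
  const message = JSON.stringify(event);
  for (const client of wss.clients) {
    if (client.readyState === WebSocket.OPEN) {
      client.send(message);
    }
  }
}

// Called from the write path after the PostgreSQL transaction commits,
// so workstations only ever render durable state.
export function onIncidentCommitted(incidentId: string, payload: unknown): void {
  broadcast({
    type: "incident.updated",
    incidentId,
    payload,
    at: new Date().toISOString(),
  });
}
```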
The Numbers Don't Lie
Before the rebuild:
- System crashes: 2-3 failures per day during peak periods
- Incident creation: 3-5 minutes average per incident
- Duplicate alerts: 15-20% of incidents triggered multiple alerts
- User base: 100,000 monthly active users
After the rebuild:
- Uptime: 99.992%, with zero crashes even during major emergencies
- Incident creation: 30-60 seconds average per incident
- Duplicate alerts: fewer than 2% of incidents by the time we finished
- User base: 500,000 monthly active users
"Everyone is SO. EXCITED."
The metrics tell part of the story. But the real transformation was human.
Before: "I spend more time refreshing frozen screens than creating incidents." – Operator
After: "Now I can see the whole app in one place — it's like having a real [redacted] command center." – Operator
The Slack messages started rolling in:
- "everyone is SO. EXCITED. about new signal."
- "drag and drop is life-changing"
Operators who had been ready to quit became our strongest advocates. They went from feeling like outsiders to collaborating with the CEO and executive team on product decisions.
Most importantly: They could finally focus on their mission of helping people in crisis, instead of wrestling with broken software.
The Bigger Picture
This incident command center was part of a complete emergency response suite I rebuilt at Citizen:
- Emergency audio processing - Real-time police radio analysis
- Video monitoring system - Live incident verification
Together, these tools transformed Citizen's emergency response from prototype-stage experiments into production-ready systems that earned operators' trust while serving nearly 500,000 users across NYC.
When your users trust your app during emergencies, every decision matters. But the most important decision? Starting with the people doing the work.