In March 2025, a mid-size crypto custodian lost $12 million in a single weekend. Not to a sophisticated zero-day exploit. Not to a nation-state attacker. To a compromised MPC shard that started signing transactions outside business hours while the on-call engineer was asleep.
The alert fired at 2:47am. The engineer saw it at 6:15am. By then, 340 unauthorized transactions had already cleared.
This is the pattern behind most institutional crypto losses. Not brilliant hacking. Just operational gaps in key infrastructure monitoring that nobody has bothered to close.
The Manual Monitoring Problem
MPC (Multi-Party Computation) key infrastructure is now the standard for institutional crypto custody. The premise is straightforward: split a private key into multiple shards, distribute them across parties, and require a threshold of shards to cooperate for any signing operation. No single point of compromise.
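To make the threshold idea concrete, here is a minimal sketch using Shamir secret sharing over a prime field. It is an illustration, not how a custody stack signs: production MPC signing protocols are designed so the full key is never reassembled in one place, whereas this toy reconstructs the secret to show that any 3 of 5 shards suffice. The function names and the choice of prime are assumptions made for the example.

```python
import secrets

# Toy field modulus (2**127 - 1 is a Mersenne prime). Illustrative only.
PRIME = 2**127 - 1

def split_secret(secret: int, threshold: int, shares: int) -> list[tuple[int, int]]:
    """Return `shares` points on a random polynomial of degree threshold - 1."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, shares + 1)]

def recover_secret(points: list[tuple[int, int]]) -> int:
    """Lagrange-interpolate the polynomial at x = 0 to get the secret back."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

key = 123456789
shards = split_secret(key, threshold=3, shares=5)
recovered = recover_secret([shards[0], shards[2], shards[4]])  # any 3 of 5
```

Fewer than three points leave the polynomial undetermined, which is the "no single point of compromise" property in miniature.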
The cryptography works. The operations around it don't.
Most custody operations today monitor MPC infrastructure the same way they monitor traditional IT systems: dashboards, Slack alerts, PagerDuty rotations, and humans staring at screens. This approach has three fundamental flaws:
- Response latency is measured in minutes, not milliseconds. A compromised shard can sign hundreds of transactions before a human reads the alert, assesses the situation, and takes action. The mean time to acknowledge a critical alert in crypto operations is 15+ minutes. Attackers need seconds.
- Alert fatigue kills vigilance. The average custody SOC generates 2,000+ alerts per day. Most are false positives. After weeks of dismissing benign threshold events, the team stops looking closely. The real incident hides in the noise.
- Coverage gaps are inevitable. People take vacations. On-call rotations have handoff delays. Timezone coverage requires expensive 24/7 staffing. A $400K/year SOC team still leaves gaps every time someone walks to the bathroom.
Why MPC Signing Patterns Break Rule-Based Alerting
Traditional security monitoring relies on static rules. If event X exceeds threshold Y, fire alert Z. This works for web application firewalls and network intrusion detection. It fails catastrophically for MPC key operations.
Here's why: MPC signing patterns are inherently variable, contextual, and temporal.
Signing Frequency Anomalies
A treasury wallet might sign 50 transactions on a Monday (payroll day) and zero on a Sunday. Setting a static threshold at 100 misses the Sunday attack entirely. Setting it at 10 triggers every Monday. The "normal" signing frequency depends on the day, the hour, the business cycle, and the wallet type.
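A rough sketch of the alternative: keep a separate history per weekday and judge each day's count against its own baseline. The class name, the z-score limit of 3, and the minimum spread of 1 are all assumptions picked for illustration, not tuned values.

```python
from collections import defaultdict
from statistics import mean, pstdev

class WeekdayBaseline:
    """Tracks daily signing counts per weekday (0 = Monday ... 6 = Sunday)."""

    def __init__(self, min_sigma: float = 1.0):
        self.history = defaultdict(list)
        self.min_sigma = min_sigma  # floor so all-zero days don't give zero spread

    def observe(self, weekday: int, count: int) -> None:
        self.history[weekday].append(count)

    def is_anomalous(self, weekday: int, count: int, z_limit: float = 3.0) -> bool:
        past = self.history[weekday]
        if len(past) < 4:
            return False  # too little history to judge
        sigma = max(pstdev(past), self.min_sigma)
        return abs(count - mean(past)) / sigma > z_limit

baseline = WeekdayBaseline()
for count in (48, 52, 50, 51):      # four payroll Mondays
    baseline.observe(0, count)
for _ in range(4):                  # four quiet Sundays
    baseline.observe(6, 0)

monday_normal = baseline.is_anomalous(0, 50)  # False: typical Monday volume
sunday_burst = baseline.is_anomalous(6, 12)   # True: 12 signings on a zero-volume day
```

A static threshold of 100 would stay silent on those 12 Sunday signings, and a threshold of 10 would fire on every Monday. The per-weekday baseline handles both cases with the same rule.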
Off-Hours Operations
A signing event at 3am UTC might be perfectly normal for a Singapore-based trading desk, but catastrophic for a New York custodian. Static rules can't model the nuance. And as teams become more distributed, the concept of "business hours" becomes meaningless for any fixed rule set.
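The same point in code: one UTC timestamp, two desks, two different answers. The desk names and the 08:00 to 18:00 weekday window are hypothetical. Fixed UTC offsets keep the sketch dependency-free; a real deployment would more likely use IANA time zones so daylight saving is handled.

```python
from datetime import datetime, time, timedelta, timezone

# Hypothetical desk profiles with fixed offsets (New York shown as EST, no DST).
DESKS = {
    "singapore-trading": timezone(timedelta(hours=8)),
    "new-york-custody": timezone(timedelta(hours=-5)),
}

def within_desk_hours(event_utc: datetime, desk: str,
                      start: time = time(8, 0), end: time = time(18, 0)) -> bool:
    """True if the event falls on a weekday inside the desk's local working window."""
    local = event_utc.astimezone(DESKS[desk])
    return local.weekday() < 5 and start <= local.time() < end

# 03:00 UTC on Tuesday, 14 January 2025
event = datetime(2025, 1, 14, 3, 0, tzinfo=timezone.utc)

singapore_ok = within_desk_hours(event, "singapore-trading")  # 11:00 Tuesday local
new_york_ok = within_desk_hours(event, "new-york-custody")    # 22:00 Monday local
```

Even this version is still a rule. It only moves the fixed window from the system to the desk, which is why the per-key baseline matters once teams span several zones.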
Threshold Scheme Deviations
In a 3-of-5 MPC setup, the specific combination of shards that participate matters as much as the transaction itself. If shards 1, 2, and 3 always sign together but suddenly shard 5 replaces shard 2, that's a signal. Not necessarily an attack, but worth investigating. Rule-based systems rarely track shard combination patterns. A 3-of-5 scheme has only 10 possible quorums, but what counts as normal differs by wallet, time of day, and transaction type, and that cross-product is too large to maintain as static rules.
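Tracking quorum composition does not need much machinery. The sketch below counts how often each shard set has signed for a key and flags sets that are rare for that key. The class name and the 5% rarity cutoff are assumptions for illustration.

```python
from collections import Counter
from math import comb

POSSIBLE_QUORUMS = comb(5, 3)  # 10 distinct 3-shard sets in a 3-of-5 scheme

class QuorumTracker:
    """Counts which shard sets have signed for one key."""

    def __init__(self, min_history: int = 20, rare_share: float = 0.05):
        self.seen = Counter()
        self.min_history = min_history
        self.rare_share = rare_share

    def observe(self, shards) -> None:
        self.seen[frozenset(shards)] += 1

    def share(self, shards) -> float:
        """Fraction of past signings that used exactly this shard set."""
        total = sum(self.seen.values())
        return self.seen[frozenset(shards)] / total if total else 0.0

    def is_unusual(self, shards) -> bool:
        total = sum(self.seen.values())
        return total >= self.min_history and self.share(shards) < self.rare_share

tracker = QuorumTracker()
for _ in range(30):
    tracker.observe({1, 2, 3})      # the usual signers

usual = tracker.is_unusual({1, 2, 3})     # False
swapped = tracker.is_unusual({1, 3, 5})   # True: shard 5 replaced shard 2
```

An unusual quorum is a signal to weigh alongside others, not a verdict. A custodian on vacation produces the same swap.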
Velocity and Sequence Patterns
An attacker who compromises a shard doesn't necessarily sign unusual transactions. They sign normal-looking transactions at unusual rates or in unusual sequences. Ten $50K transfers look less suspicious than one $500K transfer, but the aggregate is the same. Detecting this requires understanding the baseline rhythm of each key's behavior, something a rule engine can't learn.
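The split-transfer case is easy to show with a sliding window. In this sketch a per-transaction limit of $100K never fires, while the rolling one-hour total reaches $500K. The class name, window length, and limit are hypothetical.

```python
from collections import deque

class RollingOutflow:
    """Sums transfer amounts over a sliding time window (timestamps in seconds)."""

    def __init__(self, window_seconds: int):
        self.window = window_seconds
        self.events = deque()  # (timestamp, amount), oldest first
        self.total = 0

    def add(self, timestamp: int, amount: int) -> int:
        self.events.append((timestamp, amount))
        self.total += amount
        # Drop events that have aged out of the window.
        while self.events and self.events[0][0] <= timestamp - self.window:
            _, old_amount = self.events.popleft()
            self.total -= old_amount
        return self.total

PER_TX_LIMIT = 100_000
outflow = RollingOutflow(window_seconds=3600)

# Ten $50K transfers, one per minute.
totals = [outflow.add(minute * 60, 50_000) for minute in range(10)]
per_tx_alerts = sum(1 for _ in range(10) if 50_000 > PER_TX_LIMIT)  # 0
hourly_total = totals[-1]                                           # 500_000
```

Comparing the rolling total against each key's own history, rather than a fixed cap, is what turns this from another static rule into a baseline.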
Rule-based alerting asks: "Did this event match a known bad pattern?"
Autonomous monitoring asks: "Does this behavior match what this specific key normally does?"
The second question catches threats the first one never will.
What Autonomous Monitoring Actually Means
The term "autonomous" gets thrown around a lot. In the context of MPC key monitoring, it means three specific things:
1. Behavioral Baseline Learning. The system observes each key's signing patterns over time: who signs, when, from which IPs, at what frequency, in which shard combinations, and for what transaction types. It builds a per-key behavioral model, not a one-size-fits-all ruleset. When a signing event deviates from that key's specific baseline, the system flags it with a confidence score, not a binary alert.
2. Real-Time Anomaly Detection. Every signing event is evaluated against the baseline in real time, within milliseconds of the event occurring. This isn't batch processing on a 5-minute interval. It's streaming analysis. A compromised shard that starts signing at 3am triggers detection on the first anomalous signature, not after 50 transactions have already cleared.
3. Automated Incident Response. This is the critical difference. Detection without response is just a faster alert. Autonomous monitoring includes pre-configured response actions: freeze the signing quorum, rotate the compromised shard, notify the response team, and generate a forensic timeline, all within sub-second latency and without waiting for human approval.
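One way to picture how the three pieces connect: baseline signals feed a score, and the score selects a pre-configured action. The sketch below is a toy. The signal names, weights, and cutoffs are invented for illustration, and a real system would learn them per key rather than hard-code them.

```python
from dataclasses import dataclass

@dataclass
class Signals:
    off_hours: bool        # outside this key's usual signing hours
    unusual_quorum: bool   # shard set rarely seen for this key
    frequency_z: float     # z-score of signing rate against the baseline

def anomaly_score(s: Signals) -> float:
    """Toy weighted score in [0, 1]; the weights are illustrative assumptions."""
    rate_component = min(abs(s.frequency_z) / 6.0, 1.0)
    score = 0.35 * s.off_hours + 0.35 * s.unusual_quorum + 0.30 * rate_component
    return round(score, 2)

def choose_action(score: float) -> str:
    """Map a score to a pre-configured response; cutoffs are assumptions."""
    if score >= 0.8:
        return "freeze_quorum"
    if score >= 0.5:
        return "require_extra_approval"
    return "allow"

routine = choose_action(anomaly_score(Signals(False, False, 0.5)))
suspicious = choose_action(anomaly_score(Signals(True, False, 4.0)))
critical = choose_action(anomaly_score(Signals(True, True, 6.0)))
```

The graded output is the point. A score lets the system take a cheap, reversible step at moderate confidence and reserve the freeze for the cases where waiting is the bigger risk.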
The Response Time Gap
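Consider a concrete scenario. An attacker phishes a key custodian and gains access to one shard in a 3-of-5 MPC setup. Since one shard cannot sign alone, assume the other co-signers approve well-formed requests automatically. The attacker initiates signing operations at 2:47am local time.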
With manual monitoring:
- Alert fires at 2:47am
- PagerDuty pages on-call engineer at 2:48am
- Engineer wakes up, checks phone at 3:02am
- Opens laptop, connects to VPN at 3:08am
- Reviews dashboard, confirms incident at 3:15am
- Initiates key rotation at 3:22am
- Total gap: 35 minutes. Hundreds of transactions cleared.
With autonomous monitoring:
- Anomalous signing detected at 2:47:00.003am
- Behavioral model flags shard combination anomaly (confidence: 94%)
- Automated response: signing quorum frozen at 2:47:00.089am
- Compromised shard isolated, rotation initiated at 2:47:00.120am
- Response team notified with full forensic timeline at 2:47:01am
- Total gap: 120 milliseconds. Zero unauthorized transactions cleared.
That's not an incremental improvement. It's a categorical difference in the outcome.
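For a sense of scale, the two gaps can be computed directly from the timestamps in the timelines above. This is plain arithmetic on the scenario's own numbers, and the variable names are just labels.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S.%f"
first_signing = datetime.strptime("02:47:00.000", FMT)
manual_rotation = datetime.strptime("03:22:00.000", FMT)  # human-initiated
auto_rotation = datetime.strptime("02:47:00.120", FMT)    # automated

manual_gap = manual_rotation - first_signing
auto_gap = auto_rotation - first_signing

manual_minutes = manual_gap / timedelta(minutes=1)   # 35.0
auto_millis = auto_gap / timedelta(milliseconds=1)   # 120.0
ratio = manual_gap / auto_gap                        # 17500.0
```

In this scenario the manual exposure window is 17,500 times longer than the automated one.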
The Architecture of Autonomous MPC Monitoring
Building this requires four components working in concert:
Event Ingestion Layer. A high-throughput stream processor that captures every signing event, access log, policy change, and infrastructure event in real time. This isn't an API you poll every 30 seconds. It's a push-based event stream with sub-10ms delivery.
Behavioral Engine. A per-key model that tracks signing patterns across multiple dimensions: temporal (when), spatial (from where), relational (which shard combinations), and transactional (what amounts, to which addresses). The model updates continuously, adapting to legitimate changes in business operations without requiring manual rule updates.
Decision Engine. An AI-driven layer that evaluates anomalies against the behavioral model and the current operational context. A signing event at 3am during a known maintenance window is different from one at 3am on a random Tuesday. The decision engine understands context, not just data.
Response Orchestrator. A pre-configured action engine that executes incident response playbooks in milliseconds. Freeze, rotate, notify, document. Each response is logged, auditable, and reversible. No human in the loop for time-critical actions. Full human oversight for post-incident review.
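A minimal sketch of the orchestrator idea: an ordered playbook whose steps run without waiting for a human, with every step written to an audit log for later review. The class, the step names, and the stub actions are hypothetical. Real steps would call the custody platform's own freeze and rotation mechanisms, which are not shown here.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ResponseOrchestrator:
    """Runs playbook steps in order and records each one for post-incident review."""
    steps: list[tuple[str, Callable[[str], None]]]
    audit_log: list[dict] = field(default_factory=list)

    def run(self, key_id: str) -> float:
        started = time.perf_counter()
        for name, action in self.steps:
            action(key_id)
            self.audit_log.append({
                "key_id": key_id,
                "step": name,
                "elapsed_ms": (time.perf_counter() - started) * 1000.0,
            })
        return (time.perf_counter() - started) * 1000.0

calls = []

def stub(name: str) -> Callable[[str], None]:
    """Stand-in for a real action; it only records that it was invoked."""
    def action(key_id: str) -> None:
        calls.append((name, key_id))
    return action

playbook = [(name, stub(name)) for name in ("freeze", "rotate", "notify", "document")]
orchestrator = ResponseOrchestrator(steps=playbook)
total_ms = orchestrator.run("treasury-key-01")
```

Freeze comes first because it is the step that stops losses and the easiest to reverse. Everything after it can afford to be slower.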
Why This Matters Now
Three trends are converging that make autonomous MPC monitoring urgent rather than aspirational:
Institutional adoption is accelerating. Major banks, asset managers, and payment processors are moving crypto custody in-house. They're deploying MPC infrastructure at scale, but their operational security practices haven't caught up. The gap between cryptographic security and operational security is widening.
Attack sophistication is increasing. Attackers are no longer trying to break the cryptography. They're targeting the humans and processes around it. Social engineering, insider threats, compromised DevOps pipelines. These attacks look like normal operations until it's too late for a human to intervene.
Regulatory pressure is building. Custody regulations increasingly require demonstrable real-time monitoring and incident response capabilities. "We have a PagerDuty rotation" won't satisfy auditors much longer. Autonomous monitoring with full audit trails becomes a compliance requirement, not a nice-to-have.
MPC solved the key storage problem. Nobody solved the key operations problem. Manual monitoring with static rules leaves a response gap measured in minutes. Attackers operate in milliseconds.
Autonomous monitoring closes that gap.