Quality & Safety Use Cases (QAS)

UC-QAS-001: Record Model Failures

Purpose: Log AI failures (hallucinations, crashes) for review.

| Property | Value |
| --- | --- |
| Actor | QA Monitoring Service |
| Trigger | NLP/ASR error or doctor rejection |
| Priority | P0 |

Main Success Scenario:

1. Detect a failure: a crash, a low-confidence output, or a doctor rejection
2. Capture the failure context (input, output, model version)
3. Store the record in the failures database
4. Emit an alert if the failure rate exceeds the threshold (sketched below)
5. Create a review task for the QA team
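
A minimal sketch of what the failure record and the threshold alert in step 4 might look like. The `ModelFailure` fields and the `store`/`alerter` interfaces are assumptions, not part of this spec; the 5% threshold comes from the acceptance criteria below.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

FAILURE_RATE_THRESHOLD = 0.05  # 5%, per the acceptance criteria

@dataclass
class ModelFailure:
    """One logged failure event (hypothetical schema)."""
    failure_type: str   # "crash" | "low_confidence" | "doctor_rejection"
    model_version: str
    input_ref: str      # reference to the stored input, not the raw audio
    output_ref: str
    detected_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def record_failure(failure: ModelFailure, store, alerter,
                   window_failures: int, window_total: int) -> None:
    """Persist the failure (step 3) and alert when the failure rate over
    the current window crosses the threshold (step 4)."""
    store.insert(failure)  # assumed failures-database client
    rate = (window_failures + 1) / max(window_total, 1)
    if rate > FAILURE_RATE_THRESHOLD:
        alerter.notify(f"Model failure rate {rate:.1%} exceeds {FAILURE_RATE_THRESHOLD:.0%}")
```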

Acceptance Criteria:

1. [ ] All model errors logged
2. [ ] Failure analysis dashboard available
3. [ ] Alerts trigger at a 5% failure rate


UC-QAS-002: Perform Clinical Safety Review

Purpose: Manually review high-risk AI outputs.

| Property | Value |
| --- | --- |
| Actor | Clinical Safety Officer |
| Trigger | Random sampling or flagged case |
| Priority | P0 |

Main Success Scenario:

1. Reviewer accesses the safety review queue (populated by the sampler sketched below)
2. Compare the AI output against the source audio/transcript
3. Check for clinical accuracy
4. Flag clinical errors: medication errors, diagnosis errors
5. Provide feedback to the ML team
6. Approve or escalate the case
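
A sketch of the sampling and escalation rules implied by the acceptance criteria below. Hashing the encounter ID (an assumed identifier) keeps the 5% sampling decision reproducible for audits, which `random.random()` would not be.

```python
import hashlib
from datetime import datetime, timedelta

SAMPLE_RATE = 0.05                    # 5% random sampling of all encounters
ESCALATION_SLA = timedelta(hours=24)  # critical errors escalated within 24h

def should_sample(encounter_id: str, rate: float = SAMPLE_RATE) -> bool:
    """Deterministically route ~5% of encounters into the review queue."""
    digest = hashlib.sha256(encounter_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < rate

def escalation_deadline(flagged_at: datetime) -> datetime:
    """Latest time by which a critical error must be escalated."""
    return flagged_at + ESCALATION_SLA
```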

Acceptance Criteria:

1. [ ] 5% random sampling of all encounters
2. [ ] Critical errors escalated within 24h
3. [ ] Review metrics tracked


UC-QAS-003: Track Audit Violations

Purpose: Monitor for compliance violations (unauthorized access, consent breaches).

| Property | Value |
| --- | --- |
| Actor | Audit Monitor Service |
| Trigger | Continuous audit log analysis |
| Priority | P1 |

Main Success Scenario:

1. Monitor audit logs for violations (classification sketched after this list):
   - Unauthorized access attempts
   - Consent violations
   - Data export anomalies
2. Flag violations in compliance dashboard
3. Alert security team for critical violations
4. Generate monthly compliance report
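
A sketch of how the three checks in step 1 could be expressed as a single classifier over audit events; the `AuditEvent` fields and the export-size cutoff are illustrative assumptions, not part of the spec.

```python
from dataclasses import dataclass
from typing import Optional

EXPORT_ANOMALY_BYTES = 50_000_000  # illustrative cutoff; tune per deployment

@dataclass
class AuditEvent:
    """Minimal audit-log event; field names are assumptions."""
    actor_id: str
    action: str            # e.g. "read", "write", "export"
    authorized: bool
    consent_on_file: bool
    export_bytes: int = 0

def classify_violation(event: AuditEvent) -> Optional[str]:
    """Map an audit event to a violation category, or None if compliant."""
    if not event.authorized:
        return "unauthorized_access"
    if not event.consent_on_file:
        return "consent_violation"
    if event.action == "export" and event.export_bytes > EXPORT_ANOMALY_BYTES:
        return "export_anomaly"
    return None
```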

Acceptance Criteria:

1. [ ] Real-time violation detection
2. [ ] HIPAA/DPDP compliance checks
3. [ ] Audit-ready reporting


UC-QAS-004: Detect Model Drift

Purpose: Detect degradation in ASR/NLP performance.

| Property | Value |
| --- | --- |
| Actor | ML Ops Service |
| Trigger | Weekly performance analysis |
| Priority | P1 |

Main Success Scenario:

1. Compare the current week's metrics against the baseline:
   - ASR Word Error Rate (WER)
   - NLP F1 scores
   - Doctor edit frequency
2. If drift is detected (>5% degradation; see the sketch after this list):
   - Alert the ML team
   - Trigger a model retraining evaluation
3. Log the drift metrics
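
A sketch of the week-over-week comparison, assuming the 5% threshold means relative degradation against the baseline, that WER and edit frequency rise as the model degrades, and that F1 falls; the metric keys are placeholders.

```python
DRIFT_THRESHOLD = 0.05  # >5% degradation triggers an alert

# Direction of "worse" per metric: WER and edit frequency rise when the
# model degrades; F1 falls.
HIGHER_IS_WORSE = {"asr_wer": True, "doctor_edit_rate": True, "nlp_f1": False}

def relative_degradation(metric: str, current: float, baseline: float) -> float:
    """Fraction by which `current` is worse than `baseline` (0.0 if improved)."""
    if baseline == 0:
        return 0.0
    delta = (current - baseline) / baseline
    if not HIGHER_IS_WORSE[metric]:
        delta = -delta
    return max(delta, 0.0)

def detect_drift(current: dict, baseline: dict) -> list:
    """Return the metrics whose degradation exceeds the drift threshold."""
    return [m for m in current
            if relative_degradation(m, current[m], baseline[m]) > DRIFT_THRESHOLD]
```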

Acceptance Criteria:

1. [ ] Drift detection within 1 week
2. [ ] Automated alerts configured
3. [ ] Historical metrics tracked