ASR Pipeline¶
UC-PROC-005: Queue ASR Job¶
Purpose: Schedule audio transcription.
| Property | Value |
|---|---|
| Actor | API Server |
| Trigger | POST /api/upload/audio |
| Priority | P1 |
Main Success Scenario:
1. Validate file (MP3/WAV, < 50MB)
2. Upload to S3 `audio/`
3. Push to `asr-queue`
4. Return HTTP 202 Accepted
Acceptance Criteria: 1. [ ] Accepts common audio formats 2. [ ] Rejects files > 50MB
UC-PROC-006: Execute Whisper Engine¶
Purpose: Run AI speech-to-text inference.
| Property | Value |
|---|---|
| Actor | GPU Worker |
| Trigger | Job in asr-queue |
| Priority | P1 |
Main Success Scenario:
1. Load Whisper model (Large-v3)
- Keep model loaded in memory if possible (warm start)
2. Run inference: `model.transcribe(audio_path)`
3. Extract `text` and `segments` (timestamps)
4. Update Patient Bundle `transcripts` array
5. Update Job status
Observability:
- Metric: asr_inference_time_seconds
- Log: {"event": "asr_complete", "duration": 4.5, "audio_len": 30}
Acceptance Criteria: 1. [ ] Inference time < Audio duration (Real-time factor < 1) 2. [ ] Preserves timestamps for word alignment
UC-PROC-011: Identify Speaker Turns¶
Purpose: Separate clinician vs patient speech for downstream analytics.
| Property | Value |
|---|---|
| Actor | Diarization Worker |
| Trigger | Whisper segments available |
| Priority | P1 |
Main Success Scenario:
1. Convert audio to 16kHz mono if needed
2. Run pyannote diarization to assign speaker labels per time slice
3. Merge adjacent slices with same label and duration < 500ms gap
4. Map speakers to roles (Clinician, Patient, Caregiver) using heuristic keyword detection
5. Update transcript segments with `speakerRole` and `confidence`
6. Emit `diarization_latency_seconds`
Acceptance Criteria: 1. [ ] Supports stereo and mono inputs 2. [ ] Accuracy > 85% on benchmark call set 3. [ ] Provides override endpoint for manual relabeling
UC-PROC-012: Generate Encounter Note¶
Purpose: Produce a structured SOAP note draft from diarized transcripts.
| Property | Value |
|---|---|
| Actor | Note Composer Service |
| Trigger | Diarization complete |
| Priority | P2 |
Main Success Scenario:
1. Split transcript into sections by speaker role
2. Prompt LLM with template (Subjective, Objective, Assessment, Plan)
3. Extract medications, vitals, and orders into structured JSON
4. Populate Note entity with draft text + structured payload
5. Send notification to clinician for review/attestation
6. Persist revision history for legal traceability
Acceptance Criteria: 1. [ ] Draft clearly labeled "Auto-generated" 2. [ ] Captures citations back to transcript timestamps 3. [ ] Provides API to reject or accept draft with comments