
ASR Pipeline

UC-PROC-005: Queue ASR Job

Purpose: Schedule audio transcription.

| Property | Value |
|---|---|
| Actor | API Server |
| Trigger | `POST /api/upload/audio` |
| Priority | P1 |

Main Success Scenario:

1. Validate the file: MP3 or WAV, smaller than 50MB
2. Upload it to S3 under the `audio/` prefix
3. Push a transcription job onto `asr-queue`
4. Return HTTP 202 Accepted
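The four steps can be sketched as one request handler. This is a minimal sketch under stated assumptions, not the service's actual code. Several names and details are hypothetical:

- `ObjectStore`, `JobQueue`, and `handle_audio_upload`
- the `audio/<jobId>.<ext>` key layout
- the `jobId`/`audioKey` message fields
- the 413/415 rejection codes

The in-memory classes stand in for S3 and for the broker behind `asr-queue`. Format detection sniffs file headers instead of trusting the extension, which is an added design choice. WAV is recognized by RIFF/WAVE, and MP3 by an ID3 tag or an MPEG Layer III frame sync.

```python
import json
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol, Tuple

MAX_BYTES = 50 * 1024 * 1024  # "< 50MB"; binary megabytes are an assumption


class ObjectStore(Protocol):  # hypothetical wrapper around the S3 client
    def put(self, key: str, data: bytes) -> None: ...


class JobQueue(Protocol):  # hypothetical wrapper around the asr-queue broker
    def push(self, queue: str, message: str) -> None: ...


@dataclass
class UploadResult:
    status: int
    body: dict


def sniff_audio_format(data: bytes) -> Optional[str]:
    """Identify WAV or MP3 from magic bytes instead of the file extension."""
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    if data[:3] == b"ID3":  # MP3 starting with an ID3v2 tag
        return "mp3"
    # Bare MPEG audio frame: 11 sync bits set and layer bits == 01 (Layer III)
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0 and (data[1] & 0x06) == 0x02:
        return "mp3"
    return None


def handle_audio_upload(data: bytes, store: ObjectStore, queue: JobQueue,
                        max_bytes: int = MAX_BYTES) -> UploadResult:
    # Step 1: validate size and format (the 413/415 codes are assumptions)
    if len(data) >= max_bytes:
        return UploadResult(413, {"error": f"file must be smaller than {max_bytes} bytes"})
    fmt = sniff_audio_format(data)
    if fmt is None:
        return UploadResult(415, {"error": "only MP3 and WAV files are accepted"})
    # Step 2: store the raw audio under the audio/ prefix
    job_id = str(uuid.uuid4())
    key = f"audio/{job_id}.{fmt}"
    store.put(key, data)
    # Step 3: enqueue a job message for the GPU workers
    queue.push("asr-queue", json.dumps({"jobId": job_id, "audioKey": key}))
    # Step 4: accepted for asynchronous processing
    return UploadResult(202, {"jobId": job_id, "status": "queued"})


class InMemoryStore:
    """Test double standing in for S3."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data


class InMemoryQueue:
    """Test double standing in for the queue broker."""

    def __init__(self) -> None:
        self.messages: List[Tuple[str, str]] = []

    def push(self, queue: str, message: str) -> None:
        self.messages.append((queue, message))


store, queue = InMemoryStore(), InMemoryQueue()
wav = b"RIFF" + b"\x00" * 4 + b"WAVE" + b"\x00" * 32
print(handle_audio_upload(wav, store, queue).status)  # 202
```

The client receives only a job id. Transcription happens later in the GPU workers, which is why the handler answers 202 rather than 200.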

Acceptance Criteria:
1. [ ] Accepts MP3 and WAV files
2. [ ] Rejects files of 50MB or larger


UC-PROC-006: Execute Whisper Engine

Purpose: Run AI speech-to-text inference.

| Property | Value |
|---|---|
| Actor | GPU Worker |
| Trigger | Job in `asr-queue` |
| Priority | P1 |

Main Success Scenario:

1. Load the Whisper model (`large-v3`)
   - Keep the model loaded in memory between jobs where possible (warm start)
2. Run inference: `model.transcribe(audio_path)`
3. Extract `text` and `segments` (timestamps)
4. Update Patient Bundle `transcripts` array
5. Update Job status
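A sketch of the worker's per-job logic follows. It assumes the open-source `openai-whisper` package. In that package, `whisper.load_model(name)` returns a model whose `transcribe(path)` result is a dict with `"text"` and `"segments"`, and each segment carries `"start"`, `"end"`, and `"text"`. `ModelCache`, `process_asr_job`, and the job and bundle field names (`audioPath`, `jobId`, `inferenceSeconds`) are hypothetical. The loader is injected so the warm-start cache can be exercised without a GPU.

```python
import time
from typing import Any, Callable, Dict


class ModelCache:
    """Keeps loaded models in process memory so later jobs get a warm start."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        self._loader = loader
        self._models: Dict[str, Any] = {}
        self.loads = 0  # number of cold loads, useful for verifying warm starts

    def get(self, name: str) -> Any:
        if name not in self._models:
            self._models[name] = self._loader(name)
            self.loads += 1
        return self._models[name]


def whisper_loader(name: str) -> Any:
    import whisper  # openai-whisper, imported lazily so this module loads without it
    return whisper.load_model(name)


def process_asr_job(job: dict, bundle: dict, cache: ModelCache,
                    model_name: str = "large-v3") -> dict:
    model = cache.get(model_name)                 # step 1 (warm after the first job)
    started = time.perf_counter()
    result = model.transcribe(job["audioPath"])   # step 2
    elapsed = time.perf_counter() - started
    transcript = {                                # step 3: text plus timed segments
        "jobId": job["jobId"],
        "text": result["text"].strip(),
        "segments": [
            {"start": seg["start"], "end": seg["end"], "text": seg["text"].strip()}
            for seg in result["segments"]
        ],
    }
    bundle.setdefault("transcripts", []).append(transcript)  # step 4
    job["status"] = "completed"                              # step 5
    job["inferenceSeconds"] = elapsed
    return transcript
```

A worker would build `ModelCache(whisper_loader)` once per process and reuse it for every job, so only the first job pays the model load. If word-level alignment is needed, `openai-whisper`'s `transcribe` also accepts `word_timestamps=True`.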

Observability:
- Metric: `asr_inference_time_seconds`
- Log: `{"event": "asr_complete", "duration": 4.5, "audio_len": 30}`

Acceptance Criteria:
1. [ ] Inference time < Audio duration (real-time factor < 1)
2. [ ] Preserves timestamps for word alignment
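The observability fields and the real-time-factor criterion fit together. RTF is inference time divided by audio duration, so the example log line gives 4.5 / 30 = 0.15, well under 1. The sketch below uses only the standard library. The list standing in for the `asr_inference_time_seconds` metric is a placeholder for a real metrics client (for example a `prometheus_client` `Histogram`). `observe_asr` and `asr_complete_record` are hypothetical names.

```python
import json
import logging
import time
from contextlib import contextmanager
from typing import Iterator, List

logger = logging.getLogger("asr")

# Stand-in for the asr_inference_time_seconds metric; a real worker would use a
# metrics client instead of a list.
ASR_INFERENCE_TIME_SECONDS: List[float] = []


def real_time_factor(inference_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; RTF < 1 means faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return inference_seconds / audio_seconds


def asr_complete_record(duration: float, audio_len: float) -> dict:
    """Builds the asr_complete log event with the fields shown in the spec."""
    return {"event": "asr_complete", "duration": round(duration, 2), "audio_len": audio_len}


@contextmanager
def observe_asr(audio_len: float) -> Iterator[None]:
    """Times the wrapped inference; records and logs only if it completes."""
    started = time.perf_counter()
    yield  # an exception here propagates and nothing is recorded
    duration = time.perf_counter() - started
    ASR_INFERENCE_TIME_SECONDS.append(duration)
    logger.info(json.dumps(asr_complete_record(duration, audio_len)))


# Example log line from the spec: 4.5 s of inference for 30 s of audio
example = {"event": "asr_complete", "duration": 4.5, "audio_len": 30}
print(real_time_factor(example["duration"], example["audio_len"]))  # 0.15
```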


UC-PROC-011: Identify Speaker Turns

Purpose: Separate clinician vs patient speech for downstream analytics.

| Property | Value |
|---|---|
| Actor | Diarization Worker |
| Trigger | Whisper segments available |
| Priority | P1 |

Main Success Scenario:

1. Convert audio to 16kHz mono if needed
2. Run pyannote diarization to assign speaker labels per time slice
3. Merge adjacent slices that share a label and are separated by a gap of less than 500ms
4. Map speakers to roles (Clinician, Patient, Caregiver) using heuristic keyword detection
5. Update transcript segments with `speakerRole` and `confidence`
6. Emit `diarization_latency_seconds`
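Steps 3 to 5 can be sketched in plain Python, assuming pyannote's output has already been flattened into `(start, end, label)` slices. Several choices here are assumptions rather than part of this spec:

- Each Whisper segment takes the speaker it overlaps most.
- `confidence` is the share of the segment covered by that speaker.
- `ROLE_KEYWORDS` is an illustrative keyword list.

`Slice`, `merge_slices`, `assign_speakers`, `map_roles`, and `annotate_segments` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

MAX_GAP_S = 0.5  # step 3: merge same-speaker slices separated by < 500 ms

# Hypothetical keyword lists for step 4; real lists would need clinical review.
ROLE_KEYWORDS: Dict[str, List[str]] = {
    "Clinician": ["prescribe", "dosage", "blood pressure", "any allergies"],
    "Caregiver": ["she has been", "he has been", "her medication", "his medication"],
    "Patient": ["i feel", "it hurts", "my pain"],
}


@dataclass
class Slice:
    start: float   # seconds
    end: float     # seconds
    speaker: str   # diarizer label, e.g. "SPEAKER_00"


def merge_slices(slices: List[Slice], max_gap: float = MAX_GAP_S) -> List[Slice]:
    merged: List[Slice] = []
    for s in sorted(slices, key=lambda x: x.start):
        last = merged[-1] if merged else None
        if last is not None and last.speaker == s.speaker and s.start - last.end < max_gap:
            last.end = max(last.end, s.end)  # extend the previous turn
        else:
            merged.append(Slice(s.start, s.end, s.speaker))  # copy; input is not mutated
    return merged


def _overlap(a0: float, a1: float, b0: float, b1: float) -> float:
    return max(0.0, min(a1, b1) - max(a0, b0))


def assign_speakers(segments: List[dict], slices: List[Slice]) -> None:
    """Give each Whisper segment the speaker it overlaps most; confidence = overlap share."""
    for seg in segments:
        duration = seg["end"] - seg["start"]
        best: Optional[str] = None
        best_overlap = 0.0
        for sl in slices:
            ov = _overlap(seg["start"], seg["end"], sl.start, sl.end)
            if ov > best_overlap:
                best, best_overlap = sl.speaker, ov
        seg["speaker"] = best
        seg["confidence"] = round(best_overlap / duration, 2) if duration > 0 else 0.0


def map_roles(segments: List[dict]) -> Dict[str, str]:
    """Keyword heuristic: top clinician score -> Clinician; others Patient or Caregiver."""
    scores: Dict[str, Dict[str, int]] = {}
    for seg in segments:
        if seg["speaker"] is None:
            continue
        text = seg["text"].lower()
        per_role = scores.setdefault(seg["speaker"], {role: 0 for role in ROLE_KEYWORDS})
        for role, keywords in ROLE_KEYWORDS.items():
            per_role[role] += sum(text.count(k) for k in keywords)
    if not scores:
        return {}
    # Ties (including all-zero scores) resolve to the first speaker seen; manual
    # relabeling is the fallback for wrong guesses.
    clinician = max(scores, key=lambda sp: scores[sp]["Clinician"])
    return {
        sp: "Clinician" if sp == clinician
        else "Caregiver" if per_role["Caregiver"] > per_role["Patient"]
        else "Patient"
        for sp, per_role in scores.items()
    }


def annotate_segments(segments: List[dict], raw_slices: List[Slice]) -> List[Slice]:
    slices = merge_slices(raw_slices)      # step 3
    assign_speakers(segments, slices)
    roles = map_roles(segments)            # step 4
    for seg in segments:                   # step 5: speakerRole alongside confidence
        seg["speakerRole"] = roles.get(seg["speaker"])
    return slices
```

Picking the maximum-overlap speaker keeps each Whisper segment intact. The trade-off is that a segment spanning a speaker change is attributed to only one speaker, and the lower `confidence` flags that case.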

Acceptance Criteria:
1. [ ] Supports stereo and mono inputs
2. [ ] Accuracy > 85% on benchmark call set
3. [ ] Provides override endpoint for manual relabeling
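For the stereo-and-mono criterion, a downmix step can be sketched with Python's standard `wave` module. The sketch assumes 16-bit PCM WAV input and only averages channels. It deliberately does not resample, since converting to 16kHz properly needs a filtering resampler (for example `ffmpeg -ar 16000 -ac 1`). `downmix_to_mono` is a hypothetical helper.

```python
import io
import sys
import wave
from array import array


def downmix_to_mono(wav_bytes: bytes) -> bytes:
    """Average all channels of a 16-bit PCM WAV into one; the sample rate is unchanged."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        if src.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = src.getnchannels()
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    samples = array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV PCM data is little-endian

    if channels == 1:
        mono = samples  # already mono: pass through
    else:
        # Frames are interleaved (L R L R ...); average each frame's channels
        mono = array("h", (
            sum(samples[i:i + channels]) // channels
            for i in range(0, len(samples), channels)
        ))

    if sys.byteorder == "big":
        mono.byteswap()  # back to little-endian for writing
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(mono.tobytes())
    return out.getvalue()
```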


UC-PROC-012: Generate Encounter Note

Purpose: Produce a structured SOAP note draft from diarized transcripts.

| Property | Value |
|---|---|
| Actor | Note Composer Service |
| Trigger | Diarization complete |
| Priority | P2 |

Main Success Scenario:

1. Split transcript into sections by speaker role
2. Prompt LLM with template (Subjective, Objective, Assessment, Plan)
3. Extract medications, vitals, and orders into structured JSON
4. Populate Note entity with draft text + structured payload
5. Send notification to clinician for review/attestation
6. Persist revision history for legal traceability
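Steps 1 to 3 can be sketched as prompt assembly plus defensive parsing of the model's JSON reply. The prompt wording, the `[HH:MM:SS]` citation tags (one way to meet the timestamp-citation criterion), and the helper names are all hypothetical. No particular LLM client is assumed: the sketch stops at the prompt string and at the reply text the model returns.

```python
import json
from collections import defaultdict
from typing import Dict, List

SOAP_SECTIONS = ["Subjective", "Objective", "Assessment", "Plan"]
STRUCTURED_KEYS = ("medications", "vitals", "orders")


def fmt_ts(seconds: float) -> str:
    """Format a segment start time as HH:MM:SS for citations."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def split_by_role(segments: List[dict]) -> Dict[str, List[str]]:
    """Step 1: group utterances by speakerRole, each tagged with its start time."""
    by_role: Dict[str, List[str]] = defaultdict(list)
    for seg in segments:
        role = seg.get("speakerRole") or "Unknown"
        by_role[role].append(f"[{fmt_ts(seg['start'])}] {seg['text'].strip()}")
    return dict(by_role)


def build_soap_prompt(segments: List[dict]) -> str:
    """Step 2: fill a hypothetical SOAP prompt template."""
    lines = [
        "Draft a clinical note with exactly these sections: " + ", ".join(SOAP_SECTIONS) + ".",
        "Cite each statement with the [HH:MM:SS] timestamp of its source line.",
        "Then output a JSON object with keys " + ", ".join(STRUCTURED_KEYS) + ".",
        "",
    ]
    for role, utterances in split_by_role(segments).items():
        lines.append(f"{role}:")
        lines.extend(utterances)
        lines.append("")
    return "\n".join(lines)


def parse_structured(reply_json: str) -> Dict[str, list]:
    """Step 3: keep only the expected keys, defaulting missing ones to empty lists."""
    data = json.loads(reply_json)
    return {key: list(data.get(key, [])) for key in STRUCTURED_KEYS}
```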

Acceptance Criteria:
1. [ ] Draft clearly labeled "Auto-generated"
2. [ ] Captures citations back to transcript timestamps
3. [ ] Provides API to reject or accept draft with comments
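The draft label, the accept/reject-with-comments API, and the revision history can be sketched as an append-only note model. `Note`, `Revision`, the `note-composer` author name, and the status values are hypothetical. The in-memory list stands in for whatever durable store provides legal traceability. Revisions are frozen dataclasses that are only ever appended, so earlier states remain inspectable.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

AUTO_LABEL = "[Auto-generated]"


@dataclass(frozen=True)
class Revision:
    number: int
    text: str
    author: str
    action: str              # "draft", "accept", or "reject"
    comment: Optional[str]
    created_at: datetime


@dataclass
class Note:
    encounter_id: str
    structured: dict                     # medications, vitals, orders from step 3
    status: str = "draft"
    revisions: List[Revision] = field(default_factory=list)

    @property
    def text(self) -> str:
        """Current text is the latest revision's text."""
        return self.revisions[-1].text if self.revisions else ""

    def _append(self, text: str, author: str, action: str,
                comment: Optional[str] = None) -> None:
        # Append-only: existing revisions are never modified or removed.
        self.revisions.append(Revision(len(self.revisions) + 1, text, author, action,
                                       comment, datetime.now(timezone.utc)))

    @classmethod
    def draft(cls, encounter_id: str, draft_text: str, structured: dict) -> "Note":
        note = cls(encounter_id, structured)
        note._append(f"{AUTO_LABEL}\n{draft_text}", "note-composer", "draft")
        return note

    def review(self, clinician: str, accept: bool, comment: Optional[str] = None) -> None:
        """Accept or reject the draft; either way the decision becomes a revision."""
        if self.status != "draft":
            raise ValueError(f"note is already {self.status}")
        self._append(self.text, clinician, "accept" if accept else "reject", comment)
        self.status = "accepted" if accept else "rejected"
```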