
AI & ML Overview

Document Purpose: This document provides an overview of AI/ML capabilities in the Entheory.AI platform, including OCR, ASR, NLP, and future AI features.


Executive Summary

Entheory.AI uses AI/ML to transform unstructured clinical data (scanned documents, audio recordings, free-text notes) into structured, searchable patient records. Our models are optimized for India-first deployment with Hindi + English support.

Related Documentation:

  • OCR & ASR Details – Engine selection and configuration
  • Safety & Evaluation – Clinical safety and model validation
  • MCP OCR Servers – OCR architecture trade-offs


AI Processing Pipeline

sequenceDiagram
    participant U as User/Clinician
    participant API as API Gateway
    participant Q as Job Queue
    participant OCR as OCR Worker
    participant ASR as ASR Worker
    participant NLP as NLP Pipeline
    participant B as Patient Bundle

    rect rgb(255, 240, 245)
        Note over U,B: Document Processing Flow
        U->>API: Upload PDF document
        API->>Q: Enqueue OCR job
        Q->>OCR: Process document
        OCR->>OCR: Language detection
        OCR->>OCR: Tesseract inference
        OCR->>NLP: Extract entities
        NLP->>NLP: NER (medications, diagnoses)
        NLP->>B: Update patient bundle
    end

    rect rgb(240, 255, 245)
        Note over U,B: Audio Processing Flow
        U->>API: Upload audio recording
        API->>Q: Enqueue ASR job
        Q->>ASR: Process audio
        ASR->>ASR: Whisper transcription
        ASR->>ASR: Speaker diarization
        ASR->>NLP: Generate SOAP note
        NLP->>NLP: Structure clinical entities
        NLP->>B: Update patient bundle
    end

    B-->>API: Return updated data
    API-->>U: Display in UI
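
The first two steps of each flow (upload, then enqueue an OCR or ASR job) can be sketched as below. This is a minimal illustration, not the platform's implementation: the extension sets, the `Job` fields, and the `classify_upload` and `enqueue` names are all assumptions, and an in-memory `deque` stands in for the real job queue.

```python
from collections import deque
from dataclasses import dataclass
from pathlib import PurePath

# Hypothetical extension-to-worker mapping; the actual routing rules
# are not specified in this document.
OCR_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".tiff"}
ASR_EXTENSIONS = {".wav", ".mp3", ".m4a", ".ogg"}


@dataclass
class Job:
    patient_id: str
    filename: str
    kind: str  # "ocr" or "asr"


def classify_upload(filename: str) -> str:
    """Decide which worker type should process an uploaded file."""
    suffix = PurePath(filename).suffix.lower()
    if suffix in OCR_EXTENSIONS:
        return "ocr"
    if suffix in ASR_EXTENSIONS:
        return "asr"
    raise ValueError(f"unsupported upload type: {filename!r}")


def enqueue(queue: deque, patient_id: str, filename: str) -> Job:
    """Create a job for the upload and append it to the queue."""
    job = Job(patient_id, filename, classify_upload(filename))
    queue.append(job)
    return job


job_queue: deque = deque()
enqueue(job_queue, "patient-001", "discharge_summary.pdf")
enqueue(job_queue, "patient-001", "consultation.WAV")
print([(job.filename, job.kind) for job in job_queue])
```

Rejecting unknown file types at enqueue time keeps unsupported uploads out of the worker pool, so the failure surfaces to the user at the API rather than inside a worker.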

1. AI/ML Capabilities by Use Case

1.1 Processing Pipelines (12 Use Cases)

Use Case ID | Name | AI/ML Component
PROC-001 | Queue OCR Job | Job orchestration
PROC-002 | Detect Document Language | Language detection model
PROC-003 | Execute Tesseract Engine | OCR inference
PROC-004 | Process OCR Output | Post-processing
PROC-007 | Classify Document Type | Document classification
PROC-008 | Redact Sensitive Entities | NER for PII detection
PROC-009 | Extract Structured Fields | Field extraction
PROC-010 | Summarize Document Content | LLM summarization
PROC-005 | Queue ASR Job | Job orchestration
PROC-006 | Execute Whisper Engine | ASR inference
PROC-011 | Identify Speaker Turns | Speaker diarization
PROC-012 | Generate Encounter Note | LLM note generation

1.2 NLP/NLU Pipelines (6 Use Cases)

Use Case ID | Name | AI/ML Component
NLP-101 | Generate Structured SOAP Notes | LLM structuring
NLP-102a | Extract Medications (RxNorm) | Medical NER
NLP-102b | Extract Diagnoses (ICD-10) | Medical NER
NLP-102c | Extract Procedures & Symptoms | Medical NER
NLP-103 | Summarization + Noise Filtering | LLM summarization
NLP-104 | EMR Field Mapping | Entity linking

1.3 Oncology AI (10+ Use Cases)

Use Case ID | Name | AI/ML Component
ONC-001 | Extract Tumor Location | Medical NER
ONC-002 | Extract Histopathology Findings | Pathology NER
ONC-003 | Extract Cancer Stage (TNM) | Staging extraction
ONC-011 | Detect RECIST Lesions | Radiology AI
ONC-014 | Auto-score Response | RECIST classifier
ONC-040 | Parse NGS Reports | Genomics NER
ONC-042 | Map to Actionable Therapies | Knowledge graph

1.4 Imaging AI (2 Use Cases)

Use Case ID | Name | AI/ML Component
IMG-015 | AI Inference Scheduling | Vision model orchestration
ONC-012 | Track Lesion Progression | Lesion tracking AI

2. Model Stack

2.1 OCR (Optical Character Recognition)

Component | Technology | Purpose
Primary Engine | Tesseract 5 | Open-source, Hindi + English
Alternative | PaddleOCR | Higher accuracy for complex layouts
Cloud Fallback | Google Cloud Vision | High-confidence fallback for low-quality scans
Language Packs | Hindi, English, Tamil (planned) | Bilingual medical documents

Use Cases: PROC-001 through PROC-004, PROC-007 through PROC-010
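
The primary, alternative, and cloud-fallback roles in the table suggest a confidence-gated cascade. The sketch below is one way that could work, under stated assumptions: the 0.80 threshold, the `OcrResult` shape, and the `run_with_fallback` function are hypothetical, and the three engines are stubs rather than real Tesseract, PaddleOCR, or Google Cloud Vision calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class OcrResult:
    engine: str
    text: str
    confidence: float  # mean confidence in [0.0, 1.0]


OcrEngine = Callable[[bytes], OcrResult]


def run_with_fallback(
    image: bytes,
    engines: List[OcrEngine],
    min_confidence: float = 0.80,  # assumed threshold, not from the source
) -> OcrResult:
    """Try engines in order; stop at the first confident result.

    If no engine reaches the threshold, return the best result seen so
    that a human reviewer still gets the strongest candidate.
    """
    if not engines:
        raise ValueError("at least one OCR engine is required")
    best: Optional[OcrResult] = None
    for engine in engines:
        result = engine(image)
        if best is None or result.confidence > best.confidence:
            best = result
        if result.confidence >= min_confidence:
            break
    return best


# Stub engines that record the order in which they are called.
calls: List[str] = []


def stub_tesseract(image: bytes) -> OcrResult:
    calls.append("tesseract")
    return OcrResult("tesseract", "Tab. Metfornin 500 mg", 0.62)


def stub_paddleocr(image: bytes) -> OcrResult:
    calls.append("paddleocr")
    return OcrResult("paddleocr", "Tab. Metformin 500 mg", 0.91)


def stub_cloud_vision(image: bytes) -> OcrResult:
    calls.append("cloud_vision")
    return OcrResult("cloud_vision", "Tab. Metformin 500 mg", 0.97)


result = run_with_fallback(b"", [stub_tesseract, stub_paddleocr, stub_cloud_vision])
print(result.engine, calls)
```

Ordering the engines from cheapest to most expensive means the cloud service is only called when both local engines fall below the threshold.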

2.2 ASR (Automatic Speech Recognition)

Component | Technology | Purpose
Primary Engine | Whisper (Large V3) | Multi-lingual, code-switching support
Diarization | PyAnnote | Speaker turn identification
Medical Vocabulary | Custom fine-tuning | Medical terminology accuracy
Noise Handling | DeepFilterNet | Audio enhancement pre-processing

Use Cases: PROC-005, PROC-006, CAP-001
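
Transcription and diarization produce two separate timelines that have to be merged before a note can be drafted. A minimal sketch of that merge follows, assigning each transcript segment to the speaker turn it overlaps most. The `TranscriptSegment` and `SpeakerTurn` types and the `assign_speakers` function are assumptions for illustration and do not reflect the actual Whisper or PyAnnote output objects.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TranscriptSegment:
    start: float  # seconds
    end: float
    text: str


@dataclass
class SpeakerTurn:
    start: float  # seconds
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: List[TranscriptSegment], turns: List[SpeakerTurn]
) -> List[Tuple[str, str]]:
    """Label each segment with the speaker whose turn overlaps it most."""
    labelled = []
    for seg in segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labelled.append((best_speaker, seg.text))
    return labelled


segments = [
    TranscriptSegment(0.0, 4.0, "What brings you in today?"),
    TranscriptSegment(4.2, 9.0, "I have had a cough for two weeks."),
]
turns = [
    SpeakerTurn(0.0, 4.1, "SPEAKER_00"),
    SpeakerTurn(4.1, 9.5, "SPEAKER_01"),
]
for speaker, text in assign_speakers(segments, turns):
    print(f"{speaker}: {text}")
```

Segments with no overlapping turn are labelled `UNKNOWN` rather than guessed, which leaves them visible for human review.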

2.3 NLP/NLU

Component | Technology | Purpose
Medical NER | BioBERT / MedSpaCy | Entity extraction (drugs, diagnoses)
Code Mapping | Custom + UMLS | RxNorm, ICD-10, SNOMED linking
Summarization | LLM (GPT-4 / Claude / Gemini) | Document and encounter summaries
SOAP Generation | LLM + Templates | Structured clinical notes
Knowledge-Augmented | LLM + Medical Ontologies | High-accuracy inference on small models (4B)

2.5 Strategic Architecture: The Efficiency Frontier

Entheory.AI prioritizes Knowledge-Augmented Generation (KAG) to reduce LLM hallucination while keeping inference computationally efficient:

  • Ontology Alignment: Instead of relying on raw 1T+ parameter models, we use specialized 4B-parameter models grounded in medical knowledge graphs (SNOMED CT, ICD-11).
  • Efficiency Benchmarks: This architecture achieves 88-90% accuracy on clinical tasks, on par with models 100x larger, which enables deployment in low-resource and edge environments.
  • Medical Hierarchy: Context is supplied as structured medical hierarchies rather than plain text retrieval, which keeps outputs clinically relevant and safer.

Use Cases: NLP-101, ONC-002
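
To make the idea of grounding model output in a medical hierarchy concrete, the sketch below checks candidate codes against a small ontology and attaches each accepted code's ancestors as structured context. It is only an illustration under stated assumptions: the `PARENT` and `LABEL` tables are a toy excerpt that uses a handful of ICD-10 codes as a stand-in for the SNOMED CT and ICD-11 graphs named above, and the `ancestors` and `ground` functions are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Toy child -> parent hierarchy (illustrative excerpt, not a full ontology).
PARENT: Dict[str, Optional[str]] = {
    "C50.9": "C50",
    "C50": "C00-C97",
    "C34.1": "C34",
    "C34": "C00-C97",
    "C00-C97": None,
}

LABEL: Dict[str, str] = {
    "C50.9": "Malignant neoplasm of breast, unspecified",
    "C50": "Malignant neoplasm of breast",
    "C34.1": "Malignant neoplasm of upper lobe, bronchus or lung",
    "C34": "Malignant neoplasm of bronchus and lung",
    "C00-C97": "Malignant neoplasms",
}


def ancestors(code: str) -> List[str]:
    """Walk up the hierarchy from a code to the root."""
    path = []
    current = PARENT.get(code)
    while current is not None:
        path.append(current)
        current = PARENT.get(current)
    return path


def ground(candidates: List[str]) -> Tuple[List[dict], List[str]]:
    """Split model-proposed codes into grounded and rejected.

    A code is accepted only if it exists in the ontology; anything else
    is treated as a possible hallucination and rejected.
    """
    grounded, rejected = [], []
    for code in candidates:
        if code in PARENT:
            grounded.append(
                {"code": code, "label": LABEL[code], "ancestors": ancestors(code)}
            )
        else:
            rejected.append(code)
    return grounded, rejected


grounded, rejected = ground(["C50.9", "C99.9"])
print(grounded)
print("rejected:", rejected)
```

Rejected codes can be routed to manual review instead of being written to the patient bundle.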

2.4 Oncology-Specific

Component | Technology | Purpose
TNM Extraction | Rule-based + NER | Cancer staging
Biomarker Analysis | Pattern matching + LLM | IHC panel interpretation
Genomics Parsing | Custom VCF parser | NGS variant extraction
RECIST Scoring | Rule-based | Treatment response assessment

Use Cases: ONC-001 through ONC-062
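
Rule-based RECIST scoring reduces to a few threshold comparisons on the sum of target-lesion diameters. The sketch below applies the published RECIST 1.1 thresholds for target lesions: partial response is a decrease of at least 30% from baseline, and progressive disease is an increase of at least 20% and at least 5 mm from the nadir. It is deliberately simplified. It ignores non-target lesions and lymph-node short-axis rules, it assumes a positive nadir, and the `recist_target_response` function is hypothetical rather than the platform's scorer.

```python
def recist_target_response(
    baseline_mm: float,
    nadir_mm: float,
    current_mm: float,
    new_lesions: bool = False,
) -> str:
    """Classify target-lesion response as CR, PR, SD, or PD.

    All arguments are sums of target-lesion diameters in millimetres.
    nadir_mm is the smallest sum recorded so far, including baseline.
    """
    if new_lesions:
        return "PD"  # any new lesion counts as progression
    if baseline_mm <= 0 or nadir_mm <= 0:
        raise ValueError("baseline and nadir sums must be positive in this sketch")
    if current_mm == 0:
        return "CR"  # all target lesions have disappeared
    increase = current_mm - nadir_mm
    if increase / nadir_mm >= 0.20 and increase >= 5:
        return "PD"  # at least 20% and at least 5 mm above nadir
    if (baseline_mm - current_mm) / baseline_mm >= 0.30:
        return "PR"  # at least 30% below baseline
    return "SD"


print(recist_target_response(100, 100, 65))  # 35% decrease from baseline
print(recist_target_response(100, 60, 75))   # 25% and 15 mm above nadir
print(recist_target_response(100, 80, 85))   # neither threshold met
```

Progression is checked before partial response because a lesion sum can sit 30% below baseline while still having grown more than 20% from its nadir.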


3. India-Specific Optimizations

3.1 Language Support

Language OCR ASR NLP Status
English Production
Hindi Production
Hinglish (Code-switch) 🔄 Beta
Tamil 🔄 🔄 🔄 Planned Q2 2025
Telugu 🔄 🔄 🔄 Planned Q3 2025

3.2 Medical Terminology

  • Drug Names: Mapped to Indian generic brands + RxNorm
  • Diagnoses: ICD-10 with India-specific codes
  • Procedures: SNOMED + India-specific procedure codes
  • Abbreviations: Common Indian clinical abbreviations

Use Cases: IN-ONC-003, IN-ONC-004
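
Mapping Indian brand names to RxNorm usually needs two hops, because RxNorm uses US ingredient names (for example, paracetamol is listed as acetaminophen). The sketch below shows that lookup with a toy dictionary. The three brand entries, the `normalize_drug` function, and the output fields are assumptions for illustration, and RxNorm concept identifiers are omitted on purpose rather than guessed.

```python
import re
from typing import Dict, Optional

# Toy lookup tables; a real deployment would load a curated formulary.
BRAND_TO_GENERIC: Dict[str, str] = {
    "crocin": "paracetamol",
    "dolo": "paracetamol",
    "glycomet": "metformin",
}

# Indian generic name -> ingredient name as used in RxNorm.
GENERIC_TO_RXNORM_NAME: Dict[str, str] = {
    "paracetamol": "acetaminophen",
    "metformin": "metformin",
}


def normalize_drug(mention: str) -> Optional[Dict[str, str]]:
    """Resolve a free-text drug mention to generic and RxNorm names."""
    for token in re.findall(r"[a-z]+", mention.lower()):
        generic = BRAND_TO_GENERIC.get(token)
        if generic is None and token in GENERIC_TO_RXNORM_NAME:
            generic = token  # the mention already uses the generic name
        if generic is not None:
            return {
                "mention": mention,
                "generic": generic,
                "rxnorm_name": GENERIC_TO_RXNORM_NAME[generic],
            }
    return None  # unknown drug: leave for manual coding


print(normalize_drug("Tab. Dolo 650 mg"))
print(normalize_drug("Paracetamol 500 mg"))
print(normalize_drug("Vitamin X"))
```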


4. Model Lifecycle

4.1 Training & Fine-Tuning

Use Case ID | Name | Purpose
ML-001a | Curate Training Dataset | Data preparation
ML-001b | Execute Fine-tuning Run | Model training
ML-002 | Dialect Evaluation & Benchmarking | Performance testing
ML-003 | Continuous Quality Feedback Loop | RLHF pipeline

4.2 Deployment & Monitoring

Use Case ID | Name | Purpose
OPS-302 | Monitor Inference Time & Failures | Performance tracking
QAS-001 | Record Model Failures | Error tracking
QAS-004 | Model Drift Detection | Quality monitoring
OPS-303 | Human-in-the-Loop Correction | Feedback capture

5. Performance Targets

Model | Metric | Target | Current
OCR (English) | Character accuracy | >95% | ~93%
OCR (Hindi) | Character accuracy | >85% | ~82%
ASR (English) | Word Error Rate | <10% | ~8%
ASR (Hindi) | Word Error Rate | <15% | ~14%
ASR (Hinglish) | Word Error Rate | <20% | ~18%
NER (Medications) | F1 Score | >90% | ~88%
NER (Diagnoses) | F1 Score | >85% | ~84%
TNM Staging | Accuracy | >90% | ~87%
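
For reference, the three metrics in the table can be computed as sketched below, using their standard definitions. Word Error Rate is the word-level edit distance divided by the number of reference words, character accuracy is one minus the character-level edit distance divided by the reference length, and F1 is the harmonic mean of precision and recall. The function names are illustrative, and the platform's evaluation harness may normalize text differently before scoring.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def character_accuracy(reference: str, hypothesis: str) -> float:
    """1 - (character-level edit distance / reference length), floored at 0."""
    if not reference:
        raise ValueError("reference must not be empty")
    return max(0.0, 1 - edit_distance(reference, hypothesis) / len(reference))


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from entity counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


print(word_error_rate("patient has type two diabetes", "patient has type to diabetes"))
print(character_accuracy("metformin", "metfornin"))
print(f1_score(tp=88, fp=12, fn=12))
```

Note that WER is a lower-is-better metric, so a current value of ~8% against a target of <10% meets the target, while the accuracy and F1 rows are higher-is-better.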

6. AI Safety & Transparency

6.1 Design Principles

  1. No Black Box Decisions: All AI outputs require clinician review before action
  2. Confidence Scores: Low-confidence outputs flagged for manual review
  3. Provenance: Every AI-generated field traceable to source document
  4. Audit Trail: All AI inferences logged with model version
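
The four principles above can be carried on every AI-generated field as plain metadata. The sketch below shows one possible record shape. The `AiField` class, the 0.85 review threshold, the status values, and the `audit_entry` function are all assumptions for illustration, not the platform's schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

REVIEW_THRESHOLD = 0.85  # hypothetical cut-off for priority manual review


@dataclass
class AiField:
    name: str
    value: str
    confidence: float      # principle 2: confidence score
    model_version: str     # principle 4: model version in the audit trail
    source_document: str   # principle 3: provenance
    source_page: int
    status: str = "pending_review"  # principle 1: never auto-approved

    @property
    def low_confidence(self) -> bool:
        return self.confidence < REVIEW_THRESHOLD


def audit_entry(item: AiField) -> str:
    """Serialize one inference as a JSON audit-log line."""
    record = asdict(item)
    record["low_confidence"] = item.low_confidence
    record["logged_at"] = datetime.now(timezone.utc).isoformat()
    return json.dumps(record, sort_keys=True)


extracted = AiField(
    name="medication",
    value="metformin 500 mg",
    confidence=0.62,
    model_version="ner-v1.3",
    source_document="discharge_summary.pdf",
    source_page=2,
)
print(audit_entry(extracted))
```

Defaulting `status` to `pending_review` means a field only becomes actionable after an explicit clinician action, regardless of its confidence.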

6.2 Human-in-the-Loop

Stage | AI Role | Human Role
OCR/ASR | Generate text | Review low-confidence segments
NER | Suggest entities | Confirm/correct before EMR push
SOAP Notes | Draft structure | Approve before finalization
Alerts | Flag potential issues | Acknowledge and act

See: Safety & Evaluation for detailed safety protocols


7. Future Roadmap

Phase 1: Current (MVP)

  • ✅ OCR (English + Hindi)
  • ✅ ASR (English + Hindi)
  • ✅ Basic NER (medications, diagnoses)
  • ✅ SOAP note generation

Phase 2: Near-Term (6-12 months)

  • 🔄 Regional language support (Tamil, Telugu)
  • 🔄 Improved Hinglish handling
  • 🔄 Document summarization
  • 🔄 Clinical trial eligibility screening

Phase 3: Future (12-24 months)

  • 📋 Cohort selection for research
  • 📋 Predictive analytics (outcome forecasting)
  • 📋 Radiology AI (lesion detection)
  • 📋 Drug interaction prediction

Document Owner: AI/ML Engineering Team
Last Updated: 2024-12-09
Next Review: Quarterly (aligned with model releases)