Phase 3: The Technology

Building the AI Assistive Tool

Powered by Lived Experience, Driven by Advanced AI
Large Vision Model
Large Language Model

It is not enough to simply build a piece of technology; the technology must actively understand and adapt to the people it serves.


Why Medical AI Tools Fail in the Global South

Many medical AI tools fail in the Global South because they are built using sterile, lab-based datasets that cannot handle the noisy, high-pressure reality of an actual public clinic.

Our technology is different. By utilizing the community-validated data curated during our Discovery and Co-Conception phases, we built a highly specialized, bidirectional AI system designed explicitly for Ugandan healthcare.

Traditional AI: Lab-based, sterile data
Our AI: Real-world, community-validated

To process the complexity of Ugandan Sign Language (USL) and cross-reference it with infectious disease protocols, our technical architecture relies on a Dual-Engine Approach.

The Dual-Engine Approach

How It Works

Large Vision Model
Translating the Body
Reading more than just hands · Affect Recognition

Large Language Model
The Clinical Brain
Secure, localized medical prompting
01

Large Vision Model (LVM)

Translating the Body

The first half of our engine is the Large Vision Model (LVM), which serves as the "eyes" of the AI. When a Deaf patient signs their symptoms into the clinic's tablet, the LVM captures the movement using advanced 3D pose estimation, tracking the spatial relationship between the hands, arms, and torso from multiple angles.

But translating sign language takes more than reading hand shapes: meaning and emotion are also carried on the face. Crucially, our LVM tracks facial expression markers and non-manual signals to recognize affect and pain expressions.

Preventing Diagnostic Overshadowing:

Because the community explicitly mandated this during co-design, the LVM understands when a patient is signing frantically due to severe physical agony (e.g., an ectopic pregnancy). Instead of outputting a flat text translation that a doctor might misread as "psychiatric agitation," the LVM correctly tags and translates the high-stress emotion, ensuring the doctor understands the acute physical emergency.

3D Pose Estimation: 21 key points tracked
Facial Markers: 68 landmarks
Frame Rate: 30 fps
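To make the pipeline concrete, here is a minimal Python sketch of how the LVM's per-frame output (21 hand key points, 68 facial landmarks, 30 fps) could feed the affect tagging described above. All names, data structures, and thresholds here are illustrative assumptions, not our production code:

```python
from dataclasses import dataclass

HAND_KEYPOINTS = 21   # key points tracked per hand (per the spec above)
FACE_LANDMARKS = 68   # facial landmarks
FRAME_RATE = 30       # frames per second

@dataclass
class SigningFrame:
    hand_keypoints: list   # HAND_KEYPOINTS (x, y, z) tuples; index 0 assumed to be the wrist
    face_landmarks: list   # FACE_LANDMARKS (x, y) tuples
    timestamp: float       # seconds

def signing_speed(frames):
    """Mean wrist displacement per second over a window of frames."""
    if len(frames) < 2:
        return 0.0
    total = 0.0
    for a, b in zip(frames, frames[1:]):
        (ax, ay, az), (bx, by, bz) = a.hand_keypoints[0], b.hand_keypoints[0]
        total += ((bx - ax) ** 2 + (by - ay) ** 2 + (bz - az) ** 2) ** 0.5
    duration = frames[-1].timestamp - frames[0].timestamp
    return total / duration if duration > 0 else 0.0

def tag_affect(speed, brow_tension, speed_threshold=0.8, tension_threshold=0.6):
    """Fuse signing tempo with a facial-tension score so that rapid, tense
    signing is surfaced as possible acute pain rather than 'agitation'."""
    if speed > speed_threshold and brow_tension > tension_threshold:
        return "high-stress / possible acute pain"
    return "neutral"
```

The point of the sketch is the fusion step: motion tempo alone is ambiguous, but combined with non-manual facial signals it distinguishes urgency from ordinary fast signing.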
02

Large Language Model (LLM)

The Clinical Brain

Once the LVM translates the physical signs into structured text, it feeds that data directly into the second half of our engine: the Large Language Model (LLM).

This is not a generic chatbot. Our conversational AI engine has been meticulously fine-tuned on domain-specific clinical datasets, including Uganda Ministry of Health guidelines, WHO protocols, and CDC screening workflows specifically for infectious diseases like malaria, tuberculosis (TB), and HIV/AIDS.

Secure Doctor Prompts:

The LLM takes the translated symptoms (e.g., "burning urination for three days") and aligns them with these official medical protocols. It then securely prompts the attending clinician with structured diagnostic advice, suggested lab tests, or follow-up triage questions. The LLM can also generate a medically sound response or question, which is translated back to the patient, enabling a seamless, two-way diagnostic conversation without needing a human interpreter.

Malaria · Tuberculosis · HIV/AIDS · STIs · Maternal Health
Uganda Ministry of Health
WHO Guidelines
CDC Protocols
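The symptom-to-protocol step above can be sketched in a few lines of Python. The `PROTOCOLS` table and string matching here are hypothetical placeholders; the deployed system relies on a fine-tuned LLM over the official Uganda MoH, WHO, and CDC guideline corpora rather than a lookup table:

```python
# Hypothetical protocol entries for illustration only.
PROTOCOLS = {
    "burning urination": {
        "condition_flag": "possible UTI / STI screening",
        "suggested_tests": ["urinalysis", "urine culture"],
        "triage_questions": ["Duration of symptoms?", "Any fever or flank pain?"],
    },
    "fever with chills": {
        "condition_flag": "malaria screening per national guidelines",
        "suggested_tests": ["rapid diagnostic test (RDT)", "blood smear"],
        "triage_questions": ["Recent travel?", "Is the fever cyclical?"],
    },
}

def build_clinician_prompt(translated_symptoms, affect_tag):
    """Align translated symptom text with protocol entries and package
    the result as a structured prompt for the attending clinician."""
    matches = [p for key, p in PROTOCOLS.items() if key in translated_symptoms.lower()]
    return {
        "patient_report": translated_symptoms,
        "affect": affect_tag,
        "protocol_matches": matches,
        "note": "Affect tag reflects non-manual signals; weigh physical causes first.",
    }
```

A call like `build_clinician_prompt("Burning urination for three days", "neutral")` would return the patient's own words alongside suggested tests and follow-up triage questions, which is the structured guidance the clinician sees.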

The Dataset Engine

Built for the Real World

An AI is only as safe as the data it learns from. If an AI learns only from young, able-bodied actors signing slowly in a brightly lit studio, it will fail the moment a frightened, elderly patient signs rapidly in a dimly lit, crowded clinic.

To prove this tool works in messy, real-world environments, our Dataset Engine was engineered for extreme diversity.

Diversity by design

Diverse Signers

The USL video dataset features a wide array of Deaf participants, capturing variations in age, gender, handedness, and regional dialects.

Age Variation: 18-75 years
Gender Balance: 50/50
Regional Dialects: 8 regions
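Diversity targets like these can be audited programmatically. The following is a minimal sketch, with hypothetical field names, of how a clip collection could be checked against the axes listed above:

```python
from collections import Counter

DIVERSITY_AXES = ("age_band", "gender", "handedness", "region")  # assumed field names

def coverage_report(samples):
    """Count how a USL clip collection covers each targeted diversity axis."""
    return {axis: Counter(s[axis] for s in samples) for axis in DIVERSITY_AXES}

def is_gender_balanced(samples, tolerance=0.1):
    """Check the 50/50 gender target within a tolerance (default +/- 10%)."""
    counts = Counter(s["gender"] for s in samples)
    total = sum(counts.values())
    return all(abs(c / total - 0.5) <= tolerance for c in counts.values())
```

Running a report like this after each collection round makes under-represented regions or age bands visible before training, rather than after the model fails on them.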
Stress-tested

Real-World Friction

We trained the AI to handle "distribution shifts" through intentional synthetic data augmentation, stress-testing the model against poor clinic lighting, background visual noise, and varying signing speeds.

Poor Lighting
Visual Noise
Varying Speeds
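The three shifts above can be sketched as simple augmentation transforms. This is an illustrative toy (frames reduced to flat lists of pixel intensities; all function names are assumptions), not the actual augmentation pipeline:

```python
import random

def darken(frame, factor=0.4):
    """Simulate poor clinic lighting by scaling pixel intensities down."""
    return [p * factor for p in frame]

def add_noise(frame, sigma=10.0, rng=None):
    """Simulate background visual noise with additive Gaussian perturbation,
    clamped to the valid [0, 255] intensity range."""
    rng = rng or random.Random(0)
    return [min(255.0, max(0.0, p + rng.gauss(0, sigma))) for p in frame]

def resample_speed(frames, rate=2):
    """Simulate faster signing by keeping every `rate`-th frame of a clip."""
    return frames[::rate]

def augment_clip(frames, rng=None):
    """Apply all three distribution shifts to one clip."""
    return [add_noise(darken(f), rng=rng) for f in resample_speed(frames)]
```

Each transform corresponds to one failure mode the text names: dim lighting, visual clutter, and signing tempo; composing them yields the hard, realistic training examples.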

By training the dual-engine system on the actual lived reality of Ugandan patients, we built an AI that doesn't just work in theory—it works in the triage room.

But even the safest technology can be weaponized if the wrong people control the data.

End-to-End System Architecture

Deaf Patient (signs symptoms in USL)
→ Large Vision Model (3D pose estimation + affect recognition)
→ Large Language Model (clinical protocol matching)
→ Clinician (receives structured guidance)

Offline Mode: Critical Danger Signs processed locally
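The offline-mode routing can be sketched as a small on-device rule set that handles critical danger signs even without connectivity. The sign list and routing labels below are hypothetical examples, not the deployed ruleset:

```python
# Hypothetical on-device danger-sign vocabulary.
DANGER_SIGNS = {"severe bleeding", "chest pain", "loss of consciousness"}

def triage(translated_text, affect_tag, online):
    """Route a translated utterance: danger signs are matched by local rules
    first, so an emergency is flagged even when the clinic has no connectivity."""
    text = translated_text.lower()
    local_hits = [s for s in DANGER_SIGNS if s in text]
    if local_hits:
        return {"priority": "EMERGENCY", "signs": local_hits, "source": "local rules"}
    if not online:
        return {"priority": "QUEUE", "source": "local rules",
                "note": "Full protocol matching deferred until connectivity returns."}
    return {"priority": "ROUTINE", "source": "cloud LLM", "affect": affect_tag}
```

The design choice this illustrates: the safety-critical path never depends on the network, while routine cases can wait for the full protocol-matching engine.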

Discover How We Protect This Technology

Our Anticipatory Governance framework ensures ethical deployment.