Research Projects
LiGHT research projects are designed to:
– Produce tools, evidence, and methods that can be deployed in real-world, high-stakes, resource-constrained health settings.
– Advance rigorous, cutting-edge AI research, where the state of the art is defined by robustness, reliability, and relevance under real-world clinical and infrastructural constraints.
– Train interdisciplinary researchers who can work rigorously in high-stakes environments and design AI systems that are both scientifically sound and practically implementable.
All projects are embedded in large-scale research programs, platforms, and clinical studies, and are structured into well-defined subcomponents that can support semester projects, MSc and PhD theses, or longer-term research engagements, with close supervision and team-based collaboration.
MOOVE: Massive Open Online Validation and Evaluation
MOOVE is a large-scale, participatory evaluation platform for clinical AI systems, designed to produce evidence that is directly usable in real-world health systems. It addresses a core gap in medical AI: the lack of rigorous, context-aware evaluation of large language models under the clinical, linguistic, and resource constraints of the settings where they are intended to be used.
MOOVE is built with clinicians and health institutions across Sub-Saharan Africa, South Asia, Latin America, and Europe, and focuses on high-stakes decision-making in both resource-constrained and high-resource settings.
RESEARCH QUESTIONS/TASKS
This project is centered on actively building and operating a production-grade, web-based evaluation platform. You will work within a multidisciplinary, international team spanning engineering, clinical practice, and health policy, and you will:
– Develop and extend the MOOVE platform across front-end and back-end components
– Adapt the platform for use in clinical trials and regulated research settings
– Design innovative benchmarking methodologies and evaluation workflows
– Implement participatory evaluation, leaderboards, and incentive mechanisms (a minimal leaderboard sketch follows this list)
– Ensure platform stability, security, and high standards of design and engineering
– Track and incorporate evolving evaluation standards across cultures, languages, and health system contexts
– Participate in validatathons and contribute to interdisciplinary research outputs
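To make the leaderboard work concrete, here is a minimal sketch of how pairwise clinician preferences could be aggregated into a model ranking with an Elo-style update. The model names, vote format, and K-factor are illustrative assumptions, not the actual MOOVE implementation.

```python
# Minimal sketch: aggregating pairwise clinician preferences into an
# Elo-style model leaderboard. Names and constants are illustrative
# assumptions, not the actual MOOVE implementation.
from collections import defaultdict

K = 32  # update step size (assumed; tuning it is part of the design space)

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A is preferred over model B under Elo."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update_leaderboard(votes):
    """votes: iterable of (model_a, model_b, winner) from clinician reviews."""
    ratings = defaultdict(lambda: 1000.0)  # every model starts at 1000
    for model_a, model_b, winner in votes:
        e_a = expected_score(ratings[model_a], ratings[model_b])
        s_a = 1.0 if winner == model_a else 0.0
        ratings[model_a] += K * (s_a - e_a)
        ratings[model_b] += K * ((1.0 - s_a) - (1.0 - e_a))
    return sorted(ratings.items(), key=lambda kv: -kv[1])

# Example: three hypothetical pairwise judgments
votes = [("model-x", "model-y", "model-x"),
         ("model-y", "model-z", "model-z"),
         ("model-x", "model-z", "model-x")]
print(update_leaderboard(votes))
```

In practice, the platform work also covers how such judgments are collected, audited, and incentivized, which is where the participatory design questions above come in.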
SKILLS YOU WILL BUILD / NEED
– Front-end and/or back-end web development skills
– Experience working with TypeScript and modern web frameworks (Next.js / React)
– Experience with Firebase (authentication, backend services, data flow)
– Front-end ↔ back-end integration for data-intensive web applications
– Experience with data analysis, experimentation, or research-oriented software
– Interest in user-centered design for expert and clinical audiences
– Ability to collaborate effectively in multidisciplinary, international teams
– Familiarity with AI systems, evaluation, or clinical research is helpful but not required
RetroMOOVE
RetroMOOVE is a research project focused on building large-scale evaluation benchmarks for medical AI using retrospectively collected, real-world clinical data. The project uses routinely collected healthcare data from multiple countries, including Tanzania, Rwanda, Kenya, and others, to construct realistic evaluation settings that reflect actual clinical workflows, data quality, and decision-making constraints.
By grounding evaluation in retrospective real-world data, RetroMOOVE enables systematic comparison of large language models against established clinical baselines, moving beyond synthetic prompts or isolated benchmarks.
The project currently includes large-scale retrospective datasets across multiple clinical domains and geographies:
– Pediatric primary health care in Tanzania and Rwanda
– Maternal and child health across Kenya, India, and other Sub-Saharan African settings
– Mental health using retrospective datasets from Kenya and South Africa
RESEARCH QUESTIONS/TASKS
This project combines dataset construction, modeling, and evaluation methodology. You will:
– Curate and structure large-scale retrospective clinical datasets from multiple countries
– Design evaluation benchmarks based on real clinical tasks and outcomes
– Develop deterministic predictive models as baseline comparators
– Implement rule-based clinical algorithms reflecting existing standards of care
– Compare large language models against statistical and rule-based baselines (see the sketch after this list)
– Explore hybrid model architectures that combine deterministic models with LLMs to ground reasoning and improve reliability
– Study model behavior, generalization, and failure modes across countries and health system contexts
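As a flavor of the baseline-comparison work, here is a minimal sketch of a rule-based clinical comparator evaluated against model predictions on retrospective cases. The danger-sign rule, thresholds, and field names are simplified, loosely IMCI-inspired illustrations, not the project's actual algorithms or data schema.

```python
# Minimal sketch: a rule-based clinical baseline compared against model
# predictions on retrospective cases. The rule and field names are
# simplified, IMCI-inspired illustrations, not the project's algorithms.

def rule_based_referral(case: dict) -> bool:
    """Refer if any (illustrative) danger sign is present."""
    return (
        case.get("unable_to_drink", False)
        or case.get("convulsions", False)
        or case.get("resp_rate", 0) >= 50   # fast-breathing threshold (assumed)
        or case.get("temp_c", 0.0) >= 39.0
    )

def agreement(cases, llm_predictions):
    """Fraction of cases where the LLM matches the rule-based baseline."""
    matches = sum(
        rule_based_referral(c) == p for c, p in zip(cases, llm_predictions)
    )
    return matches / len(cases)

# Hypothetical retrospective cases and LLM outputs
cases = [
    {"resp_rate": 54, "temp_c": 38.2},
    {"resp_rate": 30, "temp_c": 37.0},
    {"convulsions": True, "resp_rate": 40},
]
llm_predictions = [True, False, False]
print(f"baseline/LLM agreement: {agreement(cases, llm_predictions):.2f}")
```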
SKILLS YOU WILL BUILD / NEED
– Experience with Python and data analysis
– Familiarity with machine learning or statistical modeling
– Interest in evaluation, benchmarking, and model comparison
– Experience working with real-world clinical or observational data is helpful
– Ability to reason about modular model architectures and comparative evaluation
Meditron-4: Clinical Feedback Alignment and State-of-the-Art Development
Meditron-4 is the next iteration of Meditron, focused on advancing clinically aligned, guideline-faithful, and context-aware medical language modeling.
The project aims to develop open-source fine-tuning and evaluation pipelines for medical LLMs, and to produce a highly clinically aligned Meditron model built on leading open medical and general base models. In parallel, the work explores small, offline-capable models (e.g. MedGemma 4B, LFM-2) to support deployment in resource-constrained settings.
RESEARCH QUESTIONS/TASKS
– Developing open-source fine-tuning pipelines for clinically aligned medical language models (a minimal sketch follows this list)
– Designing evaluation pipelines that assess clinical behavior and guideline adherence
– Improving clinical alignment through feedback-driven and data-driven methods
– Producing a strong, reproducible Meditron-4 model based on open foundations
– Exploring small, offline-capable models for practical deployment in low-resource environments
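To give a sense of the fine-tuning work, here is a minimal single-step supervised fine-tuning sketch using the Hugging Face transformers API. The base checkpoint, learning rate, and training example are placeholders, not the Meditron-4 pipeline; a real pipeline would stream curated, guideline-grounded clinical data with proper batching, masking, and evaluation.

```python
# Minimal sketch of one supervised fine-tuning step for a clinically
# aligned model. Checkpoint and data are placeholders, not Meditron-4.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "mistralai/Mistral-7B-v0.1"  # placeholder; any open causal LM works
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# One hypothetical instruction-response pair; real pipelines would use
# curated, guideline-grounded clinical dialogues instead.
text = ("Instruction: Summarize the guideline steps for suspected "
        "pneumonia in a child under five.\nResponse: ...")
batch = tokenizer(text, return_tensors="pt")

model.train()
outputs = model(**batch, labels=batch["input_ids"])  # causal-LM loss
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```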
SKILLS YOU WILL BUILD / NEED
– Experience with machine learning or large language models
– Familiarity with model fine-tuning and evaluation workflows
– Proficiency in Python and modern ML frameworks
Meditron Reasoning
This project focuses on improving clinical reasoning capabilities in Meditron by studying explicit reasoning mechanisms in medical language models. In high-stakes clinical settings, correct answers are not sufficient on their own; models must demonstrate coherent, grounded reasoning that aligns with clinical logic, evidence, and guidelines. This project treats reasoning as a measurable, evaluable property rather than an emergent side effect.
The work explores how reasoning-oriented training methods can improve decision consistency, error detection, and robustness in medical language models.
RESEARCH QUESTIONS/TASKS
This project combines model development with careful evaluation of reasoning behavior. You will:
– Design and integrate reasoning-focused training objectives into Meditron
– Apply unsupervised or self-supervised reinforcement learning approaches
– Study how different training signals affect reasoning consistency and failure modes (a consistency-measurement sketch follows this list)
– Evaluate reasoning performance on clinically relevant decision-making tasks
– Analyze trade-offs between reasoning depth, accuracy, and computational cost
– Contribute to reproducible training pipelines and evaluation protocols
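Treating reasoning as a measurable property means writing down concrete metrics. Here is a minimal sketch of one such metric, answer self-consistency across sampled reasoning chains; `generate_chain` and the toy model are hypothetical stand-ins, not the project's evaluation protocol.

```python
# Minimal sketch: measuring answer self-consistency across sampled
# reasoning chains. `generate_chain` is a hypothetical stand-in for a
# sampled model call that returns (reasoning_text, final_answer).
import random
from collections import Counter

def self_consistency(question: str, generate_chain, n_samples: int = 8):
    """Sample n reasoning chains and report majority answer + agreement."""
    answers = [generate_chain(question)[1] for _ in range(n_samples)]
    top_answer, count = Counter(answers).most_common(1)[0]
    return top_answer, count / n_samples  # agreement rate in (0, 1]

# Toy stand-in: a stochastic "model" that usually answers B
def fake_chain(question):
    return ("...reasoning...", random.choices(["A", "B"], weights=[1, 3])[0])

answer, agreement = self_consistency("Which option is first-line?", fake_chain)
print(answer, f"{agreement:.2f}")
```

Low agreement under resampling is one observable signature of unstable reasoning, which the training objectives above aim to reduce.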
SKILLS YOU WILL BUILD / NEED
– Strong background in machine learning and deep learning
– Experience training and evaluating large language models
– Familiarity with reinforcement learning or optimization methods is helpful
– Proficiency in Python and modern ML frameworks
– Ability to analyze model behavior beyond surface-level accuracy
Polyglot Meditron
Most medical language models are trained and evaluated primarily in English, despite the fact that clinical care is delivered in many languages, often in settings where English is not the working language. In low-resource and multilingual health systems, this mismatch limits the usefulness and safety of AI tools.
Polyglot Meditron focuses on extending Meditron to support multilingual clinical reasoning, with an emphasis on low-resource languages and real clinical usage. The project studies how medical meaning, terminology, and decision logic transfer across languages, and where existing multilingual models fail in medical contexts.
RESEARCH QUESTIONS/TASKS
This project combines model development with targeted evaluation of multilingual behavior. You will:
– Improve Meditron’s performance in non-English languages
– Focus on low-resource and underrepresented languages relevant to clinical practice
– Extend support for both written and spoken language inputs
– Develop and evaluate handling of non-English medical terminology and domain-specific usage
– Analyze the limitations of polyglot base models when applied to medical tasks
– Evaluate multilingual models on clinically grounded tasks across languages (see the sketch after this list)
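A minimal sketch of the cross-lingual evaluation idea: score the same clinical task per language on parallel items and compare. The `ask_model` call, item schema, and toy questions are illustrative assumptions, not the project's datasets.

```python
# Minimal sketch: per-language accuracy on parallel clinical questions.
# `ask_model` is a hypothetical model call; the items are toy placeholders.
from collections import defaultdict

def per_language_accuracy(items, ask_model):
    """items: list of dicts with 'lang', 'question', 'gold' fields (assumed)."""
    correct, total = defaultdict(int), defaultdict(int)
    for item in items:
        prediction = ask_model(item["question"], lang=item["lang"])
        correct[item["lang"]] += int(prediction == item["gold"])
        total[item["lang"]] += 1
    return {lang: correct[lang] / total[lang] for lang in total}

items = [
    {"lang": "sw", "question": "Dalili kuu za malaria ni zipi?", "gold": "fever"},
    {"lang": "fr", "question": "Quels sont les signes du paludisme ?", "gold": "fever"},
]
print(per_language_accuracy(items, lambda q, lang: "fever"))
```

Gaps between languages on parallel items are exactly the failures of transfer the project sets out to characterize and close.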
SKILLS YOU WILL BUILD / NEED
– Background in machine learning or natural language processing
– Experience working with language models
– Familiarity with speech processing or multilingual NLP is helpful
– Interest in multilingual modeling in clinical and global health settings
– Proficiency in Python and modern ML tooling
MultiMeditron
MultiMeditron focuses on making Meditron multimodal, allowing users to provide medical images in addition to text. The aim is to extend Meditron beyond text-only inputs while preserving its existing structure and capabilities.
The work is twofold. First, the Meditron codebase is adapted to support a multimodal architecture and new input modalities (currently limited to images). Second, modality-specific “expert” models are developed and improved to process these inputs and generate embeddings that are fed into Meditron.
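To illustrate the expert-plus-embedding pattern described above, here is a minimal PyTorch sketch in the style of LLaVA-like adapters: a frozen vision expert produces patch embeddings, and a trainable projection maps them into the language model's hidden space, where they are prepended to the text embeddings. All dimensions and names are assumptions, not the MultiMeditron architecture.

```python
# Minimal sketch: projecting a vision expert's patch embeddings into the
# LM's embedding space and fusing with text tokens. Dimensions assumed.
import torch
import torch.nn as nn

class ImageExpertAdapter(nn.Module):
    """Trainable projection from a frozen image expert into the LM."""
    def __init__(self, image_dim: int = 768, lm_hidden: int = 4096):
        super().__init__()
        self.proj = nn.Linear(image_dim, lm_hidden)

    def forward(self, patch_embeds: torch.Tensor,
                text_embeds: torch.Tensor) -> torch.Tensor:
        # patch_embeds: [batch, n_patches, image_dim] from the expert
        # text_embeds:  [batch, seq_len, lm_hidden] from the LM embeddings
        image_tokens = self.proj(patch_embeds)            # [B, n_patches, lm_hidden]
        return torch.cat([image_tokens, text_embeds], 1)  # image tokens first

adapter = ImageExpertAdapter()
fused = adapter(torch.randn(1, 196, 768), torch.randn(1, 32, 4096))
print(fused.shape)  # torch.Size([1, 228, 4096])
```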
RESEARCH QUESTIONS/TASKS
– Adapting the Meditron codebase to support a modular, extensible multimodal architecture
– Integrating new modalities into Meditron beyond text, starting with medical images
– Designing, training, and improving modality-specific expert models
– Developing embedding strategies for fusing multimodal representations within Meditron
– Studying how multimodal inputs affect clinical reasoning, performance, and robustness
– Evaluating multimodal Meditron models on clinically relevant tasks
SKILLS YOU WILL BUILD / NEED
– Machine learning and deep learning fundamentals
– Experience with neural network architectures, particularly vision or multimodal models
– Proficiency in Python and common ML frameworks
Foundation Models for Voice-Based Clinical Assessment
Voice carries clinically meaningful information relevant to respiratory disease, mental health, neurological conditions, and general clinical assessment. While large speech foundation models show strong general capabilities, their reliability, robustness, and data requirements in real clinical settings remain poorly characterized. This project focuses on rigorously evaluating speech foundation models for clinical voice-based assessment, with particular attention to performance under realistic, resource-constrained conditions.
The goal is to determine when such models are useful, where they fail, and what constraints must be addressed before they can be safely integrated into health systems.
RESEARCH QUESTIONS/TASKS
This project is centered on systematic benchmarking and evaluation. You will:
– Benchmark pretrained speech foundation models on clinically relevant voice tasks (a probing sketch follows this list)
– Evaluate model performance across multiple datasets, populations, and recording conditions
– Analyze robustness to noise, speaker variability, language, and device quality
– Study data efficiency and performance under limited labeled data
– Compare models using reproducible evaluation protocols
– Document failure modes, limitations, and deployment-relevant trade-offs
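One standard benchmarking recipe is to freeze the foundation model, extract utterance embeddings, and fit a lightweight probe for the clinical label. Here is a minimal sketch of that recipe; the checkpoint choice, random waveforms, and binary label are illustrative assumptions, not the project's data or protocol.

```python
# Minimal sketch: frozen embeddings from a pretrained speech model plus a
# linear probe for a (hypothetical) binary clinical label.
import torch
from transformers import AutoFeatureExtractor, AutoModel
from sklearn.linear_model import LogisticRegression

name = "facebook/wav2vec2-base"  # one of several candidate foundation models
extractor = AutoFeatureExtractor.from_pretrained(name)
encoder = AutoModel.from_pretrained(name).eval()

def embed(waveform_16khz):
    """Mean-pooled hidden states as a fixed-size utterance embedding."""
    inputs = extractor(waveform_16khz, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state  # [1, frames, dim]
    return hidden.mean(dim=1).squeeze(0).numpy()

# Toy data: random waveforms standing in for clinical recordings
X = [embed(torch.randn(16000).numpy()) for _ in range(8)]
y = [0, 1] * 4  # hypothetical binary clinical label
probe = LogisticRegression(max_iter=1000).fit(X, y)
print("train accuracy:", probe.score(X, y))
```

The robustness questions above then amount to rerunning such probes under added noise, different devices, languages, and shrinking label budgets, and documenting where performance degrades.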
SKILLS YOU WILL BUILD / NEED
– Machine learning and deep learning fundamentals
– Proficiency in Python (PyTorch preferred)
– Experience with experimental design and model evaluation