Research Projects
LiGHT research projects are designed to:
– Produce tools, evidence, and methods that can be deployed in real-world, high-stakes, resource-constrained health settings.
– Advance rigorous, cutting-edge AI research, where the state of the art is defined by robustness, reliability, and relevance under real-world clinical and infrastructural constraints.
– Train interdisciplinary researchers who can work rigorously in high-stakes environments and design AI systems that are both scientifically sound and practically implementable.
All projects are embedded in large-scale research programs, platforms, and clinical studies, and are structured into well-defined subcomponents that can support semester projects, MSc and PhD theses, or longer-term research engagements, with close supervision and team-based collaboration.
MOOVE: Massive Open Online Validation and Evaluation
MOOVE is a large-scale, participatory evaluation platform for clinical AI systems, designed to produce evidence that is directly usable in real-world health systems. It addresses a core gap in medical AI: the lack of rigorous, context-aware evaluation of large language models under the clinical, linguistic, and resource constraints of the settings where they are intended to be used.
MOOVE is built with clinicians and health institutions across Sub-Saharan Africa, South Asia, Latin America, and Europe, and focuses on high-stakes decision-making in both resource-constrained and high-resource settings.
RESEARCH QUESTIONS/TASKS
This project is centered on actively building and operating a production-grade, web-based evaluation platform. You will work within a multidisciplinary, international team spanning engineering, clinical practice, and health policy, and you will:
– Develop and extend the MOOVE platform across front-end and back-end components
– Adapt the platform for use in clinical trials and regulated research settings
– Design innovative benchmarking methodologies and evaluation workflows
– Implement participatory evaluation, leaderboards, and incentive mechanisms (a minimal leaderboard sketch follows this list)
– Ensure platform stability, security, and high standards of design and engineering
– Track and incorporate evolving evaluation standards across cultures, languages, and health system contexts
– Participate in validatathons and contribute to interdisciplinary research outputs
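To make the leaderboard work concrete, here is a minimal sketch of how pairwise clinician preferences could be aggregated into a model ranking with an Elo-style update. The model names, vote format, and K-factor are illustrative assumptions, not the actual MOOVE implementation.

```python
# Minimal sketch: aggregating pairwise clinician preferences into an
# Elo-style model leaderboard. Names and constants are illustrative
# assumptions, not the actual MOOVE implementation.
from collections import defaultdict

K = 32  # update step size (assumed; tuning it is part of the design space)

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A is preferred over model B under Elo."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update_leaderboard(votes):
    """votes: iterable of (model_a, model_b, winner) from clinician reviews."""
    ratings = defaultdict(lambda: 1000.0)  # every model starts at 1000
    for model_a, model_b, winner in votes:
        e_a = expected_score(ratings[model_a], ratings[model_b])
        s_a = 1.0 if winner == model_a else 0.0
        ratings[model_a] += K * (s_a - e_a)
        ratings[model_b] += K * ((1.0 - s_a) - (1.0 - e_a))
    return sorted(ratings.items(), key=lambda kv: -kv[1])

# Example: three hypothetical pairwise judgments
votes = [("model-x", "model-y", "model-x"),
         ("model-y", "model-z", "model-z"),
         ("model-x", "model-z", "model-x")]
print(update_leaderboard(votes))
```

In practice, the platform work also covers how such judgments are collected, audited, and incentivized, which is where the participatory design questions above come in.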
SKILLS YOU WILL BUILD / NEED
– Front-end and/or back-end web development skills
– Experience working with TypeScript and modern web frameworks (Next.js / React)
– Experience with Firebase (authentication, backend services, data flow)
– Front-end ↔ back-end integration for data-intensive web applications
– Experience with data analysis, experimentation, or research-oriented software
– Interest in user-centered design for expert and clinical audiences
– Ability to collaborate effectively in multidisciplinary, international teams
– Familiarity with AI systems, evaluation, or clinical research is helpful but not required
RetroMOOVE
RetroMOOVE is a research project focused on building large-scale evaluation benchmarks for medical AI using retrospectively collected, real-world clinical data. The project uses routinely collected healthcare data from multiple countries, including Tanzania, Rwanda, Kenya, and others, to construct realistic evaluation settings that reflect actual clinical workflows, data quality, and decision-making constraints.
By grounding evaluation in retrospective real-world data, RetroMOOVE enables systematic comparison of large language models against established clinical baselines, moving beyond synthetic prompts or isolated benchmarks.
The project currently includes large-scale retrospective datasets across multiple clinical domains and geographies:
– Pediatric primary health care in Tanzania and Rwanda
– Maternal and child health across Kenya, India, and other Sub-Saharan African settings
– Mental health using retrospective datasets from Kenya and South Africa
RESEARCH QUESTIONS/TASKS
This project combines dataset construction, modeling, and evaluation methodology. You will:
– Curate and structure large-scale retrospective clinical datasets from multiple countries
– Design evaluation benchmarks based on real clinical tasks and outcomes
– Develop deterministic predictive models as baseline comparators
– Implement rule-based clinical algorithms reflecting existing standards of care
– Compare large language models against statistical and rule-based baselines (see the sketch after this list)
– Explore hybrid model architectures that combine deterministic models with LLMs to ground reasoning and improve reliability
– Study model behavior, generalization, and failure modes across countries and health system contexts
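As a flavor of the baseline-comparison work, here is a minimal sketch of a rule-based clinical comparator evaluated against model predictions on retrospective cases. The danger-sign rule, thresholds, and field names are simplified, loosely IMCI-inspired illustrations, not the project's actual algorithms or data schema.

```python
# Minimal sketch: a rule-based clinical baseline compared against model
# predictions on retrospective cases. The rule and field names are
# simplified, IMCI-inspired illustrations, not the project's algorithms.

def rule_based_referral(case: dict) -> bool:
    """Refer if any (illustrative) danger sign is present."""
    return (
        case.get("unable_to_drink", False)
        or case.get("convulsions", False)
        or case.get("resp_rate", 0) >= 50   # fast-breathing threshold (assumed)
        or case.get("temp_c", 0.0) >= 39.0
    )

def agreement(cases, llm_predictions):
    """Fraction of cases where the LLM matches the rule-based baseline."""
    matches = sum(
        rule_based_referral(c) == p for c, p in zip(cases, llm_predictions)
    )
    return matches / len(cases)

# Hypothetical retrospective cases and LLM outputs
cases = [
    {"resp_rate": 54, "temp_c": 38.2},
    {"resp_rate": 30, "temp_c": 37.0},
    {"convulsions": True, "resp_rate": 40},
]
llm_predictions = [True, False, False]
print(f"baseline/LLM agreement: {agreement(cases, llm_predictions):.2f}")
```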
SKILLS YOU WILL BUILD / NEED
– Experience with Python and data analysis
– Familiarity with machine learning or statistical modeling
– Interest in evaluation, benchmarking, and model comparison
– Experience working with real-world clinical or observational data is helpful
– Ability to reason about modular model architectures and comparative evaluation
Meditron-4: Clinical Feedback Alignment and State-of-the-Art Development
Meditron-4 is the next iteration of Meditron, focused on advancing clinically aligned, guideline-faithful, and context-aware medical language modeling.
The project aims to develop open-source fine-tuning and evaluation pipelines for medical LLMs, and to produce a highly clinically aligned Meditron model built on leading open medical and general base models. In parallel, the work explores small, offline-capable models (e.g. MedGemma 4B, LFM-2) to support deployment in resource-constrained settings.
RESEARCH QUESTIONS/TASKS
– Developing open-source fine-tuning pipelines for clinically aligned medical language models (a minimal sketch follows this list)
– Designing evaluation pipelines that assess clinical behavior and guideline adherence
– Improving clinical alignment through feedback-driven and data-driven methods
– Producing a strong, reproducible Meditron-4 model based on open foundations
– Exploring small, offline-capable models for practical deployment in low-resource environments
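To give a sense of the fine-tuning work, here is a minimal single-step supervised fine-tuning sketch using the Hugging Face transformers API. The base checkpoint, learning rate, and training example are placeholders, not the Meditron-4 pipeline; a real pipeline would stream curated, guideline-grounded clinical data with proper batching, masking, and evaluation.

```python
# Minimal sketch of one supervised fine-tuning step for a clinically
# aligned model. Checkpoint and data are placeholders, not Meditron-4.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "mistralai/Mistral-7B-v0.1"  # placeholder; any open causal LM works
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# One hypothetical instruction-response pair; real pipelines would use
# curated, guideline-grounded clinical dialogues instead.
text = ("Instruction: Summarize the guideline steps for suspected "
        "pneumonia in a child under five.\nResponse: ...")
batch = tokenizer(text, return_tensors="pt")

model.train()
outputs = model(**batch, labels=batch["input_ids"])  # causal-LM loss
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```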
SKILLS YOU WILL BUILD / NEED
– Experience with machine learning or large language models
– Familiarity with model fine-tuning and evaluation workflows
– Proficiency in Python and modern ML frameworks
Meditron Reasoning
This project focuses on improving clinical reasoning capabilities in Meditron by studying explicit reasoning mechanisms in medical language models. In high-stakes clinical settings, correct answers are not sufficient on their own; models must demonstrate coherent, grounded reasoning that aligns with clinical logic, evidence, and guidelines. This project treats reasoning as a measurable, evaluable property rather than an emergent side effect.
The work explores how reasoning-oriented training methods can improve decision consistency, error detection, and robustness in medical language models.
RESEARCH QUESTIONS/TASKS
This project combines model development with careful evaluation of reasoning behavior. You will:
– Design and integrate reasoning-focused training objectives into Meditron
– Apply unsupervised or self-supervised reinforcement learning approaches
– Study how different training signals affect reasoning consistency and failure modes (a consistency-measurement sketch follows this list)
– Evaluate reasoning performance on clinically relevant decision-making tasks
– Analyze trade-offs between reasoning depth, accuracy, and computational cost
– Contribute to reproducible training pipelines and evaluation protocols
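Treating reasoning as a measurable property means writing down concrete metrics. Here is a minimal sketch of one such metric, answer self-consistency across sampled reasoning chains; `generate_chain` and the toy model are hypothetical stand-ins, not the project's evaluation protocol.

```python
# Minimal sketch: measuring answer self-consistency across sampled
# reasoning chains. `generate_chain` is a hypothetical stand-in for a
# sampled model call that returns (reasoning_text, final_answer).
import random
from collections import Counter

def self_consistency(question: str, generate_chain, n_samples: int = 8):
    """Sample n reasoning chains and report majority answer + agreement."""
    answers = [generate_chain(question)[1] for _ in range(n_samples)]
    top_answer, count = Counter(answers).most_common(1)[0]
    return top_answer, count / n_samples  # agreement rate in (0, 1]

# Toy stand-in: a stochastic "model" that usually answers B
def fake_chain(question):
    return ("...reasoning...", random.choices(["A", "B"], weights=[1, 3])[0])

answer, agreement = self_consistency("Which option is first-line?", fake_chain)
print(answer, f"{agreement:.2f}")
```

Low agreement under resampling is one observable signature of unstable reasoning, which the training objectives above aim to reduce.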
SKILLS YOU WILL BUILD / NEED
– Strong background in machine learning and deep learning
– Experience training and evaluating large language models
– Familiarity with reinforcement learning or optimization methods is helpful
– Proficiency in Python and modern ML frameworks
– Ability to analyze model behavior beyond surface-level accuracy
Polyglot Meditron
Most medical language models are trained and evaluated primarily in English, despite the fact that clinical care is delivered in many languages, often in settings where English is not the working language. In low-resource and multilingual health systems, this mismatch limits the usefulness and safety of AI tools.
Polyglot Meditron focuses on extending Meditron to support multilingual clinical reasoning, with an emphasis on low-resource languages and real clinical usage. The project studies how medical meaning, terminology, and decision logic transfer across languages, and where existing multilingual models fail in medical contexts.
RESEARCH QUESTIONS/TASKS
This project combines model development with targeted evaluation of multilingual behavior. You will:
– Improve Meditron’s performance in non-English languages
– Focus on low-resource and underrepresented languages relevant to clinical practice
– Extend support for both written and spoken language inputs
– Develop and evaluate handling of non-English medical terminology and domain-specific usage
– Analyze the limitations of polyglot base models when applied to medical tasks
– Evaluate multilingual models on clinically grounded tasks across languages (see the sketch after this list)
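A minimal sketch of the cross-lingual evaluation idea: score the same clinical task per language on parallel items and compare. The `ask_model` call, item schema, and toy questions are illustrative assumptions, not the project's datasets.

```python
# Minimal sketch: per-language accuracy on parallel clinical questions.
# `ask_model` is a hypothetical model call; the items are toy placeholders.
from collections import defaultdict

def per_language_accuracy(items, ask_model):
    """items: list of dicts with 'lang', 'question', 'gold' fields (assumed)."""
    correct, total = defaultdict(int), defaultdict(int)
    for item in items:
        prediction = ask_model(item["question"], lang=item["lang"])
        correct[item["lang"]] += int(prediction == item["gold"])
        total[item["lang"]] += 1
    return {lang: correct[lang] / total[lang] for lang in total}

items = [
    {"lang": "sw", "question": "Dalili kuu za malaria ni zipi?", "gold": "fever"},
    {"lang": "fr", "question": "Quels sont les signes du paludisme ?", "gold": "fever"},
]
print(per_language_accuracy(items, lambda q, lang: "fever"))
```

Gaps between languages on parallel items are exactly the failures of transfer the project sets out to characterize and close.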
SKILLS YOU WILL BUILD / NEED
– Background in machine learning or natural language processing
– Experience working with language models
– Familiarity with speech processing or multilingual NLP is helpful
– Interest in multilingual modeling in clinical and global health settings
– Proficiency in Python and modern ML tooling
MultiMeditron
MultiMeditron focuses on making Meditron multimodal, allowing users to provide medical images in addition to text. The aim is to extend Meditron beyond text-only inputs while preserving its existing structure and capabilities.
The work is twofold. First, the Meditron codebase is adapted to support a multimodal architecture and new input modalities (currently limited to images). Second, modality-specific “expert” models are developed and improved to process these inputs and generate embeddings that are fed into Meditron.
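To illustrate the expert-plus-embedding pattern described above, here is a minimal PyTorch sketch in the style of LLaVA-like adapters: a frozen vision expert produces patch embeddings, and a trainable projection maps them into the language model's hidden space, where they are prepended to the text embeddings. All dimensions and names are assumptions, not the MultiMeditron architecture.

```python
# Minimal sketch: projecting a vision expert's patch embeddings into the
# LM's embedding space and fusing with text tokens. Dimensions assumed.
import torch
import torch.nn as nn

class ImageExpertAdapter(nn.Module):
    """Trainable projection from a frozen image expert into the LM."""
    def __init__(self, image_dim: int = 768, lm_hidden: int = 4096):
        super().__init__()
        self.proj = nn.Linear(image_dim, lm_hidden)

    def forward(self, patch_embeds: torch.Tensor,
                text_embeds: torch.Tensor) -> torch.Tensor:
        # patch_embeds: [batch, n_patches, image_dim] from the expert
        # text_embeds:  [batch, seq_len, lm_hidden] from the LM embeddings
        image_tokens = self.proj(patch_embeds)            # [B, n_patches, lm_hidden]
        return torch.cat([image_tokens, text_embeds], 1)  # image tokens first

adapter = ImageExpertAdapter()
fused = adapter(torch.randn(1, 196, 768), torch.randn(1, 32, 4096))
print(fused.shape)  # torch.Size([1, 228, 4096])
```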
RESEARCH QUESTIONS/TASKS
– Adapting the Meditron codebase to support a modular, extensible multimodal architecture
– Integrating new modalities into Meditron beyond text, starting with medical images
– Designing, training, and improving modality-specific expert models
– Developing embedding strategies for fusing multimodal representations within Meditron
– Studying how multimodal inputs affect clinical reasoning, performance, and robustness
– Evaluating multimodal Meditron models on clinically relevant tasks
SKILLS YOU WILL BUILD / NEED
– Machine learning and deep learning fundamentals
– Experience with neural network architectures, particularly vision or multimodal models
– Proficiency in Python and common ML frameworks
Foundation Models for Voice-Based Clinical Assessment
Voice carries clinically meaningful information relevant to respiratory disease, mental health, neurological conditions, and general clinical assessment. While large speech foundation models show strong general capabilities, their reliability, robustness, and data requirements in real clinical settings remain poorly characterized. This project focuses on rigorously evaluating speech foundation models for clinical voice-based assessment, with particular attention to performance under realistic, resource-constrained conditions.
The goal is to determine when such models are useful, where they fail, and what constraints must be addressed before they can be safely integrated into health systems.
RESEARCH QUESTIONS/TASKS
This project is centered on systematic benchmarking and evaluation. You will:
– Benchmark pretrained speech foundation models on clinically relevant voice tasks (a probing sketch follows this list)
– Evaluate model performance across multiple datasets, populations, and recording conditions
– Analyze robustness to noise, speaker variability, language, and device quality
– Study data efficiency and performance under limited labeled data
– Compare models using reproducible evaluation protocols
– Document failure modes, limitations, and deployment-relevant trade-offs
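One standard benchmarking recipe is to freeze the foundation model, extract utterance embeddings, and fit a lightweight probe for the clinical label. Here is a minimal sketch of that recipe; the checkpoint choice, random waveforms, and binary label are illustrative assumptions, not the project's data or protocol.

```python
# Minimal sketch: frozen embeddings from a pretrained speech model plus a
# linear probe for a (hypothetical) binary clinical label.
import torch
from transformers import AutoFeatureExtractor, AutoModel
from sklearn.linear_model import LogisticRegression

name = "facebook/wav2vec2-base"  # one of several candidate foundation models
extractor = AutoFeatureExtractor.from_pretrained(name)
encoder = AutoModel.from_pretrained(name).eval()

def embed(waveform_16khz):
    """Mean-pooled hidden states as a fixed-size utterance embedding."""
    inputs = extractor(waveform_16khz, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state  # [1, frames, dim]
    return hidden.mean(dim=1).squeeze(0).numpy()

# Toy data: random waveforms standing in for clinical recordings
X = [embed(torch.randn(16000).numpy()) for _ in range(8)]
y = [0, 1] * 4  # hypothetical binary clinical label
probe = LogisticRegression(max_iter=1000).fit(X, y)
print("train accuracy:", probe.score(X, y))
```

The robustness questions above then amount to rerunning such probes under added noise, different devices, languages, and shrinking label budgets, and documenting where performance degrades.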
SKILLS YOU WILL BUILD / NEED
– Machine learning and deep learning fundamentals
– Proficiency in Python (PyTorch preferred)
– Experience with experimental design and model evaluation