Repeated Measures Latent Dirichlet Allocation for Longitudinal Microbiome Analysis
Document Type
Article
Publication Date
2-21-2025
Original Citation
Pais NV, Ravishanker N, Rajasekaran S, Weinstock GM. Repeated Measures Latent Dirichlet Allocation for Longitudinal Microbiome Analysis. Norman: Springer Nature Switzerland; 2025.
JAX Source
Norman: Springer Nature Switzerland; 2025.
ISBN
978-3-031-82768-6
DOI
https://doi.org/10.1007/978-3-031-82768-6_15
Abstract
Topic modeling algorithms generally examine a set of documents, referred to as a corpus in Natural Language Processing (NLP), and analyze the words observed in each document to uncover the themes that run through the collection. In microbiome analysis, they are used to identify co-occurring microbial species and reveal hidden patterns or relationships within microbial communities. Longitudinal microbiome data analysis provides a robust framework for studying microbiome composition over time. By collecting multiple samples from the same individuals at different time points, researchers can capture the temporal variation within an individual’s microbiome and evaluate its relationship to the subject’s health status at each visit. This paper extends the Latent Dirichlet Allocation (LDA) modeling technique to a repeated measures framework. We propose Repeated Measures Latent Dirichlet Allocation (RM-LDA), in which each document (subject) is assumed to be a collection of multiple sub-documents (the visits associated with that subject). In this study, we examine microbiome data from subjects who make multiple visits to a medical facility, with microbial counts recorded at each visit. Our model allows us to analyze hidden patterns in the microbiome data over multiple visits, estimate the latent topic correlation structure within each subject, and study the association of the latent topics with the individual’s health status at each visit.
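The document and sub-document analogy in the abstract can be made concrete with a small data-layout sketch. The example below is only an illustration under stated assumptions, not the authors' RM-LDA implementation: the subject names, taxon names, counts, and the helper `flatten_visits` are all hypothetical. It shows one plausible way to hold repeated-measures count data, with each subject (document) owning a list of visits (sub-documents), and each visit being a vector of taxon counts (word counts).

```python
import numpy as np

# Hypothetical toy data: 3 subjects with differing numbers of visits,
# and counts over 4 taxa per visit. All names and numbers are illustrative.
taxa = ["taxon_A", "taxon_B", "taxon_C", "taxon_D"]
subjects = {
    "subject_1": [[12, 0, 3, 5], [10, 1, 4, 5]],
    "subject_2": [[0, 8, 7, 1], [1, 9, 6, 0], [0, 7, 9, 2]],
    "subject_3": [[4, 4, 4, 4]],
}


def flatten_visits(subjects):
    """Stack every visit into one visit-by-taxon count matrix.

    Returns the matrix plus, for each row, the subject it belongs to
    and the visit index within that subject.
    """
    rows, subject_ids, visit_ids = [], [], []
    for subject, visits in subjects.items():
        for visit_index, counts in enumerate(visits):
            rows.append(counts)
            subject_ids.append(subject)
            visit_ids.append(visit_index)
    return (
        np.asarray(rows, dtype=int),
        np.asarray(subject_ids),
        np.asarray(visit_ids),
    )


counts, subject_ids, visit_ids = flatten_visits(subjects)

# Relative abundance per visit: each row is normalized to sum to 1.
proportions = counts / counts.sum(axis=1, keepdims=True)

# Rows that share a subject label are the repeated measures for that subject.
for subject in subjects:
    mask = subject_ids == subject
    print(subject, "visits:", int(mask.sum()))
```

Keeping the subject label alongside each row preserves the grouping that a repeated-measures model needs; a standard LDA fit on `counts` alone would treat every visit as an independent document and ignore that grouping.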