Document Type

Article

Publication Date

1-1-2025

Publication Title

Front Med (Lausanne)

Keywords

JGM

JAX Source

Front Med (Lausanne). 2025;12:1510431.

Volume

12

First Page

1510431

Last Page

1510431

ISSN

2296-858X

PMID

40144871

DOI

https://doi.org/10.3389/fmed.2025.1510431

Abstract

BACKGROUND: Knowledge-driven prioritization of candidate genes derived from large-scale molecular profiling data for targeted transcriptional profiling assays is challenging due to the vast amount of biomedical literature that needs to be harnessed. We present a workflow leveraging Large Language Models (LLMs) to prioritize candidate genes within module M12.15, a plasma cell-associated module from the BloodGen3 repertoire, by integrating knowledge-driven prioritization with data-driven analysis of transcriptome profiles.

METHODS: The workflow involves a two-step process: (1) high-throughput screening using LLMs to score and rank the 17 genes of module M12.15 based on six predefined criteria, and (2) prioritization employing high-resolution scoring and fact-checking, with human experts validating and refining AI-generated scores.

RESULTS: The first step identified five candidate genes (CD38, TNFRSF17, IGJ, TOP2A, and TYMS). Following human-augmented LLM scoring and fact checking, as part of the second step, CD38 and TNFRSF17 emerged as the top candidates. Next, transcriptome profiling data from three datasets was incorporated in the workflow to assess expression levels and correlations with the module average across various conditions and cell types. It is on this basis that CD38 was prioritized as the top candidate, with TNFRSF17 and IGJ identified as promising alternatives.

CONCLUSION: This study introduces a systematic framework that integrates LLMs with human expertise for gene prioritization. Our analysis identified CD38, TNFRSF17, and IGJ as the top candidates within the plasma cell-associated module M12.15 from the BloodGen3 repertoire, with their relative rankings varying systematically based on specific evaluation criteria, from plasma cell biology to therapeutic relevance. This criterion-dependent ranking demonstrates the ability of the framework to perform nuanced, multi-faceted evaluations. By combining knowledge-driven analysis with data-driven metrics, our approach provides a balanced and comprehensive method for biomarker selection. The methodology established here offers a reproducible and scalable approach that can be applied across diverse biological contexts and extended to analyze large module repertoires.

Comments

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms

Share

COinS