GenePrep

Unified Genomic Data Formatter for Statistical Analysis

By Yue Xu¹, Sirihaasa Nallamothu², Haoyang Liu², Haohan Wang²
¹Columbia University, ²UIUC

Overview

GenePrep is an automated multi‑agent tool that streamlines preprocessing and analysis of large‑scale gene expression data from sources such as GEO and TCGA. Given a dataset and a set of trait–condition pairs, GenePrep validates the data, selects pairs, performs statistical tests, and outputs reproducible logs and CSV results, with minimal scripting required.
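To make the expected inputs concrete, here is a minimal validation sketch in Python. It assumes an expression matrix stored as a nested dict (gene → sample ID → value), clinical annotations keyed by sample ID, and trait–condition pairs that name clinical columns. `TraitConditionPair` and `validate_inputs` are hypothetical names, not part of GenePrep's interface, and the checks shown (missing values, shared sample IDs, column presence) are illustrative rather than GenePrep's actual validation rules.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TraitConditionPair:
    trait: str      # hypothetical: clinical column holding the trait of interest
    condition: str  # hypothetical: clinical column holding the condition/covariate


def validate_inputs(expression, clinical, pairs):
    """Return a list of problems found; an empty list means the inputs look usable.

    expression: {gene: {sample_id: value}}
    clinical:   {sample_id: {column: value}}
    pairs:      list of TraitConditionPair
    """
    problems = []
    expr_samples = set()
    for gene, values in expression.items():
        expr_samples.update(values)  # collect sample IDs seen in the expression data
        for sample, x in values.items():
            if x is None or (isinstance(x, float) and math.isnan(x)):
                problems.append(f"missing value for gene {gene} in sample {sample}")
    shared = expr_samples & set(clinical)
    if not shared:
        problems.append("no sample IDs shared between expression and clinical data")
        return problems
    # Every column named by a pair must exist for at least one usable sample.
    for pair in pairs:
        for column in (pair.trait, pair.condition):
            if not any(column in clinical[s] for s in shared):
                problems.append(f"column {column!r} not found in any shared sample")
    return problems
```

Returning a list of problems instead of raising on the first one lets a caller (or an agent) see every issue in a single pass.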

Quick Start

CLI Preview

$ conda create -n agent python=3.10
$ conda activate agent
$ pip install -r requirements.txt
$ python main.py --version 1 \
    --model gemini-2.0-flash-002 --api 1

Commands shown for illustration; see the manual for full options.

Features

Automation

End‑to‑end workflow: validation, pair selection, tests, and result generation.
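The testing and result-generation steps can be sketched as follows. This is an illustration under stated assumptions, not GenePrep's statistical method: it swaps in a per-gene Welch's t-test comparing samples labeled 1 versus 0 for a binary trait, ignores the condition, and reports only the t statistic and Welch-Satterthwaite degrees of freedom (a p-value would additionally need a t-distribution CDF such as `scipy.stats.t.sf`). The names `welch_t`, `run_gene_tests`, and `to_csv` are hypothetical.

```python
import csv
import io
import math
import statistics


def welch_t(group_a, group_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two samples")
    # Squared standard error of each group mean (sample variance / n).
    se2_a = statistics.variance(group_a) / n_a
    se2_b = statistics.variance(group_b) / n_b
    se = math.sqrt(se2_a + se2_b)
    if se == 0:
        raise ValueError("both groups have zero variance")
    t = (statistics.fmean(group_a) - statistics.fmean(group_b)) / se
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


def run_gene_tests(expression, trait_labels):
    """Test every gene: samples with trait label 1 (cases) vs label 0 (controls)."""
    rows = []
    for gene in sorted(expression):  # sorted so the output order is stable
        values = expression[gene]
        cases = [x for s, x in values.items() if trait_labels.get(s) == 1]
        controls = [x for s, x in values.items() if trait_labels.get(s) == 0]
        t, df = welch_t(cases, controls)
        rows.append({"gene": gene, "t": round(t, 4), "df": round(df, 2)})
    return rows


def to_csv(rows):
    """Serialize result rows to CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["gene", "t", "df"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    expression = {
        "G1": {"S1": 1.0, "S2": 2.0, "S3": 3.0, "S4": 4.0},
        "G2": {"S1": 5.0, "S2": 7.0, "S3": 5.5, "S4": 6.5},
    }
    trait = {"S1": 1, "S2": 1, "S3": 0, "S4": 0}
    print(to_csv(run_gene_tests(expression, trait)), end="")
```

Sorting genes before testing and fixing the CSV column order keep the output identical across runs on the same input, which is one simple way to support repeatable results.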

Modular Agents

Modular agents plan, execute, and debug the analysis with minimal manual scripting.
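The plan, execute, and debug idea can be illustrated with a small retry loop. In this sketch each step is a plain Python function standing in for agent-generated code, and `debug` is a hypothetical callback that returns a replacement function (or `None` to give up). The model-driven planning and debugging in GenePrep are not reproduced here, so `PlannedStep` and `execute_plan` are illustrative names only.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

log = logging.getLogger("plan_execute_debug")

Step = Callable[[dict], dict]  # a step takes the shared state dict and returns a new one


@dataclass
class PlannedStep:
    name: str
    run: Step


def execute_plan(
    plan: list[PlannedStep],
    state: dict,
    debug: Callable[[str, Exception], Optional[Step]],
    max_retries: int = 2,
) -> dict:
    """Run planned steps in order; when one fails, ask `debug` for a replacement.

    Each step gets at most max_retries + 1 attempts before the run is aborted.
    """
    for step in plan:
        current = step.run
        for attempt in range(max_retries + 1):
            try:
                state = current(state)
                log.info("step %s succeeded on attempt %d", step.name, attempt + 1)
                break
            except Exception as err:
                log.warning("step %s failed on attempt %d: %s", step.name, attempt + 1, err)
                # Only ask for a fix if another attempt is still allowed.
                fix = debug(step.name, err) if attempt < max_retries else None
                if fix is None:
                    raise RuntimeError(f"step {step.name!r} could not be repaired") from err
                current = fix
    return state


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")

    def buggy_test(state):
        return {**state, "result": state["trait"]}  # raises KeyError: "trait" is missing

    def fixed_test(state):
        return {**state, "result": state.get("trait", "unknown")}

    plan = [PlannedStep("load", lambda s: {**s, "loaded": True}), PlannedStep("test", buggy_test)]
    print(execute_plan(plan, {}, debug=lambda name, err: fixed_test))
```

Bounding the retries keeps a step that cannot be repaired from looping forever, and the log lines record each failure and fix attempt.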

Reproducibility

Terminal‑style logs and CSV outputs for transparent, repeatable results.
