Presentation Information
[PE2-7] Exploring Relationships Between Gene Sets Using Single-Cell Transcriptome Data
○Cheng Zheng, 山田 亮, 岡田 大瑚 (Center for Genomic Medicine, Graduate School of Medicine, Kyoto University)
Motivation: In the era of single-cell high-throughput sequencing, gene expression levels are commonly treated as features of cells. Routine downstream analysis, including gene set enrichment analysis, therefore revolves around cells, since the primary goal is to find functional gene sets that account for the composition of differentially expressed genes. However, enriched gene sets are often numerous and hard to interpret. In this project, we take a top-down approach to analyzing gene sets and propose an ad hoc clustering-based algorithm that quantifies the closeness between two gene sets in the low-dimensional UMAP space. We then measure the similarity and dissimilarity among gene sets based on either prior knowledge (gene set annotations) or inference from the single-cell RNA-seq data. The goal is to use these inter-gene-set relationships to build a structured and concise map of relevant gene sets. Results: We ran a preliminary version of our pipeline on single-cell RNA-seq data from human tissues. The initial steps comprise preprocessing the single-cell RNA-seq data, retrieving gene sets, and building a gene set dissimilarity matrix. Dimensionality reduction and visualization show that the distributions of gene sets in the UMAP space vary across human tissues. By utilizing this data-driven neighborhood information, we believe that our algorithms can extract and display the layers of gene sets that are pertinent to biologists, facilitating an understanding of gene set functions.
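As an illustration of the annotation-based (prior knowledge) route to a gene set dissimilarity matrix, the following is a minimal sketch only. The abstract does not state which measure is used; Jaccard dissimilarity over gene membership is an assumption made here, and the function names and toy gene sets are hypothetical.

```python
from itertools import combinations


def jaccard_dissimilarity(a, b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two collections of gene symbols.

    Assumption: Jaccard is used purely for illustration; the abstract
    does not specify the dissimilarity measure.
    """
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty sets are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


def dissimilarity_matrix(gene_sets):
    """Build a symmetric dissimilarity matrix from a {name: genes} mapping.

    Returns the sorted gene set names and a nested list indexed in that order.
    """
    names = sorted(gene_sets)
    n = len(names)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = jaccard_dissimilarity(gene_sets[names[i]], gene_sets[names[j]])
        matrix[i][j] = matrix[j][i] = d
    return names, matrix


# Hypothetical toy gene sets, not taken from the study.
toy_gene_sets = {
    "SET_A": {"CD3D", "CD3E", "CD4"},
    "SET_B": {"CD3D", "CD3E", "CD8A"},
    "SET_C": {"ALB", "APOA1"},
}

names, matrix = dissimilarity_matrix(toy_gene_sets)
for name, row in zip(names, matrix):
    print(name, row)
```

A matrix of this form could then be passed to an embedding method that accepts precomputed distances (for example, umap-learn with `metric="precomputed"`) to place gene sets in a low-dimensional space; whether the authors' pipeline does this is not stated in the abstract.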