Pan-genome Analysis, Visualization and Exploration

Document type: PhD thesis
Date: 2018
Language: English
Faculty: 7 Mathematisch-Naturwissenschaftliche Fakultät
Department: Biology
Advisor: Neher, Richard (Dr.)
Day of Oral Examination: 2018-01-29
DDC Classification: 570 - Life sciences; biology
Keywords: Bioinformatics, Data analysis, Visualization
Other Keywords:
Pan-genome analysis
Pan-genome visualization


The dynamics of prokaryotic genomes are driven by the intricate interplay of evolutionary forces such as gene duplication, gene loss and horizontal transfer. Even closely related strains can exhibit remarkable genetic diversity and substantial gene presence/absence variation. The pan-genome, namely the complete inventory of genes in a collection of strains, can be several times larger than the genome of any single strain. Although several tools for pan-genome analysis have been published, there is still much room for algorithmic improvement, as well as a need for applications that support interactive visualization and exploration of pan-genomes. We have therefore developed panX, an automated computational pipeline for efficient identification of orthologous gene clusters in the pan-genome. PanX identifies homologous relationships among genes using DIAMOND and MCL and then applies phylogeny-based post-processing to separate orthologs from paralogs. Furthermore, a divide-and-conquer strategy achieves an approximately linear runtime on large datasets. The analysis results can be visualized with the accompanying software, an easy-to-use and powerful web-based application for interactive exploration of the pan-genome. The visualization dashboard comprises a variety of connected components that allow rapid searching, filtering and sorting of genes and flexible investigation of evolutionary relationships among strains and their genes. PanX seamlessly interlinks gene clusters with their alignments and gene phylogenies, maps mutations onto the branches of the gene trees and highlights gene gain and loss events on the core-genome phylogeny, which can also be colored by metadata associated with the strains. In benchmarks on 120 simulated pan-genome datasets and in comparisons of clustering results from different tools on real data, panX exhibits overall good performance across a large range of diversities.
PanX is available online, with a wide range of microbial pan-genomes already established. In addition, user-provided pan-genomes can be visualized either via a web server or by running panX locally as a web-based application.
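
To make the pan-genome definition used in the abstract concrete, the following minimal Python sketch counts the pan-genome (genes found in any strain), the core genome (genes found in every strain) and the accessory genome for a toy collection. It is an illustration only and not part of panX: the function name `summarize_pan_genome`, the strain names and the cluster identifiers are hypothetical, and each gene is assumed to have already been assigned to an orthologous cluster.

```python
from typing import Dict, Set


def summarize_pan_genome(strain_genes: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count pan, core and accessory gene clusters for a set of strains.

    strain_genes maps a strain name to the set of orthologous cluster
    identifiers present in that strain (hypothetical input format).
    """
    if not strain_genes:
        raise ValueError("at least one strain is required")

    gene_sets = list(strain_genes.values())
    pan = set().union(*gene_sets)          # clusters present in any strain
    core = set.intersection(*gene_sets)    # clusters present in every strain

    return {
        "pan": len(pan),
        "core": len(core),
        "accessory": len(pan) - len(core),
        "largest_strain": max(len(genes) for genes in gene_sets),
    }


# Toy data: three strains sharing two clusters, each with two private ones.
toy_strains = {
    "strain_A": {"c1", "c2", "c3", "c4"},
    "strain_B": {"c1", "c2", "c5", "c6"},
    "strain_C": {"c1", "c2", "c7", "c8"},
}

summary = summarize_pan_genome(toy_strains)
print(summary)
```

In this toy collection the pan-genome holds eight clusters while no single strain carries more than four, which mirrors the observation that a pan-genome can be considerably larger than any individual genome.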
