Establishing graph based pan-genomics in Arabidopsis thaliana

Citable link (URI): http://hdl.handle.net/10900/159342
http://nbn-resolving.de/urn:nbn:de:bsz:21-dspace-1593421
Document type: Dissertation
Date of publication: 2024-12-05
Language: English
Faculty: 7 Mathematisch-Naturwissenschaftliche Fakultät (Faculty of Science)
Department: Computer Science
Reviewer: Weigel, Detlef (Prof. Dr.)
Date of oral examination: 2024-11-12
DDC classification: 004 - Computer science
570 - Life sciences, biology
Keywords: Bioinformatics, Arabidopsis (genus), Thale cress, Genome
Free keywords:
pan-genome
reference bias
License: http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=de http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=en

Abstract:

By definition, single reference genomes cannot reflect genetic diversity. The representation of the genetic potential of a whole species as a single linear string of characters, and all analyses based on it, are inherently biased. This reference bias has been acknowledged for a long time, but only recently have we been able to address it. The advent of long-read sequencing and many additional genome assemblies for the same species has allowed us to obtain a better understanding of variation in genome content within a species. In addition, the availability of these new data types has made the implementation of a long-standing concept feasible: the genome graph. This data structure combines multiple reference genomes into a single representation that is able to reflect more of the sequence space than a linear reference genome. In this thesis I present six highly contiguous de-novo assembled genomes of Arabidopsis thaliana that are annotated using the new pan-genome-aware auto-ant annotation pipeline. These assemblies are used to construct a complex genome graph derived from whole-genome alignment. I will show that building such a graph is not only theoretically possible but also practically feasible, representing the full pan-genome of the input genome assemblies. I can access this graph-based pan-genome using the novel reference-free variant detection algorithm panSV. I can also show that short-read alignments to the genome graph are possible and suffer from a reduced reference bias, due to the expanded reference structure. Variant calls based on the graph have a reduced heterozygosity noise that will aid future discoveries. The use of genome graphs greatly increases our understanding of a species' pan-genome and allows us to combine the power of multiple assembled genomes. Although the method is in need of further development and improvements, I have made a first case for the use of highly complex graphs in plant species.
