Establishing graph based pan-genomics in Arabidopsis thaliana

Kubica, Christian

Files:	AthPanGenome.pdf (PDF, 5.45 MB)

Citable link (URI):	http://hdl.handle.net/10900/159342 http://nbn-resolving.de/urn:nbn:de:bsz:21-dspace-1593421 http://nbn-resolving.org/urn:nbn:de:bsz:21-dspace-1593426 http://nbn-resolving.org/urn:nbn:de:bsz:21-dspace-1593420 http://dx.doi.org/10.15496/publikation-100675
Document type:	Dissertation
Date of publication:	2024-12-05
Language:	English
Faculty:	7 Mathematisch-Naturwissenschaftliche Fakultät
Department:	Computer Science
Reviewer:	Weigel, Detlef (Prof. Dr.)
Date of oral examination:	2024-11-12
DDC classification:	004 - Computer science; 570 - Life sciences, biology
Keywords:	Bioinformatics, Arabidopsis (genus), Thale cress (Arabidopsis thaliana), Genome
Free keywords:	pan-genome reference bias
License:	http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=de http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=en

Abstract:

By definition, single reference genomes cannot reflect genetic diversity. The representation of the genetic potential of a whole species as a single linear string of characters, and all analyses based on it, are inherently biased. This reference bias has been acknowledged for a long time, but only recently have we been able to address it. The advent of long-read sequencing and many additional genome assemblies for the same species has allowed us to obtain a better understanding of variation in genome content within a species. In addition, the availability of these new data types has made the implementation of a long-standing concept feasible: the genome graph. This data structure combines multiple reference genomes into a single representation that is able to reflect more of the sequence space than a linear reference genome.

In this thesis I present six highly contiguous de novo assembled genomes of Arabidopsis thaliana that are annotated using the new pan-genome-aware auto-ant annotation pipeline. These assemblies are used to construct a complex genome graph derived from whole-genome alignment. I will show that building such a graph is not only theoretically possible, but also practically feasible, representing the full pan-genome of the input genome assemblies. I can access this graph-based pan-genome using the novel reference-free variant detection algorithm panSV. I can also show that short-read alignments to the genome graph are possible and suffer from a reduced reference bias, due to the expanded reference structure. Variant calls based on the graph have reduced heterozygosity noise that will aid future discoveries. The use of genome graphs greatly increases our understanding of a species' pan-genome and allows us to combine the power of multiple assembled genomes. Although the method is in need of further development and improvements, I have made a first case for the use of highly complex graphs in plant species.

This document appears in:

7 Mathematisch-Naturwissenschaftliche Fakultät
