Establishing graph based pan-genomics in Arabidopsis thaliana

dc.contributor.advisor Weigel, Detlef (Prof. Dr.)
dc.contributor.author Kubica, Christian
dc.date.accessioned 2024-12-05T08:15:03Z
dc.date.available 2024-12-05T08:15:03Z
dc.date.issued 2024-12-05
dc.identifier.uri http://hdl.handle.net/10900/159342
dc.identifier.uri http://nbn-resolving.de/urn:nbn:de:bsz:21-dspace-1593421 de_DE
dc.description.abstract By definition, single reference genomes cannot reflect genetic diversity. The representation of the genetic potential of a whole species as a single linear string of characters, and all analyses based on it, are inherently biased. This reference bias has been acknowledged for a long time, but only recently have we been able to address it. The advent of long-read sequencing and many additional genome assemblies for the same species has allowed us to obtain a better understanding of variation in genome content within a species. In addition, the availability of these new data types has made the implementation of a long-standing concept feasible: the genome graph. This data structure combines multiple reference genomes into a single representation that is able to reflect more of the sequence space than a linear reference genome. In this thesis I present six highly contiguous de novo assembled genomes of Arabidopsis thaliana that are annotated using the new pan-genome-aware auto-ant annotation pipeline. These assemblies are used to construct a complex genome graph derived from whole-genome alignment. I will show that building such a graph is not only theoretically possible, but also practically feasible, representing the full pan-genome of the input genome assemblies. I can access this graph-based pan-genome using the novel reference-free variant detection algorithm panSV. I can also show that short-read alignments to the genome graph are possible and suffer less from reference bias, owing to the expanded reference structure. Variant calls based on the graph show less heterozygosity noise, which will aid future discoveries. The use of genome graphs greatly increases our understanding of a species' pan-genome and allows us to combine the power of multiple assembled genomes. Although the method is in need of further development and improvement, I have made a first case for the use of highly complex graphs in plant species. en
dc.language.iso en de_DE
dc.publisher Universität Tübingen de_DE
dc.rights ubt-podno de_DE
dc.rights.uri http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=de de_DE
dc.rights.uri http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=en en
dc.subject.classification Bioinformatik , Schmalwand <Arabidopsis> , Ackerschmalwand , Genom de_DE
dc.subject.ddc 004 de_DE
dc.subject.ddc 570 de_DE
dc.subject.other pan-genome en
dc.subject.other reference bias en
dc.title Establishing graph based pan-genomics in Arabidopsis thaliana en
dc.type PhDThesis de_DE
dcterms.dateAccepted 2024-11-12
utue.publikation.fachbereich Informatik de_DE
utue.publikation.fakultaet 7 Mathematisch-Naturwissenschaftliche Fakultät de_DE
utue.publikation.noppn yes de_DE
