dc.contributor.advisor |
Weigel, Detlef (Prof. Dr.) |
|
dc.contributor.author |
Kubica, Christian |
|
dc.date.accessioned |
2024-12-05T08:15:03Z |
|
dc.date.available |
2024-12-05T08:15:03Z |
|
dc.date.issued |
2024-12-05 |
|
dc.identifier.uri |
http://hdl.handle.net/10900/159342 |
|
dc.identifier.uri |
http://nbn-resolving.de/urn:nbn:de:bsz:21-dspace-1593421 |
de_DE |
dc.description.abstract |
By definition, single reference genomes cannot reflect genetic diversity. The representation of the genetic potential of a whole species as a single linear string of characters, and all analyses based on it, are inherently biased. This reference bias has been acknowledged for a long time, but only recently have we been able to address it. The advent of long-read sequencing and of many additional genome assemblies for the same species has allowed us to obtain a better understanding of variation in genome content within a species. In addition, the availability of these new data types has made the implementation of a long-standing concept feasible: the genome graph. This data structure combines multiple reference genomes into a single representation that is able to reflect more of the sequence space than a linear reference genome.
In this thesis I present six highly contiguous de-novo assembled genomes of Arabidopsis thaliana that are annotated using the new pan-genome-aware auto-ant annotation pipeline. These assemblies are used to construct a complex genome graph derived from whole-genome alignment. I will show that building such a graph is not only theoretically possible but also practically feasible, and that it represents the full pan-genome of the input genome assemblies. I can access this graph-based pan-genome using the novel reference-free variant detection algorithm panSV. I can also show that short-read alignments to the genome graph are possible and suffer less from reference bias, owing to the expanded reference structure. Variant calls based on the graph show reduced heterozygosity noise, which will aid future discoveries.
The use of genome graphs greatly increases our understanding of a species' pan-genome and allows us to combine the power of multiple assembled genomes. Although the method needs further development and improvement, I have made a first case for the use of highly complex graphs in plant species. |
en |
dc.language.iso |
en |
de_DE |
dc.publisher |
Universität Tübingen |
de_DE |
dc.rights |
ubt-podno |
de_DE |
dc.rights.uri |
http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=de |
de_DE |
dc.rights.uri |
http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=en |
en |
dc.subject.classification |
Bioinformatik , Schmalwand <Arabidopsis> , Ackerschmalwand , Genom |
de_DE |
dc.subject.ddc |
004 |
de_DE |
dc.subject.ddc |
570 |
de_DE |
dc.subject.other |
pan-genome |
en |
dc.subject.other |
reference bias |
en |
dc.title |
Establishing graph based pan-genomics in Arabidopsis thaliana |
en |
dc.type |
PhDThesis |
de_DE |
dcterms.dateAccepted |
2024-11-12 |
|
utue.publikation.fachbereich |
Informatik |
de_DE |
utue.publikation.fakultaet |
7 Mathematisch-Naturwissenschaftliche Fakultät |
de_DE |
utue.publikation.noppn |
yes |
de_DE |