Abstract:
Recent years have seen a dramatic increase in the amount of genetic information stored in electronic format. It has been estimated that the amount of information in genomics and proteomics doubles every 20 months, and that the size and number of databases are increasing even faster. It is widely accepted that sophisticated exploration of such data is crucial in a variety of fields such as disease genetics and pharmacogenomics. While both corporate and institutional efforts have concentrated on the integration of heterogeneous data in genomics and proteomics, systematic data exploration is still in its early stages. Although data mining has celebrated many successes in business applications such as retail and marketing, its application to scientific and engineering data is not straightforward.
Data sets in life sciences are often significantly larger in volume and structurally more complex than traditional business data, and they often change rapidly over time.
In contrast to business environments, the body of existing background knowledge in life sciences is extensive.
We will report on recent efforts to adapt data mining technology for effective knowledge discovery in life sciences. As one example, we will describe how DNA chip technology, data mining, and expert knowledge in molecular genetics can be combined to discover complex functional relationships between genotypic information on the one hand and clinical parameters on the other.