Abstract:
This work is based on the observation that languages for two seemingly distant
domains are closely related. Orthogonal query languages based on comprehension
syntax admit various forms of query nesting to construct nested query results
and express complex predicates. Languages for nested data parallelism allow to
nest parallel iterators and thereby admit the parallel evaluation of
computations that are themselves parallel. Both kinds of languages center around
the application of side-effect-free functions to each element of a collection.
The motivation for this work is the seamless integration of relational database
queries with programming languages. In frameworks for language-integrated
database queries, a host language's native collection-programming API is used
to express queries. To mediate between native collection programming and
relational queries, we define an expressive, orthogonal query calculus that
supports nesting and order. The challenge of query flattening is to translate
this calculus to bundles of efficient relational queries restricted to flat,
unordered multisets. Prior approaches to query flattening either support only
query languages that lack in expressiveness or employ a complex, monolithic
translation that is hard to comprehend and generates inefficient code that is
hard to optimize.
To improve on those approaches, we draw on the similarity to nested data
parallelism. Blelloch's flattening transformation is a static program
transformation that translates nested data parallelism to flat data parallel
programs over flat arrays. Based on the flattening transformation, we describe a
pipeline of small, comprehensible lowering steps that translates our nested
query calculus to a bundle of relational queries. The pipeline is based on a
number of well-defined intermediate languages. Our translation adopts the key
concepts of the flattening transformation but is designed with specifics of
relational query processing in mind.
Based on this translation, we revisit all aspects of query flattening. Our
translation is fully compositional and can translate any term of the input
language. Like prior work, the translation by itself produces inefficient code
due to compositionality that is not fit for execution without optimization. In
contrast to prior work, we show that query optimization is orthogonal to
flattening and can be performed before flattening. We employ well-known work on
logical query optimization for nested query languages and demonstrate that this
body of work integrates well with our approach.
Furthermore, we describe an improved encoding of ordered and nested collections
in terms of flat, unordered multisets. Our approach emits idiomatic relational
queries in which the effort required to maintain the non-relational semantics of
the source language (order and nesting) is minimized.
A set of experiments provides evidence that our approach to query flattening can
handle complex, list-based queries with nested results and nested intermediate
data well. We apply our approach to a number of flat and nested benchmark
queries and compare their runtime with hand-written SQL queries. In these
experiments, our SQL code generated from a list-based nested query language
usually performs as well as hand-written queries.