The book gives an introduction to the
principles of program analysis. It concentrates on the four main approaches (data
flow analysis, control flow analysis, abstract interpretation, and type and
effect systems) indicating their relationships and differences; this in itself
spans a variety of programming language features. Our coverage of algorithmic
techniques will be substantial, presenting a uniform approach to the fast
algorithms for equation and constraint solving.
The level of exposition assumes some
background in programming languages (including semantics and compilers). The
presentation is aimed at graduate students but keeping in mind that it should
also be useful for the beginning postgraduate student and that the elementary
parts should be readable by advanced undergraduate students. Additionally,
researchers and practitioners in the field may find the book of interest for
its broad coverage of all of the main approaches, highlighting the many
similarities that exist between them.
The book contains six chapters, three
appendices, indices of terminology and of notation, and a bibliography. Each
chapter contains a number of exercises and the book contains several
mini-projects allowing the student to explore parts of the development in
greater detail.
The aim of this chapter
is to illustrate the fundamental nature of the four approaches by presenting
illuminating examples of analyses for imperative as well as functional
programs. This will not only highlight some of the issues to be dealt with in
more detail later but will also establish the frame of reference for subsequent
discussions; in particular, it will allow the reader to appreciate the many similarities between the four
approaches (and in contrast with the impression one might get from reading the
literature). Next we present the first very simple algorithm for calculating the
results of the analyses; it can be seen as the basis of several of the
algorithms to be developed in subsequent chapters. Finally, we illustrate how
to use program analysis information for program transformations.
We begin with a
treatment of the classical intra-procedural analyses of live variables,
available expressions, reaching definitions, very busy expressions, ud-chaining
and du-chaining and thus explain the distinction between forward and backward analyses. Next we provide a
structural operational semantics for the imperative language and prove the
semantic correctness of of the live variables analysis; a mini-project shows
how to prove the semantic correctness of the reaching definitions analysis. We
then present the generalisation offered by the monotone frameworks of Kam and
Ullman; we formulate constant propagation analysis as an example of an analysis
that is not also in the distributive framework. We then present a general
worklist algorithm for computing the MFP solution of monotone frameworks and we
discuss that the MFP solution may differ from the MOP solution.
We then move on to some more complex
aspects of data flow analysis. We deal at some length with the various
techniques for interprocedural analysis covering such techniques as call
strings (a la Sharir and Pnueli) and assumption sets (a la Landi and Ryder).
The final section develops an analysis that determines whether or not the graph
pointed to specialises to being non-cyclic or even a tree, and it ends by
presenting a more ambitious shape analysis.
We develop a control
flow analysis (also known as a closure analysis) for a simple untyped
functional language; this can be regarded as an outgrowth of the interprocedural
data flow analysis in that now we tackle the problem of ``dynamic dispatch''.
It takes the form of an abstract coinductive specification of what it means for
the results of an 0-CFA (or monovariant) analysis to
be acceptable. Next we provide a structural operational semantics for the
language and prove that the specification is semantically sound; we also show
that the specification always admits a best solution by establishing a general
result about the set of solutions being a
We complete the chapter by dealing with
some more complex aspects. One is the addition of a data flow component (e.g.
in the form of a monotone framework) thereby showing how to incorporate the
ideas from intraprocedural analysis. Another is the incorporation of context in
the manner of call strings (a variant of k-CFA
called uniform-k-CFA)
thereby establishing the link to interprocedural analysis.
To illustrate the
general nature of abstract interpretation our coverage will not be focused on
any particular language or semantics. To this end we abstractly formulate
correctness as the preservation of a certain correctness relation during
computation. We then deal with the important techniques of widening and narrowing for ``speeding-up'' the
computation of the least fixed point (actually ``over-shooting'' it). Next we
deal with the classical technique of Galois
connections (or adjunctions) for replacing computations over one
universe with computations over a coarser universe. We go into the details of
how to construct Galois connections in a systematic manner, covering such
techniques as independent attribute methods versus relational methods, direct
products versus tensor products, and the reduced versions of these
constructions. Finally, we show how to ``induce'' the actual computations to be
carried out over the coarser domain; this includes a treatment of how to
``induce'' widenings and we show how to apply these ideas in the case of the
monotone frameworks of Chapter 2. Through a number of examples we show how to
develop Webber's approximation for checking indices into arrays (presented at
PLDI'97).
The chapter begins by
formulating a type and effect system for a variant of the control flow analysis
considered in Chapter 3; this also serves to introduce sub-effecting. Next we
prove that the information specified is semantically correct with respect to a
natural semantics (or big-step structural operational semantics); in the manner
of Chapter 3 we also show that there always is a best analysis. Then we show
how to efficiently implement the analysis by generating constraints; this
involves proving the syntactic soundness and completeness of a version of
algorithm W (used for type
inference in the lambda-calculus).
To illustrate the variety of type and
effect systems we conclude by formulating several other type and effect systems
for the simple functional language. First we show how to specify a side effect
analysis and this involves a discussion of sub-typing. Then we formulate an
escape analysis and this involves a discussion of polymorphism. Next we cover
region analysis and briefly discuss polymorphic recursion. Finally we develop a
simple pointer analysis that is much coarser (and much faster) than those
considered in Chapter 2.
We begin by considering
the solution of simple constraint systems (either equations, as in data flow
analysis, or simple inequations). We present a generalised version of the
worklist algorithm used in Chapters 2 and 3. We show that for some analyses a
round-robin algorithm can be more efficient. For each algorithm, we present
correctness results and discuss complexity.
This appendix presents
the key definitions and results for partially ordered sets that will be used in
the book. This includes Tarski's result saying (among other things) that a
monotone function on a complete lattice has a least as well as a greatest fixed
point.
This appendix surveys
the key induction principles to be used in the book. It also introduces the
proof principle of coinduction (based on Tarski's result) that is used for the
development of the control flow analysis in Chapter 3.
This appendix presents
those concepts for graphs and regular expressions that form the basis for the
algorithms that are presented in Chapter 6. This includes such concepts as
depth first spanning trees, reverse postorder and reducibility by intervals.