An Interview with Professor Michael J. D. Powell
------Interviewed by Xiaoling Sun via Email
Next September, a conference will be organized in Beijing by Prof. Ya-xiang Yuan to celebrate Prof. M.J. D Powell's 70th birthday.
M.J.D. Powell completed his undergraduate studies at the University of Cambridge in 1959. From 1959 to 1976 he worked at the Atomic
Energy Establishment, Harwell, where he was Head of the Numerical Analysis Group from 1970. He has been John Humphrey Plummer Professor of Applied Numerical Analysis, University of Cambridge since 1976 and a Fellow of Pembroke College, Cambridge since 1978.
He was awarded the George B. Dantzig Prize in 1982 for his contributions
to mathematical programming. He is a Fellow of The Royal Society of
London and the US National Academy of Sciences.
1. Your full name, address and e-mail address:
they compare favourably in quality and originality with the contributions to journals. About 86 of these works are
in the field of optimization, and about 57 of them are on approximation, including one book, namely Approximation
Theory and Methods (Cambridge University Press, 1981). Most of the other articles address numerical linear
algebra and some applications.
My research on optimization began in 1960 when I was employed at the Atomic Energy Research Establishment,
Harwell to help scientists with their numerical calculations. I was expected to find suitable algorithms and to provide
computer programs for generating results, but there was no need for the work to lead to the publication of papers.
It was relatively easy then to develop new algorithms that have practical advantages over current methods, and doing
so is still my main research interest. I have also been challenged by many theoretical questions that are relevant to
practical computation. After about 3 years at Harwell, I began to publish frequently, a selection of my papers being
pp. 163-168 (1963).
Remarks: When I experimented with the algorithm of W.C. Davidon, described in the report "Variable metric
method for minimization" (No. 5990 Rev., Argonne National Laboratory, 1959), I was able to minimize some
functions of 100 variables to high accuracy, but the practical limit then for other methods was about 20 variables.
The same discovery was made independently by Roger Fletcher. We met in time to write this paper together on
the major breakthrough that was provided by Davidon's brilliant technique for updating estimates of positive
definite second derivative matrices.
Computer Journal, Vol. 7, pp. 155-162 (1964).
calculation of any derivatives. The paper describes an algorithm that takes advantage of these remarks automatically
if the objective function happens to be quadratic, although the objective function is allowed to be general. There
has not only been much interest in the given way of constructing conjugate directions, but also I still receive
occasional questions from users about the Fortran implementation of the method that was written more than
40 years ago.
Academic Press (London), pp. 283-297 (1969).
for unconstrained minimization is applied to the sum of the objective function and large multiples of the squares
of the constraint functions. A generalization of this technique combines smaller multipliers with constant shifts
of the constraint functions, the shifted functions being squared as before. The paper considers the adjustment
of the multipliers and shifts in an outer iteration, and usually only
the shifts are altered. Thus the procedure
method for constrained optimization. Therefore I have enjoyed much
credit for being an originator of that
in Nonlinear Programming, SIAM-AMS Proceedings, Vol. IX, editors R.W. Cottle and C.E. Lemke,
American Mathematical Society (Providence), pp. 53-72 (1976).
Remarks: I have always believed that theoretical analysis is not essential to the development and publication of
new algorithms, because the DFP method in  above is highly useful,
although many questions about its
level sets, then the DFP method with exact line searches provides convergence in theory to the optimal value of the
objective function, but early termination of the line searches is important to efficiency. Attempts to analyse this
situation are incomplete, although the work in  provides a favourable answer for inexact line searches when the
DFP updating formula is replaced by the BFGS formula.
Lecture Notes in Mathematics No. 630, editor G.A. Watson, Springer-Verlag
(Berlin), pp. 144-157 (1978).
Remarks: The method in  includes an outer iteration for the adjustment of estimates of Lagrange multipliers,
but there is only one iterative procedure in SQP (sequential quadratic programming). Here, for each new
choice of the vector of variables, quadratic and linear approximations are made to the objective and constraint
functions, respectively, which gives a quadratic programming problem that has Lagrange multipliers. Furthermore,
constraint curvature can be treated adequately by taking the second derivative matrix of the quadratic approximation
from an estimate of the Lagrange function instead of from the original objective function. One of the first SQP
algorithms of this kind is described in  with some numerical results.
A version of the BFGS formula is included,
R.M. Chamberlain, C. Lemarechal and H.C. Pedersen), Math.
Programming Studies, No. 16, pp. 1-17
variables may not be available initially. Hence, in constrained
optimization, the question arises whether to accept
happen that good changes to the variables make both of these terms worse, which is known as the "Maratos effect".
Then, if the change is accepted, its success may become obvious on the next iteration. Therefore the purpose of the
watchdog technique is to allow some new vectors of variables to be starting points of iterations, even if they do not
satisfy the tests of the algorithm that force convergence, but the tests may reject such trial variables later, with
back-tracking to the iteration that generated them.
 "A tolerant algorithm for linearly constrained optimization calculations", Math. Programming B, Vol. 45,
pp. 547-566 (1989).
Remarks: If an algorithm for linearly constrained optimization employs line searches and preserves feasibility,
then the step along each search direction is restricted by the boundaries of the constraints. Thus some algorithms
become inefficient, taking small steps on many iterations, especially when the number of constraints is large.
The new feature of the algorithm of  avoids this behaviour by choosing directions that stay away from constraint
boundaries that are close to the starting point of each iteration. Specifically, if an inequality constraint holds strictly
at the starting point, and if the constraint residual is no greater than the value of a tolerance parameter, then the
constraint residual is assumed to be zero. It follows that each line search is restricted only by the boundaries of
constraints whose residuals exceed the tolerance at the starting point. The given algorithm includes a procedure
that decreases the tolerance parameter automatically, which is essential to the final accuracy. Some numerical
results illustrate the advantages of the method.
 "A direct search optimization method that models the objective and constraint functions by linear interpolation",
in Advances in Optimization and Numerical Analysis, editors S. Gomez and J-P. Hennart, Kluwer Academic
(Dordrecht), pp. 51-67 (1994).
Remarks: In general an optimization algorithm requires more information in order to achieve faster convergence,
but ease of use may be more important than efficiency, especially when
the number of variables is small. The algorithm in  minimizes a
function subject to nonlinear constraints without requiring any
derivatives, which is highly convenient in many
the approximations to the objective and constraint functions are only linear, being defined by interpolation at n+1
points, where n is the number of variables. Thus the difficulty arises of avoiding a degenerate interpolation problem
when the positions of the points are chosen and adjusted automatically. For example, the method would collapse if
all the points were on the boundary of a linear constraint. The given procedure seems to be suitable, even when the
initial vector of variables is far from the required solution.
pp. 555-582 (2002).
Remarks: If all the parameters of a quadratic model are defined by interpolation to function values, then
(n+1)(n+2)/2 interpolation conditions are needed, where n is still the number of variables. The purpose of
the UOBYQA software in  is to investigate the performance of this
approach to unconstrained
current quadratic model. The interpolation points are kept apart by a lower bound on each trust region radius,
which is reduced only when necessary for further progress. Thus, unlike methods that estimate derivatives by
differences, UOBYQA is suitable for noisy objective functions. There are some special iterations that avoid
degeneracies in the positions of the interpolation points. Each quadratic model is updated by replacing only
one of its interpolation points, so only one new function value is calculated on each iteration. Thus usually the
total number of evaluations of the objective function is tolerable. On the other hand, the amount of routine work
of each iteration is proportional to the fourth power of n, which imposes a severe restriction on the number of
 "The NEWUOA software for unconstrained optimization without derivatives", in Large-Scale Nonlinear
Optimization, editors G. Di Pillo and M. Roma, Springer (New York), pp.
derivatives in the quadratic models of algorithms for unconstrained
optimization. Therefore far fewer than
available. In the algorithm of , namely NEWUOA, each model interpolates m values of the objective function,
the choice m=2n+1 being recommended. Then the initial model is defined uniquely by letting its second derivative
matrix be diagonal, each iteration replaces only one of the interpolation points, and the freedom in the updating
of each model is taken up by minimizing the Frobenius norm of the change to the second derivative matrix.
The performance of this algorithm compared with UOBYQA is stunning when the number of variables is large.
Indeed, the total number of function values to achieve high accuracy is only of magnitude n in some experiments, and,
when m=2n+1, the routine work of each iteration is proportional only to the square of n. Thus it becomes possible
to minimize functions of hundreds of variables without the calculation
or numerical estimation of first derivatives.
of computer users. Further, as suggested in the remarks on  above, it was usual to publish algorithms without
any theoretical analysis, but with some numerical results. The nonlinear part of mathematical programming became
an academic subject that was taught and studied at universities later, which provided respectability and coherence
from a theoretical point of view. Thus there were major changes in the objectives of research. I even know of a
case in rational approximation where two versions of an algorithm were constructed, but the original paper on
the work recommends the version that had a proof of convergence, although the other version is much faster
in practice. There are also cases in optimization where algorithms are modified not to improve their performance
but to assist convergence theory. Another way of enabling analysis is to make assumptions, even if they are not
satisfied in many useful applications of the algorithm. Indeed, the theoretical side of our subject has been given much
importance academically, so now it is not unusual for journals to
publish papers on new algorithms that have not
in engineering and scientific publications. Their names suggest that they work in ways that are analogous to reputable
models of real situations, which makes them attractive to computer users. Thus some understanding of their techniques
can be presented simply, in contrast to the papers on advances in mathematical programming that are written by
theoreticians for other theoreticians to read. Furthermore, the inclusion of empirical methods in numerical calculations
is usually easy, and often their inefficiencies are tolerable, because computers are very fast and low accuracy may be
sufficient. It seems, therefore, that huge improvements are possible by replacing the procedures that are actually
used to solve many nonlinear optimization problems.
of my current research is to provide algorithms with good convergence properties that are easy to use. In particular,
my papers - propose algorithms that are without the calculation of derivatives. Please send me an e-mail
(firstname.lastname@example.org) if you would like to receive, free of charge, the Fortran software of any or all of these methods,
and Fortran is also available for the algorithm of paper , namely TOLMIN. I have plans for extending the
NEWUOA algorithm of  to constrained calculations, beginning with
simple bounds on the variables.
optimization that are studied and used today. Thus many researchers in our field are interested in my work, a highlight
being the award of the George B. Dantzig Prize in 1982 for my contributions to mathematical programming. Other
honours include election to The Royal Society of London and to the US National Academy of Sciences. A particular
pleasure on my last visit to China was meeting my great grandson, where
I am referring to generations of research
(Related link: Another interview article (by Luis Nunes Vicente) of Prof. M.J.D. Powell can be found in http://www.mat.uc.pt/~lnv/papers/mjdp.pdf)