An Interview with Professor Michael J. D. Powell
Interviewed by Xiaoling Sun via Email
Introduction
Next September, a conference will be organized in Beijing by Prof. Yaxiang Yuan to celebrate Prof. M.J. D Powell's 70th birthday.
M.J.D. Powell completed his undergraduate studies at the University of Cambridge in 1959. From 1959 to 1976 he worked at the Atomic Energy Establishment, Harwell, where he was Head of the Numerical Analysis Group from 1970. He has been John Humphrey Plummer Professor of Applied Numerical Analysis, University of Cambridge since 1976 and a Fellow of Pembroke College, Cambridge since 1978.
He was awarded the George B. Dantzig Prize in 1982 for his contributions to mathematical programming. He is a Fellow of The Royal Society of London and the US National Academy of Sciences.

1. Your full name, address and email address:
they compare favourably in quality and originality with the contributions to journals. About 86 of these works are in the field of optimization, and about 57 of them are on approximation, including one book, namely Approximation Theory and Methods (Cambridge University Press, 1981). Most of the other articles address numerical linear
algebra and some applications. My research on optimization began in 1960 when I was employed at the Atomic Energy Research Establishment, Harwell to help scientists with their numerical calculations. I was expected to find suitable algorithms and to provide computer programs for generating results, but there was no need for the work to lead to the publication of papers. It was relatively easy then to develop new algorithms that have practical advantages over current methods, and doing so is still my main research interest. I have also been challenged by many theoretical questions that are relevant to practical computation. After about 3 years at Harwell, I began to publish frequently, a selection of my papers being
given below. pp. 163168 (1963).
Remarks: When I experimented with the algorithm of W.C. Davidon, described in the report "Variable metric method for minimization" (No. 5990 Rev., Argonne National Laboratory, 1959), I was able to minimize some functions of 100 variables to high accuracy, but the practical limit then for other methods was about 20 variables. The same discovery was made independently by Roger Fletcher. We met in time to write this paper together on the major breakthrough that was provided by Davidon's brilliant technique for updating estimates of positive
definite second derivative matrices. Computer Journal, Vol. 7, pp. 155162 (1964).
calculation of any derivatives. The paper describes an algorithm that takes advantage of these remarks automatically if the objective function happens to be quadratic, although the objective function is allowed to be general. There has not only been much interest in the given way of constructing conjugate directions, but also I still receive occasional questions from users about the Fortran implementation of the method that was written more than
40 years ago. Academic Press (London), pp. 283297 (1969).
for unconstrained minimization is applied to the sum of the objective function and large multiples of the squares of the constraint functions. A generalization of this technique combines smaller multipliers with constant shifts of the constraint functions, the shifted functions being squared as before. The paper considers the adjustment
of the multipliers and shifts in an outer iteration, and usually only
the shifts are altered. Thus the procedure
method for constrained optimization. Therefore I have enjoyed much
credit for being an originator of that
in Nonlinear Programming, SIAMAMS Proceedings, Vol. IX, editors R.W. Cottle and C.E. Lemke,
American Mathematical Society (Providence), pp. 5372 (1976). Remarks: I have always believed that theoretical analysis is not essential to the development and publication of
new algorithms, because the DFP method in [1] above is highly useful,
although many questions about its level sets, then the DFP method with exact line searches provides convergence in theory to the optimal value of the objective function, but early termination of the line searches is important to efficiency. Attempts to analyse this situation are incomplete, although the work in [4] provides a favourable answer for inexact line searches when the DFP updating formula is replaced by the BFGS formula.
Lecture Notes in Mathematics No. 630, editor G.A. Watson, SpringerVerlag
(Berlin), pp. 144157 (1978). Remarks: The method in [3] includes an outer iteration for the adjustment of estimates of Lagrange multipliers, but there is only one iterative procedure in SQP (sequential quadratic programming). Here, for each new choice of the vector of variables, quadratic and linear approximations are made to the objective and constraint functions, respectively, which gives a quadratic programming problem that has Lagrange multipliers. Furthermore, constraint curvature can be treated adequately by taking the second derivative matrix of the quadratic approximation from an estimate of the Lagrange function instead of from the original objective function. One of the first SQP
algorithms of this kind is described in [5] with some numerical results.
A version of the BFGS formula is included,
R.M. Chamberlain, C. Lemarechal and H.C. Pedersen), Math.
Programming Studies, No. 16, pp. 117
variables may not be available initially. Hence, in constrained
optimization, the question arises whether to accept happen that good changes to the variables make both of these terms worse, which is known as the "Maratos effect". Then, if the change is accepted, its success may become obvious on the next iteration. Therefore the purpose of the watchdog technique is to allow some new vectors of variables to be starting points of iterations, even if they do not satisfy the tests of the algorithm that force convergence, but the tests may reject such trial variables later, with
backtracking to the iteration that generated them. [7] "A tolerant algorithm for linearly constrained optimization calculations", Math. Programming B, Vol. 45,
pp. 547566 (1989). Remarks: If an algorithm for linearly constrained optimization employs line searches and preserves feasibility, then the step along each search direction is restricted by the boundaries of the constraints. Thus some algorithms become inefficient, taking small steps on many iterations, especially when the number of constraints is large. The new feature of the algorithm of [7] avoids this behaviour by choosing directions that stay away from constraint boundaries that are close to the starting point of each iteration. Specifically, if an inequality constraint holds strictly at the starting point, and if the constraint residual is no greater than the value of a tolerance parameter, then the constraint residual is assumed to be zero. It follows that each line search is restricted only by the boundaries of constraints whose residuals exceed the tolerance at the starting point. The given algorithm includes a procedure that decreases the tolerance parameter automatically, which is essential to the final accuracy. Some numerical
results illustrate the advantages of the method. [8] "A direct search optimization method that models the objective and constraint functions by linear interpolation", in Advances in Optimization and Numerical Analysis, editors S. Gomez and JP. Hennart, Kluwer Academic (Dordrecht), pp. 5167 (1994).
Remarks: In general an optimization algorithm requires more information in order to achieve faster convergence,
but ease of use may be more important than efficiency, especially when
the number of variables is small. The algorithm in [8] minimizes a
function subject to nonlinear constraints without requiring any
derivatives, which is highly convenient in many the approximations to the objective and constraint functions are only linear, being defined by interpolation at n+1 points, where n is the number of variables. Thus the difficulty arises of avoiding a degenerate interpolation problem when the positions of the points are chosen and adjusted automatically. For example, the method would collapse if all the points were on the boundary of a linear constraint. The given procedure seems to be suitable, even when the
initial vector of variables is far from the required solution.
pp. 555582 (2002). Remarks: If all the parameters of a quadratic model are defined by interpolation to function values, then (n+1)(n+2)/2 interpolation conditions are needed, where n is still the number of variables. The purpose of
the UOBYQA software in [9] is to investigate the performance of this
approach to unconstrained current quadratic model. The interpolation points are kept apart by a lower bound on each trust region radius, which is reduced only when necessary for further progress. Thus, unlike methods that estimate derivatives by differences, UOBYQA is suitable for noisy objective functions. There are some special iterations that avoid degeneracies in the positions of the interpolation points. Each quadratic model is updated by replacing only one of its interpolation points, so only one new function value is calculated on each iteration. Thus usually the total number of evaluations of the objective function is tolerable. On the other hand, the amount of routine work of each iteration is proportional to the fourth power of n, which imposes a severe restriction on the number of variables.
[10] "The NEWUOA software for unconstrained optimization without derivatives", in LargeScale Nonlinear
Optimization, editors G. Di Pillo and M. Roma, Springer (New York), pp.
255297 (2006).
derivatives in the quadratic models of algorithms for unconstrained
optimization. Therefore far fewer than available. In the algorithm of [10], namely NEWUOA, each model interpolates m values of the objective function, the choice m=2n+1 being recommended. Then the initial model is defined uniquely by letting its second derivative matrix be diagonal, each iteration replaces only one of the interpolation points, and the freedom in the updating of each model is taken up by minimizing the Frobenius norm of the change to the second derivative matrix. The performance of this algorithm compared with UOBYQA is stunning when the number of variables is large. Indeed, the total number of function values to achieve high accuracy is only of magnitude n in some experiments, and, when m=2n+1, the routine work of each iteration is proportional only to the square of n. Thus it becomes possible
to minimize functions of hundreds of variables without the calculation
or numerical estimation of first derivatives.
of computer users. Further, as suggested in the remarks on [4] above, it was usual to publish algorithms without any theoretical analysis, but with some numerical results. The nonlinear part of mathematical programming became an academic subject that was taught and studied at universities later, which provided respectability and coherence from a theoretical point of view. Thus there were major changes in the objectives of research. I even know of a case in rational approximation where two versions of an algorithm were constructed, but the original paper on the work recommends the version that had a proof of convergence, although the other version is much faster in practice. There are also cases in optimization where algorithms are modified not to improve their performance but to assist convergence theory. Another way of enabling analysis is to make assumptions, even if they are not satisfied in many useful applications of the algorithm. Indeed, the theoretical side of our subject has been given much
importance academically, so now it is not unusual for journals to
publish papers on new algorithms that have not in engineering and scientific publications. Their names suggest that they work in ways that are analogous to reputable models of real situations, which makes them attractive to computer users. Thus some understanding of their techniques can be presented simply, in contrast to the papers on advances in mathematical programming that are written by theoreticians for other theoreticians to read. Furthermore, the inclusion of empirical methods in numerical calculations is usually easy, and often their inefficiencies are tolerable, because computers are very fast and low accuracy may be sufficient. It seems, therefore, that huge improvements are possible by replacing the procedures that are actually
used to solve many nonlinear optimization problems. of my current research is to provide algorithms with good convergence properties that are easy to use. In particular, my papers [8][10] propose algorithms that are without the calculation of derivatives. Please send me an email (mjdp@cam.ac.uk) if you would like to receive, free of charge, the Fortran software of any or all of these methods, and Fortran is also available for the algorithm of paper [7], namely TOLMIN. I have plans for extending the
NEWUOA algorithm of [10] to constrained calculations, beginning with
simple bounds on the variables. optimization that are studied and used today. Thus many researchers in our field are interested in my work, a highlight being the award of the George B. Dantzig Prize in 1982 for my contributions to mathematical programming. Other honours include election to The Royal Society of London and to the US National Academy of Sciences. A particular
pleasure on my last visit to China was meeting my great grandson, where
I am referring to generations of research
(Related link: Another interview article (by Luis Nunes Vicente) of Prof. M.J.D. Powell can be found in http://www.mat.uc.pt/~lnv/papers/mjdp.pdf)
