An Interview with Professor Michael J. D. Powell    

 

                                                                       ------Interviewed by Xiaoling Sun via Email

 

 

 

Introduction 

 

Next September, a conference will be organized in Beijing by Prof. Ya-xiang Yuan to celebrate Prof. M.J.D. Powell's 70th birthday.

 

M.J.D. Powell completed his undergraduate studies at the University of Cambridge in 1959. From 1959 to 1976 he worked at the Atomic Energy Research Establishment, Harwell, where he was Head of the Numerical Analysis Group from 1970. He has been John Humphrey Plummer Professor of Applied Numerical Analysis at the University of Cambridge since 1976, and a Fellow of Pembroke College, Cambridge, since 1978.


He has made many seminal contributions to approximation theory, nonlinear optimization, and other topics in numerical analysis. He has written a book on approximation theory and more than one hundred and fifty papers.

 

He was awarded the George B. Dantzig Prize in 1982 for his contributions to mathematical programming. He is a Fellow of The Royal Society of London and the US National Academy of Sciences.

1. Your full name, address and e-mail address:


    Michael James David Powell

     Centre for Mathematical Sciences,
     Wilberforce Road,
     Cambridge CB3 0WA, UK.

     E-mail: mjdp@cam.ac.uk

2. Highest degree
     Doctor of Science, University of Cambridge (1979)


3. Number of publications

    There are 171 entries in my current list of publications, which include papers in conference proceedings, because they compare favourably in quality and originality with the contributions to journals. About 86 of these works are in the field of optimization, and about 57 of them are on approximation, including one book, namely Approximation Theory and Methods (Cambridge University Press, 1981). Most of the other articles address numerical linear algebra and some applications.

4. Research interests
 

    My research on optimization began in 1960 when I was employed at the Atomic Energy Research Establishment, Harwell, to help scientists with their numerical calculations. I was expected to find suitable algorithms and to provide computer programs for generating results, but there was no need for the work to lead to the publication of papers. It was relatively easy then to develop new algorithms that have practical advantages over current methods, and doing so is still my main research interest. I have also been challenged by many theoretical questions that are relevant to practical computation. After about 3 years at Harwell, I began to publish frequently, a selection of my papers being given below.


5. Selected papers on optimization and remarks

[1]  "A rapidly convergent descent method for minimization" (with R. Fletcher), Computer Journal, Vol. 6, pp. 163-168 (1963).

 

Remarks: When I experimented with the algorithm of W.C. Davidon, described in the report "Variable metric method for minimization" (No. 5990 Rev., Argonne National Laboratory, 1959), I was able to minimize some functions of 100 variables to high accuracy, but the practical limit then for other methods was about 20 variables. The same discovery was made independently by Roger Fletcher. We met in time to write this paper together on the major breakthrough that was provided by Davidon's brilliant technique for updating estimates of positive definite second derivative matrices.
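
To make the updating explicit in the notation that is now standard (it is not taken from the paper itself), let s_k = x_{k+1} - x_k and y_k = g_{k+1} - g_k be the changes in the variables and in the gradient of the objective function. The DFP formula revises the estimate H_k of the inverse second derivative matrix by

    H_{k+1} = H_k + \frac{s_k s_k^T}{s_k^T y_k} - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k},

which preserves positive definiteness whenever s_k^T y_k > 0, a condition that exact line searches along descent directions guarantee.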

[2]  "An efficient method for finding the minimum of a function of several variables without calculating derivatives", Computer Journal, Vol. 7, pp. 155-162 (1964).


Remarks: By employing search directions that are conjugate to each other, one can minimize a convex quadratic function of n variables in at most n steps, and conjugacy can be achieved by searches along parallel lines in the space of the variables without the calculation of any derivatives. The paper describes an algorithm that takes advantage of these properties automatically if the objective function happens to be quadratic, although the objective function is allowed to be general. Not only has there been much interest in the given way of constructing conjugate directions, but I still receive occasional questions from users about the Fortran implementation of the method that was written more than 40 years ago.
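
The conjugacy property that the algorithm exploits can be stated in a few lines. Let q(x) = (1/2) x^T A x + b^T x have a positive definite second derivative matrix A, and let x_1 and x_2 be the minima of q along two parallel lines with direction d. Because the gradient of q at a line minimum is orthogonal to the direction of the line, d^T (A x_1 + b) = d^T (A x_2 + b) = 0, and subtracting these equations gives

    d^T A (x_2 - x_1) = 0,

so the difference x_2 - x_1 is conjugate to d with respect to A, although A itself is never formed or estimated.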

[3]  "A method for nonlinear constraints in minimization problems", in Optimization, editor R. Fletcher, Academic Press (London), pp. 283-297 (1969).


Remarks: In a penalty function method for minimizing a function subject to equality constraints, an algorithm for unconstrained minimization is applied to the sum of the objective function and large multiples of the squares of the constraint functions. A generalization of this technique combines smaller multiples with constant shifts of the constraint functions, the shifted functions being squared as before. The paper considers the adjustment of the multiples and shifts in an outer iteration, and usually only the shifts are altered. Thus the procedure becomes equivalent to the updating of Lagrange multiplier estimates in the well-known augmented Lagrangian method for constrained optimization. Therefore I have enjoyed much credit for being an originator of that important method.
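
The equivalence can be seen by expanding the shifted penalty term. In generic notation (not that of the paper), with equality constraints c_i(x) = 0, penalty multiple \sigma and shifts \theta_i, the function that is minimized is

    \Phi(x) = f(x) + \sigma \sum_i ( c_i(x) + \theta_i )^2
            = f(x) + \sum_i \lambda_i c_i(x) + \sigma \sum_i c_i(x)^2 + constant,    where \lambda_i = 2 \sigma \theta_i,

which is the augmented Lagrangian function of the constrained problem, so adjusting the shifts is the same as adjusting the Lagrange multiplier estimates \lambda_i.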


[4]  "Some global convergence properties of a variable metric algorithm for minimization without exact line searches", in Nonlinear Programming, SIAM-AMS Proceedings, Vol. IX, editors R.W. Cottle and C.E. Lemke, American Mathematical Society (Providence), pp. 53-72 (1976).
 

Remarks: I have always believed that theoretical analysis is not essential to the development and publication of new algorithms, because the DFP method in [1] above is highly useful, although many questions about its convergence are open at present. If the objective function is convex and continuously differentiable with bounded level sets, then the DFP method with exact line searches provides convergence in theory to the optimal value of the objective function, but early termination of the line searches is important to efficiency. Attempts to analyse this situation are incomplete, although the work in [4] provides a favourable answer for inexact line searches when the DFP updating formula is replaced by the BFGS formula.
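
For reference, the BFGS formula updates an estimate B_k of the second derivative matrix itself, rather than of its inverse, again with s_k = x_{k+1} - x_k and y_k = g_{k+1} - g_k:

    B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{y_k^T s_k}.

The analysis in [4] applies to this update when the objective function is convex and the inexact line searches satisfy standard sufficient decrease and curvature conditions.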


[5]  "A fast algorithm for nonlinearly constrained optimization calculations", in Numerical Analysis Dundee 1977, Lecture Notes in Mathematics No. 630, editor G.A. Watson, Springer-Verlag (Berlin), pp. 144-157 (1978).
 

Remarks: The method in [3] includes an outer iteration for the adjustment of estimates of Lagrange multipliers, but there is only one iterative procedure in SQP (sequential quadratic programming). Here, for each new choice of the vector of variables, quadratic and linear approximations are made to the objective and constraint functions, respectively, which gives a quadratic programming problem that has Lagrange multipliers. Furthermore, constraint curvature can be treated adequately by taking the second derivative matrix of the quadratic approximation from an estimate of the Lagrange function instead of from the original objective function. One of the first SQP algorithms of this kind is described in [5] with some numerical results. A version of the BFGS formula is included, in order that the user does not have to calculate any second derivatives.
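
In generic notation (not that of [5]), when the constraints are the equations c(x) = 0, the subproblem at the current point x_k is to find the step d that solves

    minimize    \nabla f(x_k)^T d + (1/2) d^T B_k d
    subject to  c(x_k) + A(x_k) d = 0,

where A(x_k) is the Jacobian matrix of the constraints and B_k approximates the second derivative matrix of the Lagrangian function; inequality constraints are linearized in the same way.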


[6]  "The watchdog technique for forcing convergence in algorithms for constrained optimization" (with R.M. Chamberlain, C. Lemarechal and H.C. Pedersen), Math. Programming Studies, No. 16, pp. 1-17 (1982).


Remarks: When developing minimization algorithms, I assume that good estimates of the required values of the variables may not be available initially. Hence, in constrained optimization, the question arises whether to accept a change to the variables that reduces one but not both of the objective function and constraint violations. It can also happen that good changes to the variables make both of these terms worse, which is known as the "Maratos effect". Then, if the change is accepted, its success may become obvious on the next iteration. Therefore the purpose of the watchdog technique is to allow some new vectors of variables to be starting points of iterations, even if they do not satisfy the tests of the algorithm that force convergence, but the tests may reject such trial variables later, with back-tracking to the iteration that generated them.
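
The acceptance tests in question are usually applied to a merit function that combines the two terms. A typical choice, stated here only for equality constraints c_i(x) = 0 and not necessarily in the form used in [6], is the l_1 penalty function

    \phi(x) = f(x) + \mu \sum_i | c_i(x) |,    \mu > 0,

and the Maratos effect is exactly the situation in which a step that makes excellent progress towards the solution increases \phi, so demanding a reduction of \phi on every iteration can destroy fast convergence.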
 

[7]  "A tolerant algorithm for linearly constrained optimization calculations", Math. Programming B, Vol. 45, pp. 547-566 (1989).
 

Remarks: If an algorithm for linearly constrained optimization employs line searches and preserves feasibility, then the step along each search direction is restricted by the boundaries of the constraints. Thus some algorithms become inefficient, taking small steps on many iterations, especially when the number of constraints is large. The new feature of the algorithm of [7] avoids this behaviour by choosing directions that stay away from constraint boundaries that are close to the starting point of each iteration. Specifically, if an inequality constraint holds strictly at the starting point, and if the constraint residual is no greater than the value of a tolerance parameter, then the constraint residual is assumed to be zero. It follows that each line search is restricted only by the boundaries of constraints whose residuals exceed the tolerance at the starting point. The given algorithm includes a procedure that decreases the tolerance parameter automatically, which is essential to the final accuracy. Some numerical results illustrate the advantages of the method.
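
The effect of the tolerance on the line searches can be illustrated by a short sketch in Python. The names and the data layout are invented for the illustration and are not taken from the TOLMIN software; the constraints are taken to be a_i^T x <= b_i, stored as the rows of a matrix A with right-hand sides b, and the current point x is assumed to be feasible.

import numpy as np

def step_bound(x, d, A, b, tol):
    """Largest step t >= 0 along the direction d that keeps satisfied every
    constraint whose residual at x exceeds the tolerance; constraints with
    residuals at most the tolerance are treated as if their residuals were
    zero, so they influence the choice of d but do not shorten the step."""
    residuals = b - A @ x          # nonnegative at a feasible point x
    t_max = np.inf
    for r, a in zip(residuals, A):
        slope = a @ d              # rate at which the constraint is approached
        if r > tol and slope > 0:  # only "distant" constraints restrict the step
            t_max = min(t_max, r / slope)
    return t_max

Reducing the tolerance towards zero, which the algorithm of [7] does automatically, recovers the usual restriction on the step and allows high final accuracy.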
 

[8]  "A direct search optimization method that models the objective and constraint functions by linear interpolation", in Advances in Optimization and Numerical Analysis, editors S. Gomez and J-P. Hennart, Kluwer Academic (Dordrecht), pp. 51-67 (1994).

 

Remarks: In general an optimization algorithm requires more information in order to achieve faster convergence, but ease of use may be more important than efficiency, especially when the number of variables is small. The algorithm in [8] minimizes a function subject to nonlinear constraints without requiring any derivatives, which is highly convenient in many applications, but often the rate of convergence of the iterations is slow. Indeed, instead of using quadratic models, the approximations to the objective and constraint functions are only linear, being defined by interpolation at n+1 points, where n is the number of variables. Thus the difficulty arises of avoiding a degenerate interpolation problem when the positions of the points are chosen and adjusted automatically. For example, the method would collapse if all the points were on the boundary of a linear constraint. The given procedure seems to be suitable, even when the initial vector of variables is far from the required solution.
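
The construction of the models can be shown in a few lines of Python. The sketch below is only an illustration, with invented names rather than anything from the Fortran code: it fits the linear function l(x) = c + g^T x to the values of a function at n+1 interpolation points, and the objective function and every constraint function are approximated in this way.

import numpy as np

def linear_model(points, values):
    """Return the constant c and gradient g of the linear function
    l(x) = c + g @ x that interpolates the given function values at the
    given points. The matrix of the system is nonsingular exactly when
    the n+1 points do not lie in a common hyperplane, which is the
    degeneracy mentioned above."""
    points = np.asarray(points, dtype=float)         # shape (n+1, n)
    values = np.asarray(values, dtype=float)         # shape (n+1,)
    m = points.shape[0]
    system = np.hstack([np.ones((m, 1)), points])    # row j is (1, x_j^T)
    coefficients = np.linalg.solve(system, values)
    return coefficients[0], coefficients[1:]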

[9]  "UOBYQA: unconstrained optimization by quadratic approximation", Math. Programming B, Vol. 92, pp. 555-582 (2002).
 

Remarks: If all the parameters of a quadratic model are defined by interpolation to function values, then (n+1)(n+2)/2 interpolation conditions are needed, where n is still the number of variables. The purpose of the UOBYQA software in [9] is to investigate the performance of this approach to unconstrained minimization without derivatives, a trust region method being used to derive a new vector of variables from the current quadratic model. The interpolation points are kept apart by a lower bound on each trust region radius, which is reduced only when necessary for further progress. Thus, unlike methods that estimate derivatives by differences, UOBYQA is suitable for noisy objective functions. There are some special iterations that avoid degeneracies in the positions of the interpolation points. Each quadratic model is updated by replacing only one of its interpolation points, so only one new function value is calculated on each iteration. Thus usually the total number of evaluations of the objective function is tolerable. On the other hand, the amount of routine work of each iteration is proportional to the fourth power of n, which imposes a severe restriction on the number of variables.
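
The count of conditions corresponds to the number of independent parameters of a quadratic in n variables: the model

    Q(x) = c + g^T (x - x_0) + (1/2) (x - x_0)^T G (x - x_0)

has 1 parameter in c, n in the vector g and n(n+1)/2 in the symmetric matrix G, which gives 1 + n + n(n+1)/2 = (n+1)(n+2)/2 altogether, and each function value provides one interpolation condition.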

 

[10] "The NEWUOA software for unconstrained optimization without derivatives", in Large-Scale Nonlinear Optimization, editors G. Di Pillo and M. Roma, Springer (New York), pp. 255-297 (2006).

Remarks: Experiments with the BFGS algorithm show that, when n is large, there is no need for accurate second derivatives in the quadratic models of algorithms for unconstrained optimization. Therefore far fewer than (n+1)(n+2)/2 interpolation conditions may provide adequate quadratic models when first derivatives are not available. In the algorithm of [10], namely NEWUOA, each model interpolates m values of the objective function, the choice m=2n+1 being recommended. Then the initial model is defined uniquely by letting its second derivative matrix be diagonal, each iteration replaces only one of the interpolation points, and the freedom in the updating of each model is taken up by minimizing the Frobenius norm of the change to the second derivative matrix. The performance of this algorithm compared with UOBYQA is stunning when the number of variables is large. Indeed, the total number of function values to achieve high accuracy is only of magnitude n in some experiments, and, when m=2n+1, the routine work of each iteration is proportional only to the square of n. Thus it becomes possible to minimize functions of hundreds of variables without the calculation or numerical estimation of first derivatives.
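
In the notation of the previous remark, the updating of the model can be written as the variational problem

    minimize    || \nabla^2 Q_{k+1} - \nabla^2 Q_k ||_F
    subject to  Q_{k+1}(x_j) = F(x_j),   j = 1, 2, ..., m,

where F is the objective function and the points x_j are the current interpolation points, the symbols here following the description above rather than the paper itself. Thus the second derivative information is changed as little as possible, in the Frobenius norm, while the new function value is interpolated.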


6. The general aim of my research


     At the beginning of my career, optimization algorithms were developed in order to solve the numerical problems of computer users. Further, as suggested in the remarks on [4] above, it was usual to publish algorithms without any theoretical analysis, but with some numerical results. The nonlinear part of mathematical programming later became an academic subject that was taught and studied at universities, which provided respectability and coherence from a theoretical point of view. Thus there were major changes in the objectives of research. I even know of a case in rational approximation where two versions of an algorithm were constructed, but the original paper on the work recommends the version that had a proof of convergence, although the other version is much faster in practice. There are also cases in optimization where algorithms are modified not to improve their performance but to assist convergence theory. Another way of enabling analysis is to make assumptions, even if they are not satisfied in many useful applications of the algorithm. Indeed, the theoretical side of our subject has been given much importance academically, so now it is not unusual for journals to publish papers on new algorithms that have not been tried in practice.

     On the other hand, empirical methods, such as simulated annealing and genetic algorithms, receive much attention in engineering and scientific publications. Their names suggest that they work in ways that are analogous to reputable models of real situations, which makes them attractive to computer users. Thus some understanding of their techniques can be presented simply, in contrast to the papers on advances in mathematical programming that are written by theoreticians for other theoreticians to read. Furthermore, the inclusion of empirical methods in numerical calculations is usually easy, and often their inefficiencies are tolerable, because computers are very fast and low accuracy may be sufficient. It seems, therefore, that huge improvements are possible by replacing the procedures that are actually used to solve many nonlinear optimization problems.

     I wish to encourage the choice of efficient procedures instead of empirical methods. Therefore the principal aim of my current research is to provide algorithms with good convergence properties that are easy to use. In particular, my papers [8]-[10] propose algorithms that do not require the calculation of derivatives. Please send me an e-mail (mjdp@cam.ac.uk) if you would like to receive, free of charge, the Fortran software of any or all of these methods; Fortran is also available for the algorithm of paper [7], namely TOLMIN. I have plans for extending the NEWUOA algorithm of [10] to constrained calculations, beginning with simple bounds on the variables.


7.  Career highlights

     I was very fortunate to have had the opportunity to develop some of the early versions of algorithms for nonlinear optimization that are studied and used today. Thus many researchers in our field are interested in my work, a highlight being the award of the George B. Dantzig Prize in 1982 for my contributions to mathematical programming. Other honours include election to The Royal Society of London and to the US National Academy of Sciences. A particular pleasure on my last visit to China was meeting my great-grandson, where I am referring to generations of research students.

 

(Related link: another interview with Prof. M.J.D. Powell, conducted by Luis Nunes Vicente, can be found at http://www.mat.uc.pt/~lnv/papers/mjdp.pdf)