Direct search methods for optimization without gradients
Let’s face it, calculating gradients to feed to optimization algorithms can be a bit of a pain. Depending on what functions you are calling and what language you are using, you may be able to use Automatic Differentiation. However, this isn’t always possible.
For a long time, I assumed that the right way to do local search in this case is either to spend the time implementing the analytical derivative, or to use a numerical approximation, along with an optimization algorithm that uses that derivative calculation (such as quasi-Newton or conjugate gradient).
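To make the cost of the numerical-approximation route concrete, here is a minimal sketch of a central-difference gradient estimate (a hypothetical helper, not any particular library's API): every gradient costs 2n extra function evaluations for an n-dimensional problem, on top of whatever the optimizer itself does with it.

```python
# Minimal sketch: approximating a gradient by central differences.
# Each call costs 2n function evaluations for an n-dimensional x,
# which is the hidden price of derivative-based methods when no
# analytical derivative is available.
def numerical_gradient(f, x, h=1e-6):
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2.0 * h))
    return grad
```

For example, for f(x) = x0² + x1², `numerical_gradient(f, [1.0, 2.0])` returns approximately [2.0, 4.0] at a cost of four function evaluations.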
However, recently, I’ve noticed in some experiments that truly derivative-free methods, like Nelder-Mead and Powell’s Direction Set Method, often take far fewer evaluations to achieve the same result. Inspired by these observations, I did some research, and found that I’m a few years behind… In 1995 Margaret Wright produced the paper "Direct Search Methods: Once Scorned, Now Respectable", which describes why these useful algorithms have been largely ignored by the academic community for many years.
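To make "direct search" concrete, here is a minimal sketch of compass (coordinate pattern) search, one of the simplest members of the family — much cruder than Nelder-Mead or Powell's method, but it shows the defining feature: the algorithm only compares function values, with no derivatives of any kind.

```python
# Minimal sketch of compass (pattern) search: poll along each
# coordinate direction, accept any improving point, and halve the
# step when no direction improves. Only value comparisons are used.
def compass_search(f, x0, step=1.0, tol=1e-8, max_evals=10000):
    x = list(x0)
    fx = f(x)
    evals = 1
    while step > tol and evals < max_evals:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                y = list(x)
                y[i] += delta
                fy = f(y)
                evals += 1
                if fy < fx:
                    x, fx = y, fy
                    improved = True
                    break
            if improved:
                break
        if not improved:
            step *= 0.5  # refine the mesh only when polling fails
    return x, fx, evals
```

On the toy quadratic f(x) = (x0 − 1)² + (x1 + 2)², started from the origin, this converges to (1, −2) without any gradient code, analytical or numerical.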
It turns out that Michael Powell, who wrote one of the most important optimization algorithms over 40 years ago, is still researching this area, and discusses the history and current issues in "Direct Search Algorithms for Optimization Calculations". His group at Cambridge has been developing both local and global general-purpose algorithms, which look very promising. A key area of recent research is local search methods that are tolerant of noise – these are useful for stochastic optimization, online updates (e.g. in multi-layer perceptrons), and many "real world" optimization problems where the function surface is convex except for small "bumps". An example is the freely available DFO algorithm.
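As a rough illustration of why value-comparison methods cope with small "bumps" (this is just a crude coordinate pattern search, a stand-in for the noise-tolerant methods discussed above, not the DFO algorithm itself): because early iterations compare values at steps much larger than the ripple, the search still lands near the bottom of the underlying bowl.

```python
import math

# A smooth bowl plus a small high-frequency "bump" term. The bumps
# create many shallow local wiggles that can mislead a finite-difference
# gradient, but barely perturb value comparisons at a coarse step.
def bumpy(x):
    return x[0] ** 2 + 0.001 * math.sin(100.0 * x[0])

# Crude coordinate pattern search: poll x +/- step along each axis,
# accept any improving point, halve the step when nothing improves.
def pattern_search(f, x0, step=1.0, tol=1e-6, max_evals=10000):
    x = list(x0)
    fx = f(x)
    evals = 1
    while step > tol and evals < max_evals:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                y = list(x)
                y[i] += delta
                fy = f(y)
                evals += 1
                if fy < fx:
                    x, fx = y, fy
                    improved = True
                    break
            if improved:
                break
        if not improved:
            step *= 0.5
    return x, fx

x, fx = pattern_search(bumpy, [2.0])
# x ends up close to the bottom of the bowl despite the ripple
```

Real noise-tolerant methods like DFO are far more sophisticated (they build local models of the function), but the underlying intuition — compare values at a scale coarser than the noise — is the same.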