GeneticAlgorithm
Implementation of the genetic algorithm
This project is the first of my new one-day-build mini-project series. It is my take on implementing a simple version of the genetic algorithm in C++, with a demo use case consisting of fitting a Gaussian distribution to a data histogram.
For a description of the code and compiling/running instructions, see the Genetic Algorithm README.
The genetic algorithm consists of finding the best set of parameters for a given model using the following concepts (shamelessly stolen from the work of mother nature over the last several billion years): selection, cross-over and mutation.

The genetic algorithm repeats the selection, cross-over and mutation steps until the best-fitted individual reaches a certain fitness threshold.
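Schematically, the loop looks like this (a sketch only; the method names are illustrative and the actual interfaces are described below):

```cpp
// Schematic only: a generic evolution loop, assuming a population type that
// exposes the obvious operations. The real interfaces are introduced below.
template <typename Population>
void Evolve(Population& population, double fitnessThreshold)
{
    // In this project a *lower* figure of merit means a better fit (see below),
    // so we keep evolving until the best individual drops below the threshold.
    while (population.BestFigureOfMerit() > fitnessThreshold) {
        population.Rank();      // score and sort the current generation
        population.Select();    // keep the best-fitted individuals
        population.CrossOver(); // breed new individuals from the survivors
        population.Mutate();    // apply random changes to some individuals
    }
}
```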
The demo case generates pseudo-random data following a Gaussian distribution and uses the genetic algorithm to guess the parameters of the Gaussian from the data. The results are then compared with a likelihood fitting method as implemented in the ROOT library.
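To give an idea of the reference part of the demo, here is a rough sketch of generating Gaussian pseudo-data and fitting it with ROOT's likelihood option (this is not the actual demo code; binning, range and sample size are arbitrary choices for illustration):

```cpp
#include "TH1D.h"
#include "TF1.h"

// Rough sketch: fill a histogram with Gaussian pseudo-random data and fit it
// with ROOT's likelihood method, to compare with the genetic algorithm result.
void LikelihoodReference()
{
    TH1D hist("data", "pseudo-data", 100, -5., 5.);
    hist.FillRandom("gaus", 10000);      // pseudo-random Gaussian data

    hist.Fit("gaus", "L");               // "L" = log-likelihood fit
    TF1* fit = hist.GetFunction("gaus"); // fitted constant, mean and sigma
    double mean  = fit->GetParameter(1);
    double sigma = fit->GetParameter(2);
    (void)mean; (void)sigma;             // values to compare with the genetic algorithm
}
```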
To make this implementation reusable in future projects, abstract interfaces are used for the different elements: generic features are implemented in the interface base classes, while features that are specific to the demo case are implemented in derived classes.
The model is what we are trying to optimize. In general, it is described by a set of properties that can have different values.
IModel
is minimal since the details of the model are use-case specific. The only thing the interface knows about is the fact that a model can be scored.
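As a rough sketch of what such a minimal interface can look like (the method names here are illustrative, not necessarily those of the actual code):

```cpp
// Rough sketch of a minimal model interface: the base class only knows that a
// model carries a score assigned by the figure of merit. Names are illustrative;
// all use-case specific details live in derived classes.
class IModel
{
public:
    virtual ~IModel() = default;

    void   SetScore(double score) { fScore = score; }
    double GetScore() const       { return fScore; }

private:
    double fScore = 0.;
};
```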
ParametricModel

describes a parametric 1D formula of a Gaussian distribution of the form: \[ f(x) = N e^{-\frac{(x-\mu)^2}{\sigma^2}} \]
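As an illustration, such a formula can be written directly as a ROOT TF1, the class ParametricModel builds on (see below); the parameter ordering and range here are just assumptions:

```cpp
#include "TF1.h"

// Illustration only: the Gaussian above written as a ROOT TF1, with
// [0] = N, [1] = mu, [2] = sigma (note: no factor 2 in the exponent,
// following the formula used in this project).
void GaussianFormula()
{
    TF1 gauss("gauss", "[0]*exp(-(x-[1])*(x-[1])/([2]*[2]))", -10., 10.);
    gauss.SetParameters(1., 0., 1.);   // N, mu, sigma
    double y = gauss.Eval(0.5);        // evaluate the model at x = 0.5
    (void)y;
}
```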
The implementation using ROOT's powerful TF1 formula class allows our ParametricModel sub-class to describe any formula in any number of dimensions and depending on any number of parameters, not to mention the possibility of drawing a graph representing our model without the need for a lot of extra code.

The figure of merit is sometimes called a fitness function: I prefer the term figure of merit over fitness function, because the latter suggests a function that increases with goodness (i.e. the higher the better), while there is no reason for this to always be the case. In fact, in our demo case, a better \(\chi^2\) is a smaller one.
IFigureOfMerit
provides placeholders for the functionalities needed to score a model against a data set.

Chi2FitFigureOfMerit
uses a \(\chi^2\) over number of degrees of freedom estimator to compare a model with a given data set. A data set consists of \(N\) \((\vec{x_i}, y_i)\) pairs with an associated error \(\sigma_{y_i}\) on \(y_i\). The figure of merit is then given by: \[ \chi^2/\mathrm{ndf} = \frac{1}{N}\sum_{i=1}^{N}\frac{(y_i - f(\vec{x_i}))^2}{\sigma_{y_i}^2} \]
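A straightforward way to compute this quantity for a 1D model could look like the following (types and names are illustrative, not the project's actual code):

```cpp
#include <functional>
#include <vector>

// Illustrative only: the chi2/ndf figure of merit for a 1D model f(x),
// following the formula above. Names and types are not the actual project code.
struct DataPoint { double x, y, yErr; };

double Chi2OverNdf(const std::vector<DataPoint>& data,
                   const std::function<double(double)>& model)
{
    double chi2 = 0.;
    for (const auto& p : data) {
        const double residual = (p.y - model(p.x)) / p.yErr;
        chi2 += residual * residual;
    }
    return chi2 / data.size();   // normalised by the number of points N
}
```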
The population defines a generation of models and controls the way models are selected, crossed over and mutated to build the next generation.
IPopulation
already implements a large portion of the functionalities using the interfaces provided for the figure of merit and the model. This includes tasks such as ranking the population and finding the best-fitted individuals, selection based on this ranking, and handling the mutation rate.

ParametricModelPopulation

implements the details of initialization, cross-over and mutation.
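As an illustration of what cross-over and mutation can look like on a vector of model parameters (a generic sketch, not the actual ParametricModelPopulation code):

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Generic sketch: cross-over and mutation acting on plain parameter vectors.
// The actual ParametricModelPopulation works on ParametricModel objects and
// its method names and strategies may differ.
std::vector<double> CrossOver(const std::vector<double>& a,
                              const std::vector<double>& b,
                              std::mt19937& rng)
{
    std::bernoulli_distribution pickFromA(0.5);
    std::vector<double> child(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        child[i] = pickFromA(rng) ? a[i] : b[i];   // each parameter comes from one parent
    return child;
}

void Mutate(std::vector<double>& params, double mutationRate, double scale,
            std::mt19937& rng)
{
    std::uniform_real_distribution<double> uniform(0., 1.);
    std::normal_distribution<double> kick(0., scale);
    for (double& p : params)
        if (uniform(rng) < mutationRate)
            p += kick(rng);   // random Gaussian kick with probability mutationRate
}
```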
The GeneticAlgorithm
class wraps everything above and controls the flow of the algorithm using the defined interfaces: it initializes the population, then repeats the ranking, selection, cross-over and mutation steps until the best-fitted individual reaches the fitness threshold.
An example output of the demo looks like this:
(figure: example output of the demo)