![](images/EscUpmPolit_p.gif "UPM")

# Course Notes for Learning Intelligent Systems

Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © Carlos A. Iglesias

## [Introduction to Machine Learning III](4_0_0_Intro_ML_3.ipynb)

# Table of Contents

* [Introduction](#Introduction)
* [Genetic Algorithms](#Genetic-Algorithms)
* [Exercises](#Exercises)

# Introduction
The purpose of this practice is to better understand how genetic algorithms (GAs) work.

There are many libraries that implement GAs; you can find some of them in the [References](#References) section.

# Genetic Algorithms
In this section, we are going to use the library [DEAP](https://github.com/DEAP/deap/tree/master) to implement a genetic algorithm.

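We are going to implement the OneMax problem, as seen in class: finding a bit string of a given length (100 bits in the code below) that contains as many ones as possible.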

First, follow the DEAP installation instructions and install the package (for example, with `pip install deap`).

Then, work through the [OneMax](https://github.com/DEAP/notebooks/blob/master/OneMax.ipynb) notebook to understand how DEAP works and how it solves this problem. Note that types and functions must be registered in the DEAP framework (with `creator.create` and `toolbox.register`). Note also how genetic operators, such as mutation, are executed.

The following cell contains a simple program that solves the OneMax problem. It is taken from [DEAP](http://deap.readthedocs.io/en/master/examples/ga_onemax.html), with an added line that shows the best individual in each generation.

Read the tutorial in [DEAP](http://deap.readthedocs.io/en/master/examples/ga_onemax.html) to understand the code.

In [None]:
import random

from deap import base
from deap import creator
from deap import tools

creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", list, fitness=creator.FitnessMax)

toolbox = base.Toolbox()
# Attribute generator 
toolbox.register("attr_bool", random.randint, 0, 1)
# Structure initializers
toolbox.register("individual", tools.initRepeat, creator.Individual, 
    toolbox.attr_bool, 100)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)

def evalOneMax(individual):
    # Fitness = number of ones; DEAP expects a tuple, hence the trailing comma
    return sum(individual),

toolbox.register("evaluate", evalOneMax)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutFlipBit, indpb=0.05)
toolbox.register("select", tools.selTournament, tournsize=3)


def main():
    pop = toolbox.population(n=300)
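    # CXPB: crossover probability, MUTPB: mutation probability,
    # NGEN: maximum number of generations
    CXPB, MUTPB, NGEN = 0.5, 0.2, 1000

    # Evaluate the entire population
    fitnesses = list(map(toolbox.evaluate, pop))
    for ind, fit in zip(pop, fitnesses):
        ind.fitness.values = fit
    # Extract the fitness values of all individuals
    fits = [ind.fitness.values[0] for ind in pop]

    # Variable keeping track of the number of generations
    g = 0

    # Evolve until a perfect individual (100 ones) is found
    # or the maximum number of generations is reached
    while max(fits) < 100 and g < NGEN: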
        # A new generation
        g = g + 1
        print("-- Generation %i --" % g)
        # Select the next generation individuals
        offspring = toolbox.select(pop, len(pop))
        # Clone the selected individuals
        offspring = list(map(toolbox.clone, offspring))
        # Apply crossover and mutation on the offspring
        for child1, child2 in zip(offspring[::2], offspring[1::2]):
            if random.random() < CXPB:
                toolbox.mate(child1, child2)
                del child1.fitness.values
                del child2.fitness.values

        for mutant in offspring:
            if random.random() < MUTPB:
                toolbox.mutate(mutant)
                del mutant.fitness.values
        # Evaluate the individuals with an invalid fitness
        invalid_ind = [ind for ind in offspring if not ind.fitness.valid]
        fitnesses = map(toolbox.evaluate, invalid_ind)
        for ind, fit in zip(invalid_ind, fitnesses):
            ind.fitness.values = fit
            
        # The population is entirely replaced by the offspring
        pop[:] = offspring

        # Gather all the fitnesses in one list and print the stats
        fits = [ind.fitness.values[0] for ind in pop]
        
        length = len(pop)
        mean = sum(fits) / length
        sum2 = sum(x*x for x in fits)
        std = abs(sum2 / length - mean**2)**0.5
        
        print("  Min %s" % min(fits))
        print("  Max %s" % max(fits))
        print("  Avg %s" % mean)
        print("  Std %s" % std)
        best_ind = tools.selBest(pop, 1)[0]
        print("Best individual so far is %s, %s" % (best_ind, best_ind.fitness.values))

Run the genetic algorithm and interpret the results.

In [None]:
main()

# Exercises

## Comparing
Your task is to modify the previous code to use the canonical GA configuration proposed by Holland (see the lesson's slides). You should also consult the [DEAP API](http://deap.readthedocs.io/en/master/api/tools.html#operators).

Submit your notebook, including the modified code and a comparison of the effects of these changes.

Discuss your findings.

## Optional. Optimizing ML hyperparameters

One of the applications of genetic algorithms is the optimization of ML hyperparameters. Previously, we used grid search (`GridSearchCV`) from scikit-learn. Using [sklearn-deap](https://github.com/rsteca/sklearn-deap), optimize the hyperparameters of the Titanic model using both grid search and genetic algorithms.

The same exercise (using the digits dataset) can be found in this [notebook](https://github.com/rsteca/sklearn-deap/blob/master/test.ipynb).

Since sklearn-deap has a compatibility problem with scikit-learn version 0.24, you may simply comment on the different approaches.
Alternatively, you can use the library [sklearn-genetic-opt](https://sklearn-genetic-opt.readthedocs.io/en/stable/index.html) and discuss the digit classification example included in the library: [digits decision tree](https://sklearn-genetic-opt.readthedocs.io/en/stable/notebooks/Digits_decision_tree.html).
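For the comparison, a grid-search baseline is useful. The following sketch is an illustrative assumption, not the code of the referenced notebooks: it uses scikit-learn's `GridSearchCV` with a `DecisionTreeClassifier` on the digits dataset, and the parameter grid is hypothetical. It makes visible that the number of candidates grows multiplicatively with each hyperparameter, which is the cost a genetic search tries to reduce by evaluating only a population of candidates per generation.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)

# Hypothetical search space; a GA-based search could explore the same one
param_grid = {
    "max_depth": [4, 8, 16, None],
    "min_samples_split": [2, 5, 10],
    "criterion": ["gini", "entropy"],
}

# Exhaustive search: 4 * 3 * 2 = 24 candidates, each scored with 5-fold CV
grid = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid,
                    cv=5, scoring="accuracy")
grid.fit(X, y)

print("Candidates evaluated:", len(grid.cv_results_["params"]))
print("Best parameters:", grid.best_params_)
print("Best CV accuracy: %.3f" % grid.best_score_)
```

Reporting the number of fitted models and the wall-clock time of each approach, together with the best score, makes the comparison between grid search and the genetic search more informative.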

## Optional. Optimizing an ML pipeline with a genetic algorithm

The library [TPOT](https://epistasislab.github.io/tpot/latest/) optimizes ML pipelines and comes with a lot of [examples](https://epistasislab.github.io/tpot/latest/Tutorial/9_Genetic_Algorithm_Overview/) and even notebooks, for example for the [iris dataset](https://github.com/EpistasisLab/tpot/blob/master/tutorials/IRIS.ipynb).

Your task is to apply TPOT to the intermediate challenge and write a short essay explaining:
* what TPOT does (in your own words).
* how you experimented with TPOT (what you tried and for how long; bear in mind that TPOT may need to run for hours or even days to produce good results. Read the documentation; it is not that long!).
* the results: whether TPOT outperformed the results your group obtained, or vice versa.

## References
* [deap](https://github.com/deap/deap)
* [sklearn-deap](https://github.com/rsteca/sklearn-deap)
* [tpot](http://epistasislab.github.io/tpot/)
* [gplearn](http://gplearn.readthedocs.io/en/latest/index.html)
* [scikit-allel](https://scikit-allel.readthedocs.io/en/latest/)
* [sklearn-genetic](https://github.com/manuel-calzolari/sklearn-genetic)
* [sklearn-genetic-opt](https://sklearn-genetic-opt.readthedocs.io/en/stable/)

## Licence

The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/).  

© Carlos A. Iglesias, Universidad Politécnica de Madrid.