You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
sitc/ml4/2_5_1_Exercise.ipynb

313 lines
11 KiB
Plaintext

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![](images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © Carlos A. Iglesias"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## [Introduction to Machine Learning III](4_0_0_Intro_ML_3.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Table of Contents\n",
"\n",
"* [Introduction](#Introduction)\n",
"* [Genetic Algorithms](#Genetic-Algorithms)\n",
"* [Reading Data from a File](#Reading-Data-from-a-File)\n",
"* [Exercises](#Exercises)\n",
"* [Optional exercises](#Optional-exercises)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction\n",
"The purpose of this practice is to understand better how GAs work. \n",
"\n",
"There are many libraries that implement GAs, you can find some of then in the [References](#References) section."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Genetic Algorithms\n",
"In this section we are going to use the library DEAP [[References](#References)] for implementing a genetic algorithms.\n",
"\n",
"We are going to implement the OneMax problem as seen in class.\n",
"\n",
"First, follow the DEAP package instructions and install DEAP.\n",
"\n",
"Then, follow the following notebook [OneMax](https://github.com/DEAP/notebooks/blob/master/OneMax.ipynb) to understand how DEAP works and solves this problem. Observe that it is requested to register types and functions in the DEAP framework. Observe also how you can execute genetic operators such as mutate.\n",
"\n",
"We have included a simple code that solves the OneMax problem in the following cell (taken from [DEAP](http://deap.readthedocs.io/en/master/examples/ga_onemax.html) and added a line to show the best individual in each generation).\n",
"\n",
"Read tutorial from [DEAP](http://deap.readthedocs.io/en/master/examples/ga_onemax.html) to understand the code."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"\n",
"from deap import base\n",
"from deap import creator\n",
"from deap import tools\n",
"\n",
"creator.create(\"FitnessMax\", base.Fitness, weights=(1.0,))\n",
"creator.create(\"Individual\", list, fitness=creator.FitnessMax)\n",
"\n",
"toolbox = base.Toolbox()\n",
"# Attribute generator \n",
"toolbox.register(\"attr_bool\", random.randint, 0, 1)\n",
"# Structure initializers\n",
"toolbox.register(\"individual\", tools.initRepeat, creator.Individual, \n",
" toolbox.attr_bool, 100)\n",
"toolbox.register(\"population\", tools.initRepeat, list, toolbox.individual)\n",
"\n",
"def evalOneMax(individual):\n",
" return sum(individual),\n",
"\n",
"toolbox.register(\"evaluate\", evalOneMax)\n",
"toolbox.register(\"mate\", tools.cxTwoPoint)\n",
"toolbox.register(\"mutate\", tools.mutFlipBit, indpb=0.05)\n",
"toolbox.register(\"select\", tools.selTournament, tournsize=3)\n",
"\n",
"\n",
"def main():\n",
" pop = toolbox.population(n=300)\n",
" CXPB, MUTPB, NGEN = 0.5, 0.2, 40\n",
" \n",
" # Evaluate the entire population\n",
" fitnesses = list(map(toolbox.evaluate, pop))\n",
" for ind, fit in zip(pop, fitnesses):\n",
" ind.fitness.values = fit\n",
" # Extracting all the fitnesses of \n",
" fits = [ind.fitness.values[0] for ind in pop]\n",
" \n",
" # Variable keeping track of the number of generations \n",
" g = 0\n",
" \n",
" # Begin the evolution\n",
" while max(fits) < 100 and g < 1000:\n",
" # A new generation\n",
" g = g + 1\n",
" print(\"-- Generation %i --\" % g)\n",
" # Select the next generation individuals\n",
" offspring = toolbox.select(pop, len(pop))\n",
" # Clone the selected individuals\n",
" offspring = list(map(toolbox.clone, offspring))\n",
" # Apply crossover and mutation on the offspring\n",
" for child1, child2 in zip(offspring[::2], offspring[1::2]):\n",
" if random.random() < CXPB:\n",
" toolbox.mate(child1, child2)\n",
" del child1.fitness.values\n",
" del child2.fitness.values\n",
"\n",
" for mutant in offspring:\n",
" if random.random() < MUTPB:\n",
" toolbox.mutate(mutant)\n",
" del mutant.fitness.values\n",
" # Evaluate the individuals with an invalid fitness\n",
" invalid_ind = [ind for ind in offspring if not ind.fitness.valid]\n",
" fitnesses = map(toolbox.evaluate, invalid_ind)\n",
" for ind, fit in zip(invalid_ind, fitnesses):\n",
" ind.fitness.values = fit\n",
" \n",
" pop[:] = offspring\n",
" \n",
" # Gather all the fitnesses in one list and print the stats\n",
" fits = [ind.fitness.values[0] for ind in pop]\n",
" \n",
" length = len(pop)\n",
" mean = sum(fits) / length\n",
" sum2 = sum(x*x for x in fits)\n",
" std = abs(sum2 / length - mean**2)**0.5\n",
" \n",
" print(\" Min %s\" % min(fits))\n",
" print(\" Max %s\" % max(fits))\n",
" print(\" Avg %s\" % mean)\n",
" print(\" Std %s\" % std)\n",
" best_ind = tools.selBest(pop, 1)[0]\n",
" print(\"Best individual so far is %s, %s\" % (best_ind, best_ind.fitness.values))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Run the genetic algorithm and interpret the results."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"main()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercises"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Comparing\n",
"Your task is to modify the previous code to canonical GA configuration from Holland (look at the lesson's slides). In addition you should consult the [DEAP API](http://deap.readthedocs.io/en/master/api/tools.html#operators).\n",
"\n",
"Submit your notebook and include a modified code and a comparison of the effects of these changes. \n",
"\n",
"Discuss your findings."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Optional. Optimizing ML hyperparameters\n",
"\n",
"One of the applications of Genetic Algorithms is the optimization of ML hyperparameters. Previously we have used GridSearch from Scikit. Using (sklearn-deap)[[References](#References)], optimize the Titatic hyperparameters using both GridSearch and Genetic Algorithms. \n",
"\n",
"The same exercise (using the digits dataset) can be found in this [notebook](https://github.com/rsteca/sklearn-deap/blob/master/test.ipynb).\n",
"\n",
"Submit a notebook where you include well-crafted conclusions about the exercises, discussing the pros and cons of using genetic algorithms for this purpose.\n",
"\n",
"Note: There is a problem with Scikit version 0.24. Comment on the different approaches."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Optional exercises\n",
"\n",
"Here there is a proposed optional exercise."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Optional. Optimizing an ML pipeline with a genetic algorithm\n",
"\n",
"The library [TPOT](#References) optimizes ML pipelines and comes with a lot of (examples)[https://epistasislab.github.io/tpot/examples/] and even notebooks, for example for the [iris dataset](https://github.com/EpistasisLab/tpot/blob/master/tutorials/IRIS.ipynb).\n",
"\n",
"Your task is to apply TPOT to the intermediate challenge and write a short essay explaining:\n",
"* what TPOT does (with your own words).\n",
"* how you have experimented with TPOT (what you have tried and how long. Take into account that it should be run from hours to days to get good results. Read the documentation, it is not that long!).\n",
"* the results. If TPOT is rather clever or your group got better results."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## References\n",
"* [deap](https://github.com/deap/deap)\n",
"* [sklearn-deap](https://github.com/rsteca/sklearn-deap)\n",
"* [tpot](http://epistasislab.github.io/tpot/)\n",
"* [gplearn](http://gplearn.readthedocs.io/en/latest/index.html)\n",
"* [scikit-allel](https://scikit-allel.readthedocs.io/en/latest/)\n",
"* [scklearn-genetic](https://github.com/manuel-calzolari/sklearn-genetic)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Licence"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
}
],
"metadata": {
"datacleaner": {
"position": {
"top": "50px"
},
"python": {
"varRefreshCmd": "try:\n print(_datacleaner.dataframe_metadata())\nexcept:\n print([])"
},
"window_display": false
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.9"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autocomplete": true,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
}
},
"nbformat": 4,
"nbformat_minor": 1
}