From 1163af348f2820a24f896f009fbc849cc24f82dc Mon Sep 17 00:00:00 2001 From: "Carlos A. Iglesias" Date: Thu, 26 Apr 2018 12:50:14 +0200 Subject: [PATCH] Added ml4 --- ml4/2_5_0_Intro_AG.ipynb | 103 ++++++++++++ ml4/2_5_1_Exercise.ipynb | 303 +++++++++++++++++++++++++++++++++++ ml4/images/EscUpmPolit_p.gif | Bin 0 -> 3171 bytes 3 files changed, 406 insertions(+) create mode 100644 ml4/2_5_0_Intro_AG.ipynb create mode 100644 ml4/2_5_1_Exercise.ipynb create mode 100644 ml4/images/EscUpmPolit_p.gif diff --git a/ml4/2_5_0_Intro_AG.ipynb b/ml4/2_5_0_Intro_AG.ipynb new file mode 100644 index 0000000..208e98d --- /dev/null +++ b/ml4/2_5_0_Intro_AG.ipynb @@ -0,0 +1,103 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![](images/EscUpmPolit_p.gif \"UPM\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Course Notes for Learning Intelligent Systems" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © 2018 Carlos A. Iglesias" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Introduction to Machine Learning IV\n", + " \n", + "In this lab session, we are going to gain more insight into evolutionary computing and, in particular, how genetic algorithms can be used for a number of ML tasks.\n", + "\n", + "# Objectives\n", + "\n", + "The main objectives of this session are:\n", + "* Understand genetic algorithms better\n", + "* Understand how genetic algorithms can be used for ML purposes\n", + "* Experiment with genetic algorithms" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Table of Contents" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. 
[Exercise](2_5_1_Exercise.ipynb)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Licence\n", + "The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n", + "\n", + "© 2018 Carlos A. Iglesias, Universidad Politécnica de Madrid." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.5.5" + }, + "latex_envs": { + "LaTeX_envs_menu_present": true, + "autocomplete": true, + "bibliofile": "biblio.bib", + "cite_by": "apalike", + "current_citInitial": 1, + "eqLabelWithNumbers": true, + "eqNumInitial": 1, + "hotkeys": { + "equation": "Ctrl-E", + "itemize": "Ctrl-I" + }, + "labels_anchors": false, + "latex_user_defs": false, + "report_style_numbering": false, + "user_envs_cfg": false + } + }, + "nbformat": 4, + "nbformat_minor": 1 +} diff --git a/ml4/2_5_1_Exercise.ipynb b/ml4/2_5_1_Exercise.ipynb new file mode 100644 index 0000000..0e4e233 --- /dev/null +++ b/ml4/2_5_1_Exercise.ipynb @@ -0,0 +1,303 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![](images/EscUpmPolit_p.gif \"UPM\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Course Notes for Learning Intelligent Systems" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © 2018 Carlos A. 
Iglesias" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## [Introduction to Machine Learning IV](2_5_0_Intro_AG.ipynb)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Table of Contents\n", + "\n", + "* [Introduction](#Introduction)\n", + "* [Genetic Algorithms](#Genetic-Algorithms)\n", + "* [Reading Data from a File](#Reading-Data-from-a-File)\n", + "* [Exercises](#Exercises)\n", + "* [Optional exercises](#Optional-exercises)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Introduction\n", + "The purpose of this practice is to better understand how GAs work. \n", + "\n", + "Many libraries implement GAs; you can find some of them in the [References](#References) section." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Genetic Algorithms\n", + "In this section we are going to use the library [DEAP](#References) to implement a genetic algorithm.\n", + "\n", + "We are going to implement the OneMax problem as seen in class.\n", + "\n", + "First, follow the DEAP package instructions and install DEAP.\n", + "\n", + "Then, work through the [OneMax](https://github.com/DEAP/notebooks/blob/master/OneMax.ipynb) notebook to understand how DEAP works and solves this problem. Observe that DEAP requires you to register types and functions in the framework. Observe also how you can execute genetic operators such as mutate.\n", + "\n", + "The following cell contains simple code that solves the OneMax problem (taken from [DEAP](http://deap.readthedocs.io/en/master/examples/ga_onemax.html), with a line added to show the best individual in each generation).\n", + "\n", + "Read the tutorial from [DEAP](http://deap.readthedocs.io/en/master/examples/ga_onemax.html) to understand the code." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "import random\n", + "\n", + "from deap import base\n", + "from deap import creator\n", + "from deap import tools\n", + "\n", + "creator.create(\"FitnessMax\", base.Fitness, weights=(1.0,))\n", + "creator.create(\"Individual\", list, fitness=creator.FitnessMax)\n", + "\n", + "toolbox = base.Toolbox()\n", + "# Attribute generator \n", + "toolbox.register(\"attr_bool\", random.randint, 0, 1)\n", + "# Structure initializers\n", + "toolbox.register(\"individual\", tools.initRepeat, creator.Individual, \n", + "    toolbox.attr_bool, 100)\n", + "toolbox.register(\"population\", tools.initRepeat, list, toolbox.individual)\n", + "\n", + "def evalOneMax(individual):\n", + "    return sum(individual),\n", + "\n", + "toolbox.register(\"evaluate\", evalOneMax)\n", + "toolbox.register(\"mate\", tools.cxTwoPoint)\n", + "toolbox.register(\"mutate\", tools.mutFlipBit, indpb=0.05)\n", + "toolbox.register(\"select\", tools.selTournament, tournsize=3)\n", + "\n", + "\n", + "def main():\n", + "    pop = toolbox.population(n=300)\n", + "    CXPB, MUTPB, NGEN = 0.5, 0.2, 40\n", + " \n", + "    # Evaluate the entire population\n", + "    fitnesses = list(map(toolbox.evaluate, pop))\n", + "    for ind, fit in zip(pop, fitnesses):\n", + "        ind.fitness.values = fit\n", + "    # Extract the fitness value of every individual\n", + "    fits = [ind.fitness.values[0] for ind in pop]\n", + " \n", + "    # Variable keeping track of the number of generations \n", + "    g = 0\n", + " \n", + "    # Begin the evolution\n", + "    while max(fits) < 100 and g < 1000:\n", + "        # A new generation\n", + "        g = g + 1\n", + "        print(\"-- Generation %i --\" % g)\n", + "        # Select the next generation individuals\n", + "        offspring = toolbox.select(pop, len(pop))\n", + "        # Clone the selected individuals\n", + "        offspring = list(map(toolbox.clone, offspring))\n", + "        # Apply crossover and mutation on the offspring\n", + "        for child1, 
child2 in zip(offspring[::2], offspring[1::2]):\n", + "            if random.random() < CXPB:\n", + "                toolbox.mate(child1, child2)\n", + "                del child1.fitness.values\n", + "                del child2.fitness.values\n", + "\n", + "        for mutant in offspring:\n", + "            if random.random() < MUTPB:\n", + "                toolbox.mutate(mutant)\n", + "                del mutant.fitness.values\n", + "        # Evaluate the individuals with an invalid fitness\n", + "        invalid_ind = [ind for ind in offspring if not ind.fitness.valid]\n", + "        fitnesses = map(toolbox.evaluate, invalid_ind)\n", + "        for ind, fit in zip(invalid_ind, fitnesses):\n", + "            ind.fitness.values = fit\n", + " \n", + "        pop[:] = offspring\n", + " \n", + "        # Gather all the fitnesses in one list and print the stats\n", + "        fits = [ind.fitness.values[0] for ind in pop]\n", + " \n", + "        length = len(pop)\n", + "        mean = sum(fits) / length\n", + "        sum2 = sum(x*x for x in fits)\n", + "        std = abs(sum2 / length - mean**2)**0.5\n", + " \n", + "        print(\" Min %s\" % min(fits))\n", + "        print(\" Max %s\" % max(fits))\n", + "        print(\" Avg %s\" % mean)\n", + "        print(\" Std %s\" % std)\n", + "        best_ind = tools.selBest(pop, 1)[0]\n", + "        print(\"Best individual so far is %s, %s\" % (best_ind, best_ind.fitness.values))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Run the genetic algorithm and interpret the results." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "main()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercises" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Comparing\n", + "Your task is to modify the previous code to use the canonical GA configuration from Holland (see the lesson's slides). In addition, you should consult the [DEAP API](http://deap.readthedocs.io/en/master/api/tools.html#operators).\n", + "\n", + "Submit your notebook, including the modified code and a comparison of the effects of these changes. 
\n", + "\n", + "Discuss your findings." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Optimizing ML hyperparameters\n", + "\n", + "One of the applications of Genetic Algorithms is the optimization of ML hyperparameters. Previously we used GridSearch from scikit-learn. Using [sklearn-deap](#References), optimize the hyperparameters for the Titanic dataset using both GridSearch and Genetic Algorithms. \n", + "\n", + "The same exercise (using the digits dataset) can be found in this [notebook](https://github.com/rsteca/sklearn-deap/blob/master/test.ipynb).\n", + "\n", + "Submit a notebook where you include well-crafted conclusions about the exercises, discussing the pros and cons of using genetic algorithms for this purpose.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Optional exercises\n", + "\n", + "Here is a proposed optional exercise." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Optimizing a ML pipeline with a genetic algorithm\n", + "\n", + "The library [TPOT](#References) optimizes ML pipelines and comes with a lot of [examples](https://epistasislab.github.io/tpot/examples/) and even notebooks, for example for the [iris dataset](https://github.com/EpistasisLab/tpot/blob/master/tutorials/IRIS.ipynb).\n", + "\n", + "Your task is to apply TPOT to the intermediate challenge and write a short essay explaining:\n", + "* what TPOT does (in your own words).\n", + "* how you have experimented with TPOT (what you have tried and for how long. Take into account that it should be run from hours to days to get good results. Read the documentation, it is not that long!).\n", + "* the results: whether TPOT or your group got better results." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## References\n", + "* [deap](https://github.com/deap/deap)\n", + "* [sklearn-deap](https://github.com/rsteca/sklearn-deap)\n", + "* [tpot](http://epistasislab.github.io/tpot/)\n", + "* [gplearn](http://gplearn.readthedocs.io/en/latest/index.html)\n", + "* [scikit-allel](https://scikit-allel.readthedocs.io/en/latest/)\n", + "* [sklearn-genetic](https://github.com/manuel-calzolari/sklearn-genetic)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Licence" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n", + "\n", + "© 2018 Carlos A. Iglesias, Universidad Politécnica de Madrid." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.5.5" + }, + "latex_envs": { + "LaTeX_envs_menu_present": true, + "autocomplete": true, + "bibliofile": "biblio.bib", + "cite_by": "apalike", + "current_citInitial": 1, + "eqLabelWithNumbers": true, + "eqNumInitial": 1, + "hotkeys": { + "equation": "Ctrl-E", + "itemize": "Ctrl-I" + }, + "labels_anchors": false, + "latex_user_defs": false, + "report_style_numbering": false, + "user_envs_cfg": false + } + }, + "nbformat": 4, + "nbformat_minor": 1 +} diff --git a/ml4/images/EscUpmPolit_p.gif b/ml4/images/EscUpmPolit_p.gif new file mode 100644 index 0000000000000000000000000000000000000000..a821282d01d9828217973d170df5311b92f3fe6b GIT binary patch literal 3171 zcmV-p44m^vNk%w1VOju50QUd@0000!VRu48K2A+XSyoD3T1{3}P*q-9M_OS-Uv5lX 
zVnJnlOK5pfXmVL^a$RzFWnoTdWK&>cT54ocYiL?=YF2V?TYGd{VQ6A+Y+!P3V{ma{ zcyeNYduDroZFO~XdV6|BZ-+#2i%oQiLw1uze4I>tm{N9%Tz!s2f1+G~k41o^Mun+K zgr`V}uStrrNRGHllD$=ptzVF=VS|@uiJfnamvD`hbC8y0jHGUwvv884aG9}tpR{_U zy-b|UU4p<{pTWNcsoYPj;!v#OQLyJxvg%=m!exrY zYmLTapTTaB$8M6ybd?mzbHEn3kHG znuMLJlAWWPpP`MczLlh{o1mhbq^hK%o2sRhx~zk|uZX>{ioUXpqobptrl_N;t*WP> zsHmr^tE;4~u&lJXvazkQva+_fxUsptySu!Bp3a1z&xE1SiKNkvsMCe8$ceGXlB?C9 zz0Q@e*O#%_n6THKwAr4z+@Q7Eqq*FwzTUCBz`4J|sKL&oz|f}1)2YSLt-#;9!Nsn@ z;IqWyx5(qV%H+Vaj=;2!!L^aXx0J)Ul*G7}#k!csyqm|ooXEYM%Dz@p5;qs+pk z&BUh8#i-E6tI^1<(#o*Z%d*tWvenJC*Uz}u(74&ryTZf6&CSck&Ct)#(A(6&+||L} z*Tmk~#njZ))z{hA*x1dV;c)7|dZ;PBhz@!;9V;M&RI+{)tJ z%;etAgwR<^W*9C z>FxIH@%ZrV}00000A^8LW00930EC2ui09pV^000R80RIUbNU)&6g9sBUT*$DY!-o(fN}Ncs zqQ#3CGiuy;5!)|ZAaeoJ#<8TuT&iI4(iScpIdZ^u&0@97q)mq2TEWuxYa23T?o`FP z_v)OlciLvbB6qVX!C0`a_3Nii*gaO<(5P`2Yu~GGxUeaERmlm*TxjFmQX zz=T=##cEbEck}Muv*+%bEm*sRd7HJZpFCB`He1a0#~dth_kQ(Sr7PalsrZghoyRK{ zvRhN{A@k!+H$^_e+}(?n?3b!@%MlU(OZSCMzh}mdyKD8!m$z5K!tt>#5f(OWtiqYQ zH5A`Be8gO!!5$Y_Fvu7bJcG|X{oq53FtF713oobtOWu zC+u%g#-#Z0op2qK_x z0}eN!FhU3_^k4@w%ALcCXx#|^GvA~IX(LNE~}#IH*z@xl*3#PI8{N(ka7 zP3OF`Po1<9lnbA%Oxx?RO_V@_2qe6~!m&yi;wPW5*@8^81NothJNx9c>2bdP%7P0B z6u7{!$cpQxE6Ma@jxV{OE07x3bc2pCelVz9yhOmTzy(}vfk6pE0PBgL_WWU$GTE>q z@IQR)@(M2q^Kl> zqde{$(~6>@{2D|dZ@jSxBZ?GqN92<$2b(a-bdx!{e0=1~U?wY9$*(*V!UhO{1pkI2 zNGQFCpZur-PB+NN`Y|8mgoDg43H!Nw6Ot$bLK}rBk_aG&5TSG_e#XNJFjHm|%g0fg zBMUiqS^ujKg>27A@P*_NIT7S5=gnRG@C7b@@iAT;0~@yhhc0{}zb0rQ5rjy@{R)u? zIxInMpYYSPz`+k}sG=Jmo6Rkh5e{(l!B3%Uf)R9Jf*CMi1UWcC4es`XVJIhg*@*`? za3-)mcmo^RKn5%SEO(w~qyq~|h+Eu>K!axxLt)*}MK~CUjaPh3Ah!^PJo=%GU---! z%NRyEo{@+;1OgCqNP-OL5Qsp?K@iV~BdhlDDq&<}7p)-46~zH3TFhb|uh>R8;fC+aECjjr;Aze!X2z&kSy+zjBAecAL5{n zFm%z6Sp8xjirGakZlMcd%n26#2*)nuF+D&0G$F6hg*k-rlUfv281KN8esa-|KiFd* zv#4TOu9^^p{o)?*@Iyc1(+`#Sq88NLg)-ie3qE9ntqk$SADa4(U4SDSRmui9a-j`m zz~T?*0<1&2aSK+wVsFA=%uHw^*@}?E8`{_gILy)6(wde&>#znjror0Qx^^1Uhz2xL zLEGB@+V-}$g+?@X%Uj+`!?o0~hH$O1j^dhR3P4D%a+k~8<~sMeKyU(dpZf$SSl0

2@6sF7G_9~YO)_8<_=j&c4qyi!9rLTUMON~@a zSGxOk?|l!F-{dwT6{p|?6_g7FPDn!fvZ0NTV7+NHP#op${2Hw-Y&b0wwyPj}h+~A+83(Ci?sYcBG;c^@ueU z{*374DjEn>XmoNR-CZcOLmJKP#Fjao+%&J58c_K2bFHC{cKCO>={-oSGtm!eSR)^g z2ySzO+v_N|Be{%r^fdZGk9JccA7^GRuBku`deHgPp{52T=ImTZ^n((qphtm$011Q` zS-C-0$21}lYHN(!8s+|Vu+?piNLc(4`e3-c>%m=X(4!H~X7j$U;9P0YgAzgi=EQOx z0qf=Rxxl8#^>V3Dk5fdP9q1PJquZQZV@D&|90vjvtkI5jfI7)EM@4gkt(uORH&H67q&4f z;(m%)XBxmz0KI-oAK>OYU%5|E{`03#efLKHAdu*;cA;?H?sJ~`)ptHX_@isx?0T0W z>`4DQ)G=JuCi(rb?d@)H3u4yWpY@#3g#P!>fBti#6ZUrsw$)pZmSn?KWv7*Z3b=p_ J7#INo06U6|WDWoT literal 0 HcmV?d00001