mirror of https://github.com/gsi-upm/sitc synced 2025-06-13 11:42:21 +00:00

Update 2_6_Model_Tuning.ipynb

Carlos A. Iglesias 2025-06-02 17:16:53 +03:00 committed by GitHub
parent 21e7ae2f57
commit 2e4ec3cfdc
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194


@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"![](files/images/EscUpmPolit_p.gif \"UPM\")"
+"![](./images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
@@ -58,7 +58,7 @@
"source": [
"In the previous [notebook](2_5_2_Decision_Tree_Model.ipynb), we got an accuracy of 9.47. Could we get a better accuracy if we tune the hyperparameters of the estimator?\n",
"\n",
-"The goal of this notebook is to learn how to tune an algorithm by opimizing its hyperparameters using grid search."
+"This notebook aims to learn how to tune an algorithm by optimizing its hyperparameters using grid search."
]
},
{
@@ -137,7 +137,7 @@
"# create a k-fold cross validation iterator of k=10 folds\n",
"cv = KFold(10, shuffle=True, random_state=33)\n",
"\n",
-"# by default the score used is the one returned by score method of the estimator (accuracy)\n",
+"# by default the score used is the one returned by the score method of the estimator (accuracy)\n",
"scores = cross_val_score(model, x_iris, y_iris, cv=cv)\n",
"\n",
"from scipy.stats import sem\n",
@@ -189,7 +189,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"We can get the list of parameters of the model. As you will observe, the parameters of the estimators in the pipeline can be accessed using the <estimator>__<parameter> syntax. We will use this for tuning the parameters."
+"We can get the list of model parameters. As you will observe, the parameters of the estimators in the pipeline can be accessed using the <estimator>__<parameter> syntax. We will use this for tuning the parameters."
]
},
{
@@ -205,7 +205,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"Let's see what happens if we change a parameter"
+"Let's see what happens if we change a parameter."
]
},
{
@@ -284,7 +284,7 @@
"\n",
"Look at the [API](http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html) of *scikit-learn* to understand better the algorithm, as well as which parameters can be tuned. As you see, we can change several ones, such as *criterion*, *splitter*, *max_features*, *max_depth*, *min_samples_split*, *class_weight*, etc.\n",
"\n",
-"We can get the full list parameters of an estimator with the method *get_params()*. "
+"We can get an estimator's full list of parameters with the method *get_params()*. "
]
},
{
@@ -314,16 +314,16 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"Changing manually the hyperparameters to find their optimal values is not practical. Instead, we can consider to find the optimal value of the hyperparameters as an *optimization problem*. \n",
+"Changing manually the hyperparameters to find their optimal values is not practical. Instead, we can consider finding the optimal value of the hyperparameters as an *optimization problem*. \n",
"\n",
-"The sklearn comes with several optimization techniques for this purpose, such as **grid search** and **randomized search**. In this notebook we are going to introduce the former one."
+"Sklearn has several optimization techniques, such as **grid search** and **randomized search**. In this notebook, we are going to introduce the former one."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
-"The sklearn provides an object that, given data, computes the score during the fit of an estimator on a hyperparameter grid and chooses the hyperparameters to maximize the cross-validation score. "
+"Sklearn provides an object that, given data, computes the score during the fit of an estimator on a hyperparameter grid and chooses the hyperparameters to maximize the cross-validation score. "
]
},
{
@@ -351,7 +351,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"Now we are going to show the results of grid search"
+"Now we are going to show the results of the grid search"
]
},
{
@@ -392,7 +392,7 @@
"# create a k-fold cross validation iterator of k=10 folds\n",
"cv = KFold(10, shuffle=True, random_state=33)\n",
"\n",
-"# by default the score used is the one returned by score method of the estimator (accuracy)\n",
+"# by default the score used is the one returned by the score method of the estimator (accuracy)\n",
"scores = cross_val_score(model, x_iris, y_iris, cv=cv)\n",
"def mean_score(scores):\n",
" return (\"Mean score: {0:.3f} (+/- {1:.3f})\").format(np.mean(scores), sem(scores))\n",
@@ -405,7 +405,7 @@
"source": [
"We have got an *improvement* from 0.947 to 0.953 with k-fold.\n",
"\n",
-"We are now to try to fit the best combination of the hyperparameters of the algorithm. It can take some time to compute it."
+"We are now trying to fit the best combination of the hyperparameters of the algorithm. It can take some time to compute it."
]
},
{
@@ -492,7 +492,7 @@
"# create a k-fold cross validation iterator of k=10 folds\n",
"cv = KFold(10, shuffle=True, random_state=33)\n",
"\n",
-"# by default the score used is the one returned by score method of the estimator (accuracy)\n",
+"# by default the score used is the one returned by the score method of the estimator (accuracy)\n",
"scores = cross_val_score(model, x_iris, y_iris, cv=cv)\n",
"def mean_score(scores):\n",
" return (\"Mean score: {0:.3f} (+/- {1:.3f})\").format(np.mean(scores), sem(scores))\n",
@@ -533,7 +533,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
+"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
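The notebook touched by this commit tunes a decision tree on the iris data. It combines a 10-fold `KFold` iterator, `cross_val_score`, pipeline parameters addressed as `<estimator>__<parameter>`, and grid search. The grid-search object is presumably scikit-learn's `GridSearchCV`. What follows is a minimal sketch of that workflow under stated assumptions, not the notebook's own code. The diff does not show the notebook's pipeline or grid, so the step names `scaler` and `clf`, the `StandardScaler` step, and the grid values are all hypothetical.

```python
import numpy as np
from scipy.stats import sem
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Load the iris data (named like the notebook's x_iris / y_iris)
iris = load_iris()
x_iris, y_iris = iris.data, iris.target

# Hypothetical pipeline: the step names "scaler" and "clf" are assumptions
model = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", DecisionTreeClassifier(random_state=33)),
])

# Parameters of each step are exposed as <estimator>__<parameter>
print(sorted(k for k in model.get_params() if k.startswith("clf__")))

# Create a k-fold cross-validation iterator of k=10 folds, shuffled reproducibly
cv = KFold(10, shuffle=True, random_state=33)

def mean_score(scores):
    # Mean accuracy plus the standard error of the mean across folds
    return "Mean score: {0:.3f} (+/- {1:.3f})".format(np.mean(scores), sem(scores))

# Baseline with default hyperparameters; by default the score used is
# the one returned by the estimator's score method (accuracy)
baseline = cross_val_score(model, x_iris, y_iris, cv=cv)
print(mean_score(baseline))

# Illustrative grid (assumed values, not the notebook's grid)
param_grid = {
    "clf__criterion": ["gini", "entropy"],
    "clf__max_depth": [2, 3, 4, None],
    "clf__min_samples_split": [2, 5, 10],
}

# Try every combination with the same folds and keep the one with the
# highest mean cross-validated accuracy
gs = GridSearchCV(model, param_grid, cv=cv)
gs.fit(x_iris, y_iris)
print(gs.best_params_)
print("Best CV accuracy: {0:.3f}".format(gs.best_score_))
```

With its default `refit=True`, `GridSearchCV` refits the winning combination on all the data, so `gs.best_estimator_` is ready for prediction. `best_score_` is measured on the same folds that selected the hyperparameters, so it is somewhat optimistic. A held-out test set or nested cross-validation gives a fairer estimate.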