Update 2_6_Model_Tuning.ipynb
parent 21e7ae2f57
commit 2e4ec3cfdc
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 ""
 ]
 },
 {
@@ -58,7 +58,7 @@
 "source": [
 "In the previous [notebook](2_5_2_Decision_Tree_Model.ipynb), we got an accuracy of 9.47. Could we get a better accuracy if we tune the hyperparameters of the estimator?\n",
 "\n",
-"The goal of this notebook is to learn how to tune an algorithm by opimizing its hyperparameters using grid search."
+"This notebook aims to learn how to tune an algorithm by optimizing its hyperparameters using grid search."
 ]
 },
 {
@@ -137,7 +137,7 @@
 "# create a k-fold cross validation iterator of k=10 folds\n",
 "cv = KFold(10, shuffle=True, random_state=33)\n",
 "\n",
-"# by default the score used is the one returned by score method of the estimator (accuracy)\n",
+"# by default the score used is the one returned by the score method of the estimator (accuracy)\n",
 "scores = cross_val_score(model, x_iris, y_iris, cv=cv)\n",
 "\n",
 "from scipy.stats import sem\n",
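
For context, the code this hunk touches is the notebook's 10-fold cross-validation cell. Below is a minimal, hedged sketch of that step; the model is assumed to be the decision tree the notebook tunes, and x_iris/y_iris are assumed to come from scikit-learn's bundled iris dataset.

import numpy as np
from scipy.stats import sem
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# assumed setup: iris features/labels and a plain decision tree
x_iris, y_iris = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=33)

# create a k-fold cross validation iterator of k=10 folds
cv = KFold(10, shuffle=True, random_state=33)

# by default the score used is the one returned by the score method of the estimator (accuracy)
scores = cross_val_score(model, x_iris, y_iris, cv=cv)
print("Mean score: {0:.3f} (+/- {1:.3f})".format(np.mean(scores), sem(scores)))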
@@ -189,7 +189,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We can get the list of parameters of the model. As you will observe, the parameters of the estimators in the pipeline can be accessed using the <estimator>__<parameter> syntax. We will use this for tuning the parameters."
+"We can get the list of model parameters. As you will observe, the parameters of the estimators in the pipeline can be accessed using the <estimator>__<parameter> syntax. We will use this for tuning the parameters."
 ]
 },
 {
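
To illustrate the <estimator>__<parameter> syntax this markdown cell describes, here is a hedged sketch with a hypothetical two-step pipeline (a scaler followed by a decision tree); the notebook's actual step names and composition may differ.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# hypothetical pipeline: step names 'scaler' and 'tree' are assumptions
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('tree', DecisionTreeClassifier(random_state=33)),
])

# nested parameters are exposed as <estimator>__<parameter>, e.g. 'tree__max_depth'
print(sorted(pipeline.get_params().keys()))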
@@ -205,7 +205,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Let's see what happens if we change a parameter"
+"Let's see what happens if we change a parameter."
 ]
 },
 {
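
As a sketch of changing a parameter, set_params() accepts the same double-underscore names; the pipeline, data, and fold iterator below repeat the assumptions of the sketches above.

import numpy as np
from scipy.stats import sem
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

x_iris, y_iris = load_iris(return_X_y=True)
cv = KFold(10, shuffle=True, random_state=33)
pipeline = Pipeline([('scaler', StandardScaler()),
                     ('tree', DecisionTreeClassifier(random_state=33))])

# change one hyperparameter through the pipeline namespace and re-evaluate
pipeline.set_params(tree__max_depth=3)
scores = cross_val_score(pipeline, x_iris, y_iris, cv=cv)
print("Mean score: {0:.3f} (+/- {1:.3f})".format(np.mean(scores), sem(scores)))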
@@ -284,7 +284,7 @@
 "\n",
 "Look at the [API](http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html) of *scikit-learn* to understand better the algorithm, as well as which parameters can be tuned. As you see, we can change several ones, such as *criterion*, *splitter*, *max_features*, *max_depth*, *min_samples_split*, *class_weight*, etc.\n",
 "\n",
-"We can get the full list parameters of an estimator with the method *get_params()*. "
+"We can get an estimator's full list of parameters with the method *get_params()*. "
 ]
 },
 {
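
A short illustration of get_params() on the estimator itself, printing the hyperparameters the markdown cell names (criterion, splitter, max_features, max_depth, min_samples_split, class_weight):

from sklearn.tree import DecisionTreeClassifier

# get_params() returns a dict of every constructor parameter and its current value
tree = DecisionTreeClassifier()
params = tree.get_params()
for name in ('criterion', 'splitter', 'max_features', 'max_depth',
             'min_samples_split', 'class_weight'):
    print(name, '=', params[name])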
@@ -314,16 +314,16 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Changing manually the hyperparameters to find their optimal values is not practical. Instead, we can consider to find the optimal value of the hyperparameters as an *optimization problem*. \n",
+"Changing manually the hyperparameters to find their optimal values is not practical. Instead, we can consider finding the optimal value of the hyperparameters as an *optimization problem*. \n",
 "\n",
-"The sklearn comes with several optimization techniques for this purpose, such as **grid search** and **randomized search**. In this notebook we are going to introduce the former one."
+"Sklearn has several optimization techniques, such as **grid search** and **randomized search**. In this notebook, we are going to introduce the former one."
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The sklearn provides an object that, given data, computes the score during the fit of an estimator on a hyperparameter grid and chooses the hyperparameters to maximize the cross-validation score. "
+"Sklearn provides an object that, given data, computes the score during the fit of an estimator on a hyperparameter grid and chooses the hyperparameters to maximize the cross-validation score. "
 ]
 },
 {
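
The object the text alludes to is GridSearchCV. A sketch under the same assumptions as the earlier pipeline example; the grid values are chosen only for illustration and are not the notebook's actual grid.

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

x_iris, y_iris = load_iris(return_X_y=True)
cv = KFold(10, shuffle=True, random_state=33)
pipeline = Pipeline([('scaler', StandardScaler()),
                     ('tree', DecisionTreeClassifier(random_state=33))])

# hyperparameter grid keyed with the <estimator>__<parameter> syntax (illustrative values)
param_grid = {
    'tree__criterion': ['gini', 'entropy'],
    'tree__max_depth': [2, 3, 4, 5, None],
    'tree__min_samples_split': [2, 5, 10],
}

# scoring defaults to the estimator's score method (accuracy), as in the cells above
gs = GridSearchCV(pipeline, param_grid, cv=cv)
gs.fit(x_iris, y_iris)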
@@ -351,7 +351,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Now we are going to show the results of grid search"
+"Now we are going to show the results of the grid search"
 ]
 },
 {
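
Continuing that sketch, the fitted GridSearchCV object exposes the results the next cell is meant to show; the pandas import is an assumption about the notebook environment, used only to tabulate cv_results_.

import pandas as pd  # assumption: pandas is available in the environment

# best combination found and its cross-validated score
print(gs.best_params_)
print("Best cross-validation score: {0:.3f}".format(gs.best_score_))

# cv_results_ holds the full table of tried combinations
results = pd.DataFrame(gs.cv_results_)
print(results[['params', 'mean_test_score', 'rank_test_score']].head())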
@@ -392,7 +392,7 @@
 "# create a k-fold cross validation iterator of k=10 folds\n",
 "cv = KFold(10, shuffle=True, random_state=33)\n",
 "\n",
-"# by default the score used is the one returned by score method of the estimator (accuracy)\n",
+"# by default the score used is the one returned by the score method of the estimator (accuracy)\n",
 "scores = cross_val_score(model, x_iris, y_iris, cv=cv)\n",
 "def mean_score(scores):\n",
 " return (\"Mean score: {0:.3f} (+/- {1:.3f})\").format(np.mean(scores), sem(scores))\n",
@@ -405,7 +405,7 @@
 "source": [
 "We have got an *improvement* from 0.947 to 0.953 with k-fold.\n",
 "\n",
-"We are now to try to fit the best combination of the hyperparameters of the algorithm. It can take some time to compute it."
+"We are now trying to fit the best combination of the hyperparameters of the algorithm. It can take some time to compute it."
 ]
 },
 {
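
A hedged sketch of the longer, exhaustive fit the text warns about, continuing the GridSearchCV example above: a larger grid multiplies the number of fits (here 2 x 2 x 6 x 4 x 2 = 192 candidates, each fitted 10 times), so n_jobs=-1 is used to spread the work over CPU cores. The grid values are illustrative assumptions, not the notebook's actual grid.

# larger, illustrative grid over the parameters mentioned earlier in the notebook
param_grid = {
    'tree__criterion': ['gini', 'entropy'],
    'tree__splitter': ['best', 'random'],
    'tree__max_depth': [2, 3, 4, 5, 6, None],
    'tree__min_samples_split': [2, 5, 10, 20],
    'tree__class_weight': [None, 'balanced'],
}

# n_jobs=-1 parallelizes the individual fits; this is what keeps the slow cell tractable
gs = GridSearchCV(pipeline, param_grid, cv=cv, n_jobs=-1)
gs.fit(x_iris, y_iris)
print(gs.best_params_, gs.best_score_)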
@@ -492,7 +492,7 @@
 "# create a k-fold cross validation iterator of k=10 folds\n",
 "cv = KFold(10, shuffle=True, random_state=33)\n",
 "\n",
-"# by default the score used is the one returned by score method of the estimator (accuracy)\n",
+"# by default the score used is the one returned by the score method of the estimator (accuracy)\n",
 "scores = cross_val_score(model, x_iris, y_iris, cv=cv)\n",
 "def mean_score(scores):\n",
 " return (\"Mean score: {0:.3f} (+/- {1:.3f})\").format(np.mean(scores), sem(scores))\n",
@@ -533,7 +533,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
+"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
 "\n",
 "© Carlos A. Iglesias, Universidad Politécnica de Madrid."
 ]