Mirror of https://github.com/gsi-upm/sitc, synced 2025-01-09 20:41:27 +00:00
Compare commits
No commits in common. "a7c6be5b96293a7fb74e8e232c9927948a126a36" and "ffefd8c2e3a4c340f0853de9691c81a9bf335c8b" have entirely different histories.
a7c6be5b96...ffefd8c2e3
@@ -340,7 +340,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We are going to tune the algorithm, and calculate which is the best value for the k hyperparameter."
+"We are going to tune the algorithm, and calculate which is the best value for the k parameter."
 ]
 },
 {
@@ -39,7 +39,7 @@
 "* [Train classifier](#Train-classifier)\n",
 "* [More about Pipelines](#More-about-Pipelines)\n",
 "* [Tuning the algorithm](#Tuning-the-algorithm)\n",
-"\t* [Grid Search for Hyperparameter optimization](#Grid-Search-for-Hyperparameter-optimization)\n",
+"\t* [Grid Search for Parameter optimization](#Grid-Search-for-Parameter-optimization)\n",
 "* [Evaluating the algorithm](#Evaluating-the-algorithm)\n",
 "\t* [K-Fold validation](#K-Fold-validation)\n",
 "* [References](#References)\n"
@@ -56,9 +56,9 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"In the previous [notebook](2_5_2_Decision_Tree_Model.ipynb), we got an accuracy of 9.47. Could we get a better accuracy if we tune the hyperparameters of the estimator?\n",
+"In the previous [notebook](2_5_2_Decision_Tree_Model.ipynb), we got an accuracy of 9.47. Could we get a better accuracy if we tune the parameters of the estimator?\n",
 "\n",
-"The goal of this notebook is to learn how to tune an algorithm by opimizing its hyperparameters using grid search."
+"The goal of this notebook is to learn how to tune an algorithm by opimizing its parameters using grid search."
 ]
 },
 {
@@ -300,21 +300,21 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"You can try different values for these hyperparameters and observe the results."
+"You can try different values for these parameters and observe the results."
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Grid Search for Hyperparameter optimization"
+"### Grid Search for Parameter optimization"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Changing manually the hyperparameters to find their optimal values is not practical. Instead, we can consider to find the optimal value of the hyperparameters as an *optimization problem*. \n",
+"Changing manually the parameters to find their optimal values is not practical. Instead, we can consider to find the optimal value of the parameters as an *optimization problem*. \n",
 "\n",
 "The sklearn comes with several optimization techniques for this purpose, such as **grid search** and **randomized search**. In this notebook we are going to introduce the former one."
 ]
@@ -323,7 +323,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The sklearn provides an object that, given data, computes the score during the fit of an estimator on a hyperparameter grid and chooses the hyperparameters to maximize the cross-validation score. "
+"The sklearn provides an object that, given data, computes the score during the fit of an estimator on a parameter grid and chooses the parameters to maximize the cross-validation score. "
 ]
 },
 {
@@ -371,7 +371,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We can now evaluate the KFold with this optimized hyperparameter as follows."
+"We can now evaluate the KFold with this optimized parameter as follows."
 ]
 },
 {
@@ -405,7 +405,7 @@
 "source": [
 "We have got an *improvement* from 0.947 to 0.953 with k-fold.\n",
 "\n",
-"We are now to try to fit the best combination of the hyperparameters of the algorithm. It can take some time to compute it."
+"We are now to try to fit the best combination of the parameters of the algorithm. It can take some time to compute it."
 ]
 },
 {
@@ -414,12 +414,12 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"# Set the hyperparameters by cross-validation\n",
+"# Set the parameters by cross-validation\n",
 "\n",
 "from sklearn.metrics import classification_report, recall_score, precision_score, make_scorer\n",
 "\n",
-"# set of hyperparameters to test\n",
-"tuned_hyperparameters = [{'max_depth': np.arange(3, 10),\n",
+"# set of parameters to test\n",
+"tuned_parameters = [{'max_depth': np.arange(3, 10),\n",
 "# 'max_weights': [1, 10, 100, 1000]},\n",
 " 'criterion': ['gini', 'entropy'], \n",
 " 'splitter': ['best', 'random'],\n",
@@ -431,7 +431,7 @@
 "scores = ['precision', 'recall']\n",
 "\n",
 "for score in scores:\n",
-" print(\"# Tuning hyperparameters for %s\" % score)\n",
+" print(\"# Tuning hyper-parameters for %s\" % score)\n",
 " print()\n",
 "\n",
 " if score == 'precision':\n",
@@ -440,10 +440,10 @@
 " scorer = make_scorer(recall_score, average='weighted', zero_division=0)\n",
 " \n",
 " # cv = the fold of the cross-validation cv, defaulted to 5\n",
-" gs = GridSearchCV(DecisionTreeClassifier(), tuned_hyperparameters, cv=10, scoring=scorer)\n",
+" gs = GridSearchCV(DecisionTreeClassifier(), tuned_parameters, cv=10, scoring=scorer)\n",
 " gs.fit(x_train, y_train)\n",
 "\n",
-" print(\"Best hyperparameters set found on development set:\")\n",
+" print(\"Best parameters set found on development set:\")\n",
 " print()\n",
 " print(gs.best_params_)\n",
 " print()\n",
@@ -520,7 +520,7 @@
 "* [Plot the decision surface of a decision tree on the iris dataset](https://scikit-learn.org/stable/auto_examples/tree/plot_iris_dtc.html)\n",
 "* [scikit-learn : Machine Learning Simplified](https://learning.oreilly.com/library/view/scikit-learn-machine/9781788833479/), Raúl Garreta; Guillermo Moncecchi, Packt Publishing, 2017.\n",
 "* [Python Machine Learning](https://learning.oreilly.com/library/view/python-machine-learning/9781789955750/), Sebastian Raschka, Packt Publishing, 2019.\n",
-"* [Hyperparameter estimation using grid search with cross-validation](http://scikit-learn.org/stable/auto_examples/model_selection/grid_search_digits.html)\n",
+"* [Parameter estimation using grid search with cross-validation](http://scikit-learn.org/stable/auto_examples/model_selection/grid_search_digits.html)\n",
 "* [Decision trees in python with scikit-learn and pandas](http://chrisstrelioff.ws/sandbox/2015/06/08/decision_trees_in_python_with_scikit_learn_and_pandas.html)"
 ]
 },