mirror of https://github.com/gsi-upm/sitc synced 2025-01-09 20:41:27 +00:00

Compare commits

No commits in common. "a7c6be5b96293a7fb74e8e232c9927948a126a36" and "ffefd8c2e3a4c340f0853de9691c81a9bf335c8b" have entirely different histories.

2 changed files with 17 additions and 17 deletions

@ -340,7 +340,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"We are going to tune the algorithm, and calculate which is the best value for the k hyperparameter." "We are going to tune the algorithm, and calculate which is the best value for the k parameter."
] ]
}, },
{ {

@ -39,7 +39,7 @@
"* [Train classifier](#Train-classifier)\n", "* [Train classifier](#Train-classifier)\n",
"* [More about Pipelines](#More-about-Pipelines)\n", "* [More about Pipelines](#More-about-Pipelines)\n",
"* [Tuning the algorithm](#Tuning-the-algorithm)\n", "* [Tuning the algorithm](#Tuning-the-algorithm)\n",
"\t* [Grid Search for Hyperparameter optimization](#Grid-Search-for-Hyperparameter-optimization)\n", "\t* [Grid Search for Parameter optimization](#Grid-Search-for-Parameter-optimization)\n",
"* [Evaluating the algorithm](#Evaluating-the-algorithm)\n", "* [Evaluating the algorithm](#Evaluating-the-algorithm)\n",
"\t* [K-Fold validation](#K-Fold-validation)\n", "\t* [K-Fold validation](#K-Fold-validation)\n",
"* [References](#References)\n" "* [References](#References)\n"
@ -56,9 +56,9 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"In the previous [notebook](2_5_2_Decision_Tree_Model.ipynb), we got an accuracy of 0.947. Could we get a better accuracy if we tune the hyperparameters of the estimator?\n", "In the previous [notebook](2_5_2_Decision_Tree_Model.ipynb), we got an accuracy of 0.947. Could we get a better accuracy if we tune the parameters of the estimator?\n",
"\n", "\n",
"The goal of this notebook is to learn how to tune an algorithm by optimizing its hyperparameters using grid search." "The goal of this notebook is to learn how to tune an algorithm by optimizing its parameters using grid search."
] ]
}, },
{ {
@ -300,21 +300,21 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"You can try different values for these hyperparameters and observe the results." "You can try different values for these parameters and observe the results."
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Grid Search for Hyperparameter optimization" "### Grid Search for Parameter optimization"
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"Changing manually the hyperparameters to find their optimal values is not practical. Instead, we can consider to find the optimal value of the hyperparameters as an *optimization problem*. \n", "Changing manually the parameters to find their optimal values is not practical. Instead, we can consider to find the optimal value of the parameters as an *optimization problem*. \n",
"\n", "\n",
"The sklearn comes with several optimization techniques for this purpose, such as **grid search** and **randomized search**. In this notebook we are going to introduce the former one." "The sklearn comes with several optimization techniques for this purpose, such as **grid search** and **randomized search**. In this notebook we are going to introduce the former one."
] ]
@ -323,7 +323,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"The sklearn provides an object that, given data, computes the score during the fit of an estimator on a hyperparameter grid and chooses the hyperparameters to maximize the cross-validation score. " "The sklearn provides an object that, given data, computes the score during the fit of an estimator on a parameter grid and chooses the parameters to maximize the cross-validation score. "
] ]
}, },
{ {
@ -371,7 +371,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"We can now evaluate the KFold with this optimized hyperparameter as follows." "We can now evaluate the KFold with this optimized parameter as follows."
] ]
}, },
{ {
@ -405,7 +405,7 @@
"source": [ "source": [
"We have got an *improvement* from 0.947 to 0.953 with k-fold.\n", "We have got an *improvement* from 0.947 to 0.953 with k-fold.\n",
"\n", "\n",
"We are now to try to fit the best combination of the hyperparameters of the algorithm. It can take some time to compute it." "We are now to try to fit the best combination of the parameters of the algorithm. It can take some time to compute it."
] ]
}, },
{ {
@ -414,12 +414,12 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"# Set the hyperparameters by cross-validation\n", "# Set the parameters by cross-validation\n",
"\n", "\n",
"from sklearn.metrics import classification_report, recall_score, precision_score, make_scorer\n", "from sklearn.metrics import classification_report, recall_score, precision_score, make_scorer\n",
"\n", "\n",
"# set of hyperparameters to test\n", "# set of parameters to test\n",
"tuned_hyperparameters = [{'max_depth': np.arange(3, 10),\n", "tuned_parameters = [{'max_depth': np.arange(3, 10),\n",
"# 'max_weights': [1, 10, 100, 1000]},\n", "# 'max_weights': [1, 10, 100, 1000]},\n",
" 'criterion': ['gini', 'entropy'], \n", " 'criterion': ['gini', 'entropy'], \n",
" 'splitter': ['best', 'random'],\n", " 'splitter': ['best', 'random'],\n",
@ -431,7 +431,7 @@
"scores = ['precision', 'recall']\n", "scores = ['precision', 'recall']\n",
"\n", "\n",
"for score in scores:\n", "for score in scores:\n",
" print(\"# Tuning hyperparameters for %s\" % score)\n", " print(\"# Tuning hyper-parameters for %s\" % score)\n",
" print()\n", " print()\n",
"\n", "\n",
" if score == 'precision':\n", " if score == 'precision':\n",
@ -440,10 +440,10 @@
" scorer = make_scorer(recall_score, average='weighted', zero_division=0)\n", " scorer = make_scorer(recall_score, average='weighted', zero_division=0)\n",
" \n", " \n",
" # cv = the fold of the cross-validation cv, defaulted to 5\n", " # cv = the fold of the cross-validation cv, defaulted to 5\n",
" gs = GridSearchCV(DecisionTreeClassifier(), tuned_hyperparameters, cv=10, scoring=scorer)\n", " gs = GridSearchCV(DecisionTreeClassifier(), tuned_parameters, cv=10, scoring=scorer)\n",
" gs.fit(x_train, y_train)\n", " gs.fit(x_train, y_train)\n",
"\n", "\n",
" print(\"Best hyperparameters set found on development set:\")\n", " print(\"Best parameters set found on development set:\")\n",
" print()\n", " print()\n",
" print(gs.best_params_)\n", " print(gs.best_params_)\n",
" print()\n", " print()\n",
@ -520,7 +520,7 @@
"* [Plot the decision surface of a decision tree on the iris dataset](https://scikit-learn.org/stable/auto_examples/tree/plot_iris_dtc.html)\n", "* [Plot the decision surface of a decision tree on the iris dataset](https://scikit-learn.org/stable/auto_examples/tree/plot_iris_dtc.html)\n",
"* [scikit-learn : Machine Learning Simplified](https://learning.oreilly.com/library/view/scikit-learn-machine/9781788833479/), Raúl Garreta; Guillermo Moncecchi, Packt Publishing, 2017.\n", "* [scikit-learn : Machine Learning Simplified](https://learning.oreilly.com/library/view/scikit-learn-machine/9781788833479/), Raúl Garreta; Guillermo Moncecchi, Packt Publishing, 2017.\n",
"* [Python Machine Learning](https://learning.oreilly.com/library/view/python-machine-learning/9781789955750/), Sebastian Raschka, Packt Publishing, 2019.\n", "* [Python Machine Learning](https://learning.oreilly.com/library/view/python-machine-learning/9781789955750/), Sebastian Raschka, Packt Publishing, 2019.\n",
"* [Hyperparameter estimation using grid search with cross-validation](http://scikit-learn.org/stable/auto_examples/model_selection/grid_search_digits.html)\n", "* [Parameter estimation using grid search with cross-validation](http://scikit-learn.org/stable/auto_examples/model_selection/grid_search_digits.html)\n",
"* [Decision trees in python with scikit-learn and pandas](http://chrisstrelioff.ws/sandbox/2015/06/08/decision_trees_in_python_with_scikit_learn_and_pandas.html)" "* [Decision trees in python with scikit-learn and pandas](http://chrisstrelioff.ws/sandbox/2015/06/08/decision_trees_in_python_with_scikit_learn_and_pandas.html)"
] ]
}, },