diff --git a/ml1/2_6_Model_Tuning.ipynb b/ml1/2_6_Model_Tuning.ipynb
index df94314..c9c9fd7 100644
--- a/ml1/2_6_Model_Tuning.ipynb
+++ b/ml1/2_6_Model_Tuning.ipynb
@@ -39,7 +39,7 @@
     "* [Train classifier](#Train-classifier)\n",
     "* [More about Pipelines](#More-about-Pipelines)\n",
     "* [Tuning the algorithm](#Tuning-the-algorithm)\n",
-    "\t* [Grid Search for Parameter optimization](#Grid-Search-for-Parameter-optimization)\n",
+    "\t* [Grid Search for Hyperparameter optimization](#Grid-Search-for-Hyperparameter-optimization)\n",
     "* [Evaluating the algorithm](#Evaluating-the-algorithm)\n",
     "\t* [K-Fold validation](#K-Fold-validation)\n",
     "* [References](#References)\n"
@@ -56,9 +56,9 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "In the previous [notebook](2_5_2_Decision_Tree_Model.ipynb), we got an accuracy of 9.47. Could we get a better accuracy if we tune the parameters of the estimator?\n",
+    "In the previous [notebook](2_5_2_Decision_Tree_Model.ipynb), we got an accuracy of 0.947. Could we get a better accuracy if we tune the hyperparameters of the estimator?\n",
     "\n",
-    "The goal of this notebook is to learn how to tune an algorithm by opimizing its parameters using grid search."
+    "The goal of this notebook is to learn how to tune an algorithm by optimizing its hyperparameters using grid search."
    ]
   },
   {
@@ -300,21 +300,21 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "You can try different values for these parameters and observe the results."
+    "You can try different values for these hyperparameters and observe the results."
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Grid Search for Parameter optimization"
+    "### Grid Search for Hyperparameter optimization"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Changing manually the parameters to find their optimal values is not practical. Instead, we can consider to find the optimal value of the parameters as an *optimization problem*. \n",
+    "Manually changing the hyperparameters to find their optimal values is not practical. Instead, we can treat finding the optimal hyperparameter values as an *optimization problem*. \n",
     "\n",
     "The sklearn comes with several optimization techniques for this purpose, such as **grid search** and **randomized search**. In this notebook we are going to introduce the former one."
    ]
@@ -405,7 +405,7 @@
    "source": [
     "We have got an *improvement* from 0.947 to 0.953 with k-fold.\n",
     "\n",
-    "We are now to try to fit the best combination of the parameters of the algorithm. It can take some time to compute it."
+    "We will now try to find the best combination of hyperparameters for the algorithm. This can take some time to compute."
    ]
   },
   {
@@ -414,12 +414,12 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# Set the parameters by cross-validation\n",
+    "# Set the hyperparameters by cross-validation\n",
     "\n",
     "from sklearn.metrics import classification_report, recall_score, precision_score, make_scorer\n",
     "\n",
-    "# set of parameters to test\n",
-    "tuned_parameters = [{'max_depth': np.arange(3, 10),\n",
+    "# set of hyperparameters to test\n",
+    "tuned_hyperparameters = [{'max_depth': np.arange(3, 10),\n",
     "# 'max_weights': [1, 10, 100, 1000]},\n",
     "                     'criterion': ['gini', 'entropy'], \n",
     "                     'splitter': ['best', 'random'],\n",
@@ -431,7 +431,7 @@
     "scores = ['precision', 'recall']\n",
     "\n",
     "for score in scores:\n",
-    "    print(\"# Tuning hyper-parameters for %s\" % score)\n",
+    "    print(\"# Tuning hyperparameters for %s\" % score)\n",
     "    print()\n",
     "\n",
     "    if score == 'precision':\n",
@@ -440,10 +440,10 @@
     "        scorer = make_scorer(recall_score, average='weighted', zero_division=0)\n",
     "    \n",
     "    # cv = the fold of the cross-validation cv, defaulted to 5\n",
-    "    gs = GridSearchCV(DecisionTreeClassifier(), tuned_parameters, cv=10, scoring=scorer)\n",
+    "    gs = GridSearchCV(DecisionTreeClassifier(), tuned_hyperparameters, cv=10, scoring=scorer)\n",
     "    gs.fit(x_train, y_train)\n",
     "\n",
-    "    print(\"Best parameters set found on development set:\")\n",
+    "    print(\"Best hyperparameters set found on development set:\")\n",
     "    print()\n",
     "    print(gs.best_params_)\n",
     "    print()\n",
@@ -520,7 +520,7 @@
     "* [Plot the decision surface of a decision tree on the iris dataset](https://scikit-learn.org/stable/auto_examples/tree/plot_iris_dtc.html)\n",
     "* [scikit-learn : Machine Learning Simplified](https://learning.oreilly.com/library/view/scikit-learn-machine/9781788833479/), Raúl Garreta; Guillermo Moncecchi, Packt Publishing, 2017.\n",
     "* [Python Machine Learning](https://learning.oreilly.com/library/view/python-machine-learning/9781789955750/), Sebastian Raschka, Packt Publishing, 2019.\n",
-    "* [Parameter estimation using grid search with cross-validation](http://scikit-learn.org/stable/auto_examples/model_selection/grid_search_digits.html)\n",
+    "* [Hyperparameter estimation using grid search with cross-validation](http://scikit-learn.org/stable/auto_examples/model_selection/grid_search_digits.html)\n",
     "* [Decision trees in python with scikit-learn and pandas](http://chrisstrelioff.ws/sandbox/2015/06/08/decision_trees_in_python_with_scikit_learn_and_pandas.html)"
    ]
   },