diff --git a/ml1/2_5_1_kNN_Model.ipynb b/ml1/2_5_1_kNN_Model.ipynb
index 15eeefd..7dbb048 100644
--- a/ml1/2_5_1_kNN_Model.ipynb
+++ b/ml1/2_5_1_kNN_Model.ipynb
@@ -4,7 +4,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "![](files/images/EscUpmPolit_p.gif \"UPM\")"
+    "![](./images/EscUpmPolit_p.gif \"UPM\")"
    ]
   },
   {
@@ -55,7 +55,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The goal of this notebook is to learn how to train a model, make predictions with that model and evaluate these predictions.\n",
+    "The goal of this notebook is to learn how to train a model, make predictions with that model, and evaluate these predictions.\n",
     "\n",
     "The notebook uses the [kNN (k nearest neighbors) algorithm](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm)."
    ]
   },
   {
@@ -212,14 +212,14 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Precision, recall and f-score"
+    "### Precision, recall, and F1-score"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "For evaluating classification algorithms, we usually calculate three metrics: precision, recall and F1-score\n",
+    "For evaluating classification algorithms, we usually calculate three metrics: precision, recall, and F1-score.\n",
     "\n",
     "* **Precision**: This computes the proportion of instances predicted as positives that were correctly evaluated (it measures how right our classifier is when it says that an instance is positive).\n",
     "* **Recall**: This counts the proportion of positive instances that were correctly evaluated (measuring how right our classifier is when faced with a positive instance).\n",
@@ -246,7 +246,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Another useful metric is the confusion matrix"
+    "Another useful metric is the confusion matrix."
    ]
   },
   {
@@ -262,7 +262,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "We see we classify well all the 'setosa' and 'versicolor' samples. "
+    "We correctly classify all the 'setosa' and 'versicolor' samples. "
    ]
   },
   {
@@ -276,7 +276,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "In order to avoid bias in the training and testing dataset partition, it is recommended to use **k-fold validation**."
+    "To avoid bias in the training and testing dataset partition, it is recommended to use **k-fold cross-validation**."
    ]
   },
   {
@@ -298,7 +298,7 @@
     "# create a k-fold cross validation iterator of k=10 folds\n",
     "cv = KFold(10, shuffle=True, random_state=33)\n",
     "\n",
-    "# by default the score used is the one returned by score method of the estimator (accuracy)\n",
+    "# by default the score used is the one returned by the score method of the estimator (accuracy)\n",
     "scores = cross_val_score(model, x_iris, y_iris, cv=cv)\n",
     "print(scores)"
    ]
@@ -307,7 +307,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "We get an array of k scores. We can calculate the mean and the standard error to obtain a final figure"
+    "We get an array of k scores. We can calculate the mean and the standard error to obtain a final figure."
    ]
   },
   {
@@ -340,7 +340,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "We are going to tune the algorithm, and calculate which is the best value for the k hyperparameter."
+    "We will tune the algorithm and calculate the best value for the k hyperparameter."
    ]
   },
   {
@@ -365,7 +365,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The result is very dependent of the input data. Execute again the train_test_split and test again how the result changes with k."
+    "The result is very dependent on the input data. Execute the train_test_split again and test how the result changes with k."
    ]
   },
   {
@@ -387,7 +387,7 @@
    "metadata": {},
    "source": [
     "## Licence\n",
-    "The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
+    "The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
     "\n",
     "© Carlos A. Iglesias, Universidad Politécnica de Madrid."
    ]
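The precision, recall, and F1-score definitions and the confusion matrix touched by this diff can be illustrated with a small standard-library Python sketch. It is not the notebook's code: `confusion_matrix` and `precision_recall_f1` are hypothetical helpers, and the toy labels are invented for illustration rather than taken from the iris results. In the notebook itself, `sklearn.metrics` provides ready-made `precision_score`, `recall_score`, `f1_score`, and `confusion_matrix` functions.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Hypothetical helper: rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true, pred in zip(y_true, y_pred):
        matrix[index[true]][index[pred]] += 1
    return matrix


def precision_recall_f1(y_true, y_pred, positive):
    """Hypothetical helper: one-vs-rest metrics, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # precision: of the samples predicted as positive, how many really are positive
    precision = tp / (tp + fp) if tp + fp else 0.0
    # recall: of the truly positive samples, how many were predicted as positive
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels invented for illustration (not the notebook's predictions)
labels = ["setosa", "versicolor", "virginica"]
y_true = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
y_pred = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]

print(confusion_matrix(y_true, y_pred, labels))
# prints [[2, 0, 0], [0, 1, 1], [0, 0, 2]]
for label in labels:
    print(label, precision_recall_f1(y_true, y_pred, label))
```

The single off-diagonal count in the versicolor row is the one versicolor sample predicted as virginica; it lowers the recall of versicolor and the precision of virginica, which is exactly the distinction the two metric definitions draw.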
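The k-fold cross-validation cell and the note about summarising the k scores with their mean and standard error can be sketched without scikit-learn. This is an illustrative assumption, not the notebook's code: `kfold_indices` and `mean_and_sem` are hypothetical helpers. The fold sizes follow the same rule as scikit-learn's `KFold` (the first `n_samples % n_folds` folds get one extra sample), but the shuffled order will not match `KFold(10, shuffle=True, random_state=33)`. The standard error is the sample standard deviation divided by the square root of k, which is what `scipy.stats.sem` computes with its default `ddof=1`.

```python
import math
import random
import statistics


def kfold_indices(n_samples, n_folds, seed=33):
    """Hypothetical helper: yield (train, test) index lists for each fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once before splitting
    start = 0
    for fold in range(n_folds):
        # the first n_samples % n_folds folds receive one extra sample
        size = n_samples // n_folds + (1 if fold < n_samples % n_folds else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def mean_and_sem(scores):
    """Hypothetical helper: mean of the fold scores and standard error of the mean."""
    mean = statistics.mean(scores)
    sem = statistics.stdev(scores) / math.sqrt(len(scores))  # sample std / sqrt(k)
    return mean, sem


# 150 samples (the size of the iris dataset) split into 10 folds
sizes = [len(test) for _, test in kfold_indices(150, 10)]
print(sizes)  # prints [15, 15, 15, 15, 15, 15, 15, 15, 15, 15]

# Hypothetical fold accuracies, not real results
mean, sem = mean_and_sem([1.0, 0.9, 0.8])
print("Mean score: {0:.3f} (+/- {1:.3f})".format(mean, sem))
# prints Mean score: 0.900 (+/- 0.058)
```

Each sample lands in exactly one test fold, so every sample is used for evaluation once and for training k-1 times; averaging the k scores is what reduces the dependence on a single train/test partition.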