mirror of https://github.com/gsi-upm/sitc synced 2025-06-13 11:42:21 +00:00

Update 2_5_2_Decision_Tree_Model.ipynb

Changed image path
Carlos A. Iglesias 2025-06-02 17:13:49 +03:00 committed by GitHub
parent 7b4d16964d
commit 21e7ae2f57
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"![](files/images/EscUpmPolit_p.gif \"UPM\")"
+"![](./images/EscUpmPolit_p.gif \"UPM\")"
 ]
 },
 {
@@ -56,9 +56,9 @@
 "source": [
 "The goal of this notebook is to learn how to create a classification object using a [decision tree learning algorithm](https://en.wikipedia.org/wiki/Decision_tree_learning). \n",
 "\n",
-"There are a number of well known machine learning algorithms for decision tree learning, such as ID3, C4.5, C5.0 and CART. The scikit-learn uses an optimised version of the [CART (Classification and Regression Trees) algorithm](https://en.wikipedia.org/wiki/Predictive_analytics#Classification_and_regression_trees).\n",
+"There are several well-known machine learning algorithms for decision tree learning, such as ID3, C4.5, C5.0, and CART. The scikit-learn uses an optimised version of the [CART (Classification and Regression Trees) algorithm](https://en.wikipedia.org/wiki/Predictive_analytics#Classification_and_regression_trees).\n",
 "\n",
-"This notebook will follow the same steps that the previous notebook for learning using the [kNN Model](2_5_1_kNN_Model.ipynb), and details some peculiarities of the decision tree algorithms.\n",
+"This notebook will follow the same steps as the previous notebook for learning using the [kNN Model](2_5_1_kNN_Model.ipynb), and details some peculiarities of the decision tree algorithms.\n",
 "\n",
 "You need to install pydotplus: `conda install pydotplus` for the visualization."
 ]
@@ -69,7 +69,7 @@
 "source": [
 "## Load data and preprocessing\n",
 "\n",
-"Here we repeat the same operations for loading data and preprocessing than in the previous notebooks."
+"Here we repeat the same operations for loading data and preprocessing as in the previous notebooks."
 ]
 },
 {
@@ -262,8 +262,8 @@
 "The current version of pydot does not work well in Python 3.\n",
 "For obtaining an image, you need to install `pip install pydotplus` and then `conda install graphviz`.\n",
 "\n",
-"You can skip this example. Since it can require installing additional packages, we include here the result.\n",
-"![Decision Tree](files/images/cart.png)"
+"You can skip this example. Since it can require installing additional packages, we have included the result here.\n",
+"![Decision Tree](./images/cart.png)"
 ]
 },
 {
@@ -330,7 +330,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Next we are going to export the pseudocode of the the learnt decision tree."
+"Next, we will export the pseudocode of the learnt decision tree."
 ]
 },
 {
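
As a rough illustration of what exporting the pseudocode of a learnt tree can look like, the sketch below prints the decision rules as indented text using scikit-learn's `export_text` helper. This is an assumption, not necessarily what the notebook does: the notebook may use its own helper, and the `max_depth=3` and `random_state=1` settings are chosen here only for the example.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Load the iris data and fit a small tree (hyperparameters are illustrative)
iris = load_iris()
model = DecisionTreeClassifier(max_depth=3, random_state=1)
model.fit(iris.data, iris.target)

# Render the learnt tree as nested if/else style rules in plain text
rules = export_text(model, feature_names=list(iris.feature_names))
print(rules)
```

Each indented `|---` line is one test on a feature, and each `class:` line is a leaf, so the output can be read top to bottom like nested conditionals.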
@@ -378,14 +378,14 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Precision, recall and f-score"
+"### Precision, recall, and f-score"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"For evaluating classification algorithms, we usually calculate three metrics: precision, recall and F1-score\n",
+"For evaluating classification algorithms, we usually calculate three metrics: precision, recall, and F1-score\n",
 "\n",
 "* **Precision**: This computes the proportion of instances predicted as positives that were correctly evaluated (it measures how right our classifier is when it says that an instance is positive).\n",
 "* **Recall**: This counts the proportion of positive instances that were correctly evaluated (measuring how right our classifier is when faced with a positive instance).\n",
@@ -412,7 +412,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Another useful metric is the confusion matrix"
+"Another useful metric is the confusion matrix."
 ]
 },
 {
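
To make the relationship between the confusion matrix and the precision, recall, and F1 metrics concrete, here is a minimal pure-Python sketch. The label lists are hypothetical values made up for illustration (they are not the notebook's results), and the helper name `precision_recall_f1` is likewise an assumption. Rows are true classes and columns are predicted classes, which is the same orientation scikit-learn's `confusion_matrix` uses.

```python
from collections import Counter

# Hypothetical true and predicted labels, made up for illustration only
y_true = ["setosa", "setosa", "versicolor", "versicolor",
          "virginica", "virginica", "virginica"]
y_pred = ["setosa", "setosa", "versicolor", "versicolor",
          "virginica", "versicolor", "virginica"]
labels = ["setosa", "versicolor", "virginica"]

# Confusion matrix: rows are true classes, columns are predicted classes
pairs = Counter(zip(y_true, y_pred))
matrix = [[pairs[(t, p)] for p in labels] for t in labels]


def precision_recall_f1(matrix, i):
    """Per-class metrics for the class at index i of the confusion matrix."""
    tp = matrix[i][i]
    predicted = sum(row[i] for row in matrix)  # column sum: predicted as class i
    actual = sum(matrix[i])                    # row sum: truly class i
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0  # harmonic mean
    return precision, recall, f1


for row in matrix:
    print(row)
for i, label in enumerate(labels):
    p, r, f = precision_recall_f1(matrix, i)
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Off-diagonal entries are the misclassifications: a non-zero value in a class's column lowers its precision, and a non-zero value in its row lowers its recall.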
@@ -428,7 +428,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We see we classify well all the 'setosa' and 'versicolor' samples. "
+"We classify all the 'setosa' and 'versicolor' samples well. "
 ]
 },
 {
@@ -442,7 +442,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"In order to avoid bias in the training and testing dataset partition, it is recommended to use **k-fold validation**.\n",
+"To avoid bias in the training and testing dataset partition, it is recommended to use **k-fold validation**.\n",
 "\n",
 "Sklearn comes with other strategies for [cross validation](http://scikit-learn.org/stable/modules/cross_validation.html#cross-validation), such as stratified K-fold, label k-fold, Leave-One-Out, Leave-P-Out, Leave-One-Label-Out, Leave-P-Label-Out or Shuffle & Split."
 ]
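
As one example of the alternative strategies listed above, the sketch below swaps in stratified k-fold, which keeps the class proportions roughly equal in every fold. The five folds, the seed, and the tree hyperparameters are assumptions chosen for illustration and do not come from the notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
model = DecisionTreeClassifier(max_depth=3, random_state=1)

# Stratified folds preserve the class balance of the full dataset in each split
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=33)

# One accuracy score per fold (accuracy is the classifier's default score)
scores = cross_val_score(model, iris.data, iris.target, cv=cv)
print(scores)
```

Any of the other splitters in `sklearn.model_selection` can be passed through the same `cv` argument.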
@@ -466,7 +466,7 @@
 "# create a k-fold cross validation iterator of k=10 folds\n",
 "cv = KFold(10, shuffle=True, random_state=33)\n",
 "\n",
-"# by default the score used is the one returned by score method of the estimator (accuracy)\n",
+"# by default the score used is the one returned by the score method of the estimator (accuracy)\n",
 "scores = cross_val_score(model, x_iris, y_iris, cv=cv)\n",
 "print(scores)"
 ]
@@ -475,7 +475,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We get an array of k scores. We can calculate the mean and the standard error to obtain a final figure"
+"We get an array of k scores. We can calculate the mean and the standard error to obtain a final figure."
 ]
 },
 {
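
The mean and standard error mentioned above can be sketched with the standard library alone. The fold scores below are hypothetical numbers made up for illustration, and the helper name `mean_score` is an assumption. The standard error is taken as the sample standard deviation (n - 1 denominator) divided by the square root of the number of folds.

```python
import math
import statistics

# Hypothetical scores from a 10-fold run, made up for illustration only
scores = [0.93, 1.0, 1.0, 0.93, 0.87, 1.0, 0.93, 1.0, 0.87, 0.93]


def mean_score(scores):
    """Return the mean of the fold scores and the standard error of that mean."""
    mean = statistics.mean(scores)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, sem


mean, sem = mean_score(scores)
print(f"Mean score: {mean:.3f} (+/- {sem:.3f})")
```

Reporting the standard error alongside the mean gives a sense of how much the figure would vary with a different partition of the data.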
@@ -518,7 +518,7 @@
 "metadata": {},
 "source": [
 "## Licence\n",
-"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
+"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
 "\n",
 "© Carlos A. Iglesias, Universidad Politécnica de Madrid."
 ]