mirror of
https://github.com/gsi-upm/sitc
synced 2025-06-13 11:42:21 +00:00
Update 2_5_2_Decision_Tree_Model.ipynb
Changed image path
parent 7b4d16964d
commit 21e7ae2f57
@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-""
+""
 ]
 },
 {
@@ -56,9 +56,9 @@
 "source": [
 "The goal of this notebook is to learn how to create a classification object using a [decision tree learning algorithm](https://en.wikipedia.org/wiki/Decision_tree_learning). \n",
 "\n",
-"There are a number of well known machine learning algorithms for decision tree learning, such as ID3, C4.5, C5.0 and CART. The scikit-learn uses an optimised version of the [CART (Classification and Regression Trees) algorithm](https://en.wikipedia.org/wiki/Predictive_analytics#Classification_and_regression_trees).\n",
+"There are several well-known machine learning algorithms for decision tree learning, such as ID3, C4.5, C5.0, and CART. The scikit-learn uses an optimised version of the [CART (Classification and Regression Trees) algorithm](https://en.wikipedia.org/wiki/Predictive_analytics#Classification_and_regression_trees).\n",
 "\n",
-"This notebook will follow the same steps that the previous notebook for learning using the [kNN Model](2_5_1_kNN_Model.ipynb), and details some peculiarities of the decision tree algorithms.\n",
+"This notebook will follow the same steps as the previous notebook for learning using the [kNN Model](2_5_1_kNN_Model.ipynb), and details some peculiarities of the decision tree algorithms.\n",
 "\n",
 "You need to install pydotplus: `conda install pydotplus` for the visualization."
 ]
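For readers of this hunk, a minimal sketch of training scikit-learn's CART-based classifier on the iris data might look like the following. The names `x_iris`, `y_iris`, and `model` mirror those visible later in this diff; the split ratio, `max_depth=3`, and the random seeds are assumptions chosen for illustration, not values taken from the notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the iris data set (features and integer class labels).
iris = load_iris()
x_iris, y_iris = iris.data, iris.target

# Hold out 25% of the samples for testing (assumed ratio and seed).
x_train, x_test, y_train, y_test = train_test_split(
    x_iris, y_iris, test_size=0.25, random_state=33
)

# scikit-learn's decision tree is CART-based; Gini impurity is its default criterion.
model = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=1)
model.fit(x_train, y_train)

# score() returns the mean accuracy on the given test data.
accuracy = model.score(x_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

Limiting `max_depth` is one common way to keep the tree small and readable; the right value depends on the data.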
@@ -69,7 +69,7 @@
 "source": [
 "## Load data and preprocessing\n",
 "\n",
-"Here we repeat the same operations for loading data and preprocessing than in the previous notebooks."
+"Here we repeat the same operations for loading data and preprocessing as in the previous notebooks."
 ]
 },
 {
@@ -262,8 +262,8 @@
 "The current version of pydot does not work well in Python 3.\n",
 "For obtaining an image, you need to install `pip install pydotplus` and then `conda install graphviz`.\n",
 "\n",
-"You can skip this example. Since it can require installing additional packages, we include here the result.\n",
-""
+"You can skip this example. Since it can require installing additional packages, we have included the result here.\n",
+""
 ]
 },
 {
@@ -330,7 +330,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Next we are going to export the pseudocode of the the learnt decision tree."
+"Next, we will export the pseudocode of the learnt decision tree."
 ]
 },
 {
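The notebook's own export code is not part of this diff. As a stand-in, the sketch below uses scikit-learn's `export_text`, which prints a fitted tree as nested if/else-style rules. Training on the full iris data with `max_depth=3` is an assumption made only to keep the example short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# Fit a small tree (assumed depth and seed, for illustration only).
model = DecisionTreeClassifier(max_depth=3, random_state=1)
model.fit(iris.data, iris.target)

# Render the learnt tree as indented text rules, one line per node.
rules = export_text(model, feature_names=list(iris.feature_names))
print(rules)
```

Each indented line is either a threshold test on a feature or a leaf that names the predicted class.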
@@ -378,14 +378,14 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Precision, recall and f-score"
+"### Precision, recall, and f-score"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"For evaluating classification algorithms, we usually calculate three metrics: precision, recall and F1-score\n",
+"For evaluating classification algorithms, we usually calculate three metrics: precision, recall, and F1-score\n",
 "\n",
 "* **Precision**: This computes the proportion of instances predicted as positives that were correctly evaluated (it measures how right our classifier is when it says that an instance is positive).\n",
 "* **Recall**: This counts the proportion of positive instances that were correctly evaluated (measuring how right our classifier is when faced with a positive instance).\n",
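The definitions in this hunk can be checked by hand on a tiny binary example. The labels below are made up for illustration and are not results from the notebook. The F1-score is computed as the harmonic mean of precision and recall.

```python
# Toy ground truth and predictions (1 = positive, 0 = negative); invented data.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)  # of the predicted positives, how many are right
recall = tp / (tp + fn)     # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In scikit-learn the same quantities are available as `precision_score`, `recall_score`, and `f1_score` in `sklearn.metrics`.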
@@ -412,7 +412,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Another useful metric is the confusion matrix"
+"Another useful metric is the confusion matrix."
 ]
 },
 {
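As a small illustration of how a confusion matrix is read, the sketch below builds one with `sklearn.metrics.confusion_matrix` from invented labels. These are not the notebook's predictions. Rows correspond to the true class and columns to the predicted class, in the order given by `labels`.

```python
from sklearn.metrics import confusion_matrix

labels = ["setosa", "versicolor", "virginica"]

# Invented example: one 'virginica' sample is mistaken for 'versicolor'.
y_true = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
y_pred = ["setosa", "setosa", "versicolor", "versicolor", "versicolor", "virginica"]

cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)
```

Off-diagonal entries are the misclassifications: here the single non-zero one sits in the 'virginica' row and the 'versicolor' column.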
@@ -428,7 +428,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We see we classify well all the 'setosa' and 'versicolor' samples. "
+"We classify all the 'setosa' and 'versicolor' samples well. "
 ]
 },
 {
@@ -442,7 +442,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"In order to avoid bias in the training and testing dataset partition, it is recommended to use **k-fold validation**.\n",
+"To avoid bias in the training and testing dataset partition, it is recommended to use **k-fold validation**.\n",
 "\n",
 "Sklearn comes with other strategies for [cross validation](http://scikit-learn.org/stable/modules/cross_validation.html#cross-validation), such as stratified K-fold, label k-fold, Leave-One-Out, Leave-P-Out, Leave-One-Label-Out, Leave-P-Label-Out or Shuffle & Split."
 ]
@@ -466,7 +466,7 @@
 "# create a k-fold cross validation iterator of k=10 folds\n",
 "cv = KFold(10, shuffle=True, random_state=33)\n",
 "\n",
-"# by default the score used is the one returned by score method of the estimator (accuracy)\n",
+"# by default the score used is the one returned by the score method of the estimator (accuracy)\n",
 "scores = cross_val_score(model, x_iris, y_iris, cv=cv)\n",
 "print(scores)"
 ]
@@ -475,7 +475,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We get an array of k scores. We can calculate the mean and the standard error to obtain a final figure"
+"We get an array of k scores. We can calculate the mean and the standard error to obtain a final figure."
 ]
 },
 {
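The summary step described in this hunk can be sketched as follows. The `KFold` and `cross_val_score` calls repeat the ones shown in the previous hunk; the model settings are assumptions, and the standard error is computed here with NumPy as the sample standard deviation divided by the square root of k, which may differ from the helper the notebook uses.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
x_iris, y_iris = iris.data, iris.target

# Assumed model settings, for illustration only.
model = DecisionTreeClassifier(max_depth=3, random_state=1)

# 10-fold cross validation with shuffling, as in the diff above.
cv = KFold(10, shuffle=True, random_state=33)
scores = cross_val_score(model, x_iris, y_iris, cv=cv)

# Mean accuracy and standard error of the mean over the k folds.
mean_score = scores.mean()
std_error = scores.std(ddof=1) / np.sqrt(len(scores))

print(f"Mean score: {mean_score:.3f} (+/- {std_error:.3f})")
```

Reporting the standard error alongside the mean gives a rough sense of how much the estimate varies from fold to fold.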
@@ -518,7 +518,7 @@
 "metadata": {},
 "source": [
 "## Licence\n",
-"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
+"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
 "\n",
 "© Carlos A. Iglesias, Universidad Politécnica de Madrid."
 ]