diff --git a/ml1/2_3_0_Visualisation.ipynb b/ml1/2_3_0_Visualisation.ipynb index 645b394..4a4f435 100644 --- a/ml1/2_3_0_Visualisation.ipynb +++ b/ml1/2_3_0_Visualisation.ipynb @@ -65,9 +65,10 @@ "source": [ "This section covers different ways to inspect the distribution of samples per feature.\n", "\n", - "First of all, let's take a see how many samples of each class we have, using a [histogram](https://en.wikipedia.org/wiki/Histogram). \n", + "First of all, let's see how many samples of each class we have, using a [histogram](https://en.wikipedia.org/wiki/Histogram). \n", + "\n", - "A histogram is a graphical representation of the distribution of numerical data. It is an estimate of the probability distribution of a continuous variable (quantitative variable). \n", + "A histogram is a graphical representation of the distribution of numerical data. It is an estimation of the probability distribution of a continuous variable (quantitative variable). \n", "\n", "For building a histogram, we need first to 'bin' the range of values—that is, divide the entire range of values into a series of intervals—and then count how many values fall into each interval. \n", "\n", @@ -151,7 +152,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We see we have the same distribution of samples for every class.\n", + "As can be seen, we have the same distribution of samples for every class.\n", "The next step is to see the distribution of the features" ] }, diff --git a/ml1/2_4_Preprocessing.ipynb b/ml1/2_4_Preprocessing.ipynb index 49d8619..f386904 100644 --- a/ml1/2_4_Preprocessing.ipynb +++ b/ml1/2_4_Preprocessing.ipynb @@ -50,7 +50,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The goal of this notebook is to learn how separate the dataset into training and test datasets and then preprocess the data." + "The goal of this notebook is to learn how to split the dataset into training and test datasets and then preprocess the data."
] }, { @@ -78,7 +78,7 @@ "source": [ "A common practice in machine learning to evaluate an algorithm is to split the data at hand into two sets, one that we call the **training set** on which we learn data properties and one that we call the **testing set** on which we test these properties. \n", "\n", - "We are going to use *scikit-learn* to split the data into random training and testing sets. We follow the ration 75% for training and 25% for testing. We use `random_state` to ensure that the result is always the same and it is reproducible. (Otherwise, we would get different training and testing sets every time)." + "We are going to use *scikit-learn* to split the data into random training and testing sets. We use a ratio of 75% for training and 25% for testing. We set `random_state` so that the split is the same on every run and therefore reproducible (otherwise, we would get different training and testing sets every time)." ] }, { diff --git a/ml1/2_5_0_Machine_Learning.ipynb b/ml1/2_5_0_Machine_Learning.ipynb index bf6de89..97dbd2b 100644 --- a/ml1/2_5_0_Machine_Learning.ipynb +++ b/ml1/2_5_0_Machine_Learning.ipynb @@ -126,7 +126,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "scikit-learn has a uniform interface for all the estimators, some methods are only available is the estimator is supervised or unsupervised:\n", + "scikit-learn has a uniform interface for all estimators; some methods are only available depending on whether the estimator is supervised or unsupervised:\n", "\n", "* Available in *all estimators*:\n", " * **model.fit()**: fit training data. For supervised learning applications, this accepts two arguments: the data X and the labels y (e.g. model.fit(X, y)). For unsupervised learning applications, this accepts only a single argument, the data X (e.g.
model.fit(X)).\n", diff --git a/ml1/2_5_2_Decision_Tree_Model.ipynb b/ml1/2_5_2_Decision_Tree_Model.ipynb index dee2fdb..a0246a8 100644 --- a/ml1/2_5_2_Decision_Tree_Model.ipynb +++ b/ml1/2_5_2_Decision_Tree_Model.ipynb @@ -54,7 +54,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The goal of this notebook is to learn how to learn how create a classification object using a [decision tree learning algorithm](https://en.wikipedia.org/wiki/Decision_tree_learning). \n", + "The goal of this notebook is to learn how to create a classification object using a [decision tree learning algorithm](https://en.wikipedia.org/wiki/Decision_tree_learning). \n", "\n", "There are a number of well known machine learning algorithms for decision tree learning, such as ID3, C4.5, C5.0 and CART. The scikit-learn uses an optimised version of the [CART (Classification and Regression Trees) algorithm](https://en.wikipedia.org/wiki/Predictive_analytics#Classification_and_regression_trees).\n", "\n",
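The notebooks touched by this diff describe counting samples per class, binning a feature's range into intervals for a histogram, a reproducible 75%/25% train/test split with `random_state`, and scikit-learn's uniform `fit`/`predict` interface with a decision tree. The following is a minimal Python sketch of those steps, not the notebooks' own code. It assumes the Iris dataset bundled with scikit-learn (the notebooks' dataset is not shown in this diff), and the seed `random_state=33` and the choice of 8 bins are arbitrary, hypothetical values.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: the Iris dataset shipped with scikit-learn stands in for the notebooks' data
X, y = load_iris(return_X_y=True)

# Samples per class (the bar heights of a class histogram)
class_counts = np.bincount(y)
print(class_counts)  # [50 50 50]

# Binning: split the range of the first feature into 8 equal-width intervals
# (8 is an arbitrary choice) and count how many values fall into each one
counts, edges = np.histogram(X[:, 0], bins=8)
print(len(edges))  # 9 bin edges delimit 8 intervals

# 75% training / 25% testing; random_state=33 is a hypothetical, arbitrary seed
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=33)
print(X_train.shape, X_test.shape)  # (112, 4) (38, 4)

# The same random_state reproduces exactly the same split
X_train_again, X_test_again, _, _ = train_test_split(
    X, y, test_size=0.25, random_state=33)
same_split = np.array_equal(X_train, X_train_again) and np.array_equal(X_test, X_test_again)
print(same_split)  # True

# Uniform estimator interface: a supervised model takes fit(X, y), then predict(X)
model = DecisionTreeClassifier(random_state=0)  # scikit-learn's optimised CART
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("test accuracy:", model.score(X_test, y_test))
```

Fixing `random_state` on the classifier as well keeps the fitted tree deterministic, since otherwise ties between equally good splits may be resolved differently from run to run.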