Merge branch 'master' of https://github.com/gsi-upm/sitc

2025-12-30 23:58:15 +00:00 · 2018-02-15 18:53:59 +01:00
parent 62219c6404 92bde106fc
commit 31603bda6a
4 changed files with 8 additions and 7 deletions
--- a/ml1/2_3_0_Visualisation.ipynb
+++ b/ml1/2_3_0_Visualisation.ipynb
@@ -65,9 +65,10 @@
   "source": [
    "This section covers different ways to inspect the distribution of samples per feature.\n",
    "\n",
-    "First of all, let's take a see how many samples of each class we have, using a [histogram](https://en.wikipedia.org/wiki/Histogram). \n",
+    "First of all, let's see how many samples of each class we have, using a [histogram](https://en.wikipedia.org/wiki/Histogram). \n",
+
    "\n",
-    "A histogram is a graphical representation of the distribution of numerical data. It is an estimate of the probability distribution of a continuous variable (quantitative variable). \n",
+    "A histogram is a graphical representation of the distribution of numerical data. It is an estimation of the probability distribution of a continuous variable (quantitative variable). \n",
    "\n",
    "For building a histogram, we need first to 'bin' the range of values—that is, divide the entire range of values into a series of intervals—and then count how many values fall into each interval. \n",
    "\n",
@@ -151,7 +152,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "We see we have the same distribution of samples for every class.\n",
+    "As can be seen, we have the same distribution of samples for every class.\n",
    "The next step is to see the distribution of the features"
   ]
  },
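Review note on the binning step described in the first hunk of this notebook: a minimal, standard-library-only sketch of dividing a range into equal-width intervals and counting the values in each. The function name `histogram_counts` and the sample values are hypothetical and are not taken from the notebook, which may use a plotting library instead.

```python
def histogram_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if width == 0:
            # All values are identical: put everything in the first bin.
            idx = 0
        else:
            # The maximum value would index one past the end; clamp it
            # so that the last interval is closed on the right.
            idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical sample data, for illustration only.
sample = [1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 5.0]
counts = histogram_counts(sample, n_bins=4)
print(counts)
```

Each entry of `counts` corresponds to the height of one bar in the histogram.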
--- a/ml1/2_4_Preprocessing.ipynb
+++ b/ml1/2_4_Preprocessing.ipynb
@@ -50,7 +50,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "The goal of this notebook is to learn how separate the dataset into training and test datasets and then preprocess the data."
+    "The goal of this notebook is to learn how to split the dataset into training and test datasets and then preprocess the data."
   ]
  },
  {
@@ -78,7 +78,7 @@
   "source": [
    "A common practice in machine learning to evaluate an algorithm is to split the data at hand into two sets, one that we call the **training set** on which we learn data properties and one that we call the **testing set** on which we test these properties. \n",
    "\n",
-    "We are going to use *scikit-learn* to split the data into random training and testing sets. We follow the ration 75% for training and 25% for testing. We use `random_state` to ensure that the result is always the same and it is reproducible. (Otherwise, we would get different training and testing sets every time)."
+    "We are going to use *scikit-learn* to split the data into random training and testing sets. We use a ratio of 75% for training and 25% for testing. We set `random_state` so that the split is reproducible. (Otherwise, we would get different training and testing sets every time)."
   ]
  },
  {
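Review note on the split described in the hunk above: a minimal sketch of a 75%/25% split using scikit-learn's `train_test_split`. The toy data and the seed value 33 are assumptions made for illustration; the notebook's actual dataset and seed are not shown in this diff.

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 8 samples with 2 features each, and one label per sample.
X = [[i, i * 2] for i in range(8)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# test_size=0.25 keeps 75% of the samples for training.
# random_state fixes the shuffle so that repeated runs give the same split;
# the value 33 is arbitrary.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=33
)

# Calling the function again with the same seed reproduces the same split.
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X, y, test_size=0.25, random_state=33
)

print(len(X_train), len(X_test))
print(X_test == X_test2)
```

Omitting `random_state` makes each call shuffle differently, which is the behaviour the notebook text warns about.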
--- a/ml1/2_5_0_Machine_Learning.ipynb
+++ b/ml1/2_5_0_Machine_Learning.ipynb
@@ -126,7 +126,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "scikit-learn has a uniform interface for all the estimators, some methods are only available is the estimator is supervised or unsupervised:\n",
+    "scikit-learn has a uniform interface for all estimators, but some methods are only available depending on whether the estimator is supervised or unsupervised:\n",
    "\n",
    "* Available in *all estimators*:\n",
    "    * **model.fit()**: fit training data. For supervised learning applications, this accepts two arguments: the data X and the labels y (e.g. model.fit(X, y)). For unsupervised learning applications, this accepts only a single argument, the data X (e.g. model.fit(X)).\n",
--- a/ml1/2_5_2_Decision_Tree_Model.ipynb
+++ b/ml1/2_5_2_Decision_Tree_Model.ipynb
@@ -54,7 +54,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "The goal of this notebook is to learn how to learn how create a classification object using a [decision tree learning algorithm](https://en.wikipedia.org/wiki/Decision_tree_learning). \n",
+    "The goal of this notebook is to learn how to create a classification object using a [decision tree learning algorithm](https://en.wikipedia.org/wiki/Decision_tree_learning). \n",
    "\n",
    "There are a number of well known machine learning algorithms for decision tree learning, such as ID3, C4.5, C5.0 and CART. The scikit-learn uses an optimised version of the [CART (Classification and Regression Trees) algorithm](https://en.wikipedia.org/wiki/Predictive_analytics#Classification_and_regression_trees).\n",
    "\n",