Review J

2026-02-09 08:08:17 +00:00 · 2016-03-28 12:26:20 +02:00
parent 65d1dc162f
commit 62f4fce1ed
12 changed files with 816 additions and 773 deletions
--- a/ml1/2_5_0_Machine_Learning.ipynb
+++ b/ml1/2_5_0_Machine_Learning.ipynb
@@ -36,8 +36,8 @@
    "\n",
    "* [Machine Learning](#Machine-Learning)\n",
    "* [Machine learning algorithms](#Machine-learning-algorithms)\n",
-    "\t\t* [Supervised machine learning model](#Supervised-machine-learning-model)\n",
-    "\t\t* [Unsupervised machine learning model](#Unsupervised-machine-learning-model)\n",
+    "    * [Supervised machine learning model](#Supervised-machine-learning-model)\n",
+    "    * [Unsupervised machine learning model](#Unsupervised-machine-learning-model)\n",
    "* [sklearn interface](#sklearn-interface)\n",
    "* [References](#References)"
   ]
@@ -53,7 +53,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "This is an introduction of general ideas about machine learning and the general interface of scikit-learn, taken from the [scikit-learn tutorial](http://www.astroml.org/sklearn_tutorial/general_concepts.html). \n",
+    "This is an introduction to the general ideas of machine learning and to the scikit-learn interface, taken from the [scikit-learn tutorial](http://www.astroml.org/sklearn_tutorial/general_concepts.html). \n",
    "\n",
    "You can skip it during the lab session and read it later,"
   ]
@@ -75,7 +75,7 @@
    "* **n_samples**: number of samples. Each sample  is an item to process (i.e. classify). A sample can be a document, a picture, a sound, a video, a row in database or CSV file, or whatever you can describe with a fixed set of quantitative traits.\n",
    "* **n_features**: The number of features or distinct traits that can be used to describe each item in a quantitative manner.\n",
    "\n",
-    "The number of features should be defined in advanced and it can be very high dimensional (e.g. millions of features) with most of them being zeros for a given sample. In this case we may use (scipy.sparse) sparse matrices instead of (numpy) arrays so as to make the data fit in memory.\n",
+    "The number of features must be fixed in advance. Some feature sets are very high dimensional (e.g. millions of features), yet most of their values are zero for any given sample. Stored in a dense (numpy) array, every one of those zeros would still take up memory, so such feature sets are usually represented with sparse matrices (scipy.sparse) instead.\n",
    "\n",
    "The first step in machine learning is **identifying the relevant features** from the input data, and the second step is **extracting the features** from the input data. \n",
    "\n",
@@ -193,7 +193,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.5.1"
+   "version": "3.5.1+"
  }
 },
 "nbformat": 4,