mirror of https://github.com/gsi-upm/sitc synced 2026-03-03 02:08:17 +00:00

Updated to Pandas 3.X and corrected typos

This commit is contained in:
cif
2026-03-02 17:40:58 +01:00
parent 5c440527ac
commit 65da5ae714
8 changed files with 105 additions and 135 deletions


@@ -41,22 +41,22 @@
"source": [
"In the previous session, we learnt how to apply machine learning algorithms to the Iris dataset.\n",
"\n",
-"We are going now to review the full process. As probably you have notice, data preparation, cleaning and transformation takes more than 90 % of data mining effort.\n",
+"We are going to review the full process now. As you probably have noticed, data preparation, cleaning, and transformation account for more than 90% of the data mining effort.\n",
"\n",
"The phases are:\n",
"\n",
"* **Data ingestion**: reading the data from the data lake\n",
"* **Preprocessing**: \n",
-" * **Data cleaning (munging)**: fill missing values, smooth noisy data (binning methods), identify or remove outlier, and resolve inconsistencies \n",
+" * **Data cleaning (munging)**: fill missing values, smooth noisy data (binning methods), identify or remove outliers, and resolve inconsistencies \n",
" * **Data integration**: Integrate multiple datasets\n",
-" * **Data transformation**: normalization (rescale numeric values between 0 and 1), standardisation (rescale values to have mean of 0 and std of 1), transformation for smoothing a variable (e.g. square toot, ...), aggregation of data from several datasets\n",
-" * **Data reduction**: dimensionality reduction, clustering and sampling. \n",
+" * **Data transformation**: normalization (rescale numeric values between 0 and 1), standardisation (rescale values to have a mean of 0 and std of 1), transformation for smoothing a variable (e.g., square root, ...), aggregation of data from several datasets\n",
+" * **Data reduction**: dimensionality reduction, clustering, and sampling. \n",
" * **Data discretization**: for numerical values and algorithms that do not accept continuous variables\n",
-" * **Feature engineering**: selection of most relevant features, creation of new features and delete non relevant features\n",
+" * **Feature engineering**: selection of the most relevant features, creation of new features, and deletion of non-relevant features\n",
" * **Sampling**: divide the dataset into training and test datasets.\n",
"* **Machine learning**: apply machine learning algorithms and obtain an estimator, tuning its parameters.\n",
"* **Evaluation** of the model\n",
-"* **Prediction**: use the model for new data."
+"* **Prediction**: Use the model for new data."
]
},
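The transformation step described in this cell (normalization into [0, 1] and standardisation to mean 0 and std 1) can be sketched directly in pandas; the column name and values below are illustrative assumptions, not data from the notebook:

```python
import pandas as pd

# Hypothetical sample column for illustration.
df = pd.DataFrame({"sepal_length": [4.9, 5.1, 6.3, 7.0]})
col = df["sepal_length"]

# Normalization: rescale numeric values into [0, 1].
normalized = (col - col.min()) / (col.max() - col.min())

# Standardisation: rescale to a mean of 0 and std of 1
# (pandas .std() uses the sample estimator, ddof=1).
standardized = (col - col.mean()) / col.std()
```

After this, `normalized` spans exactly 0 to 1 and `standardized` has zero mean and unit standard deviation; scikit-learn offers the same operations as `MinMaxScaler` and `StandardScaler`.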
{
@@ -92,7 +92,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
+"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
@@ -114,7 +114,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.8.12"
+"version": "3.12.2"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
@@ -135,5 +135,5 @@
}
},
"nbformat": 4,
-"nbformat_minor": 1
+"nbformat_minor": 4
}