{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "![](images/EscUpmPolit_p.gif \"UPM\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "# Course Notes for Learning Intelligent Systems" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © Carlos A. Iglesias" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "## [Introduction to Preprocessing](00_Intro_Preprocessing.ipynb)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Datacleaner\n", "[Datacleaner](https://github.com/rhiever/datacleaner) supports:\n", "\n", "* drop rows with missing values\n", "* replace missing values with the mode or median on a column-by-column basis\n", "* encode non-numeric variables with numerical equivalents\n", "\n", "\n", "Install with\n", "\n", "**pip install datacleaner**" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S |
| 887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S |
| 888 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S |
| 889 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C |
| 890 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | NaN | Q |

891 rows × 12 columns

|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | 108 | 1 | 22.0 | 1 | 0 | 523 | 7.2500 | 47 | 2 |
| 1 | 2 | 1 | 1 | 190 | 0 | 38.0 | 1 | 0 | 596 | 71.2833 | 81 | 0 |
| 2 | 3 | 1 | 3 | 353 | 0 | 26.0 | 0 | 0 | 669 | 7.9250 | 47 | 2 |
| 3 | 4 | 1 | 1 | 272 | 0 | 35.0 | 1 | 0 | 49 | 53.1000 | 55 | 2 |
| 4 | 5 | 0 | 3 | 15 | 1 | 35.0 | 0 | 0 | 472 | 8.0500 | 47 | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 886 | 887 | 0 | 2 | 548 | 1 | 27.0 | 0 | 0 | 101 | 13.0000 | 47 | 2 |
| 887 | 888 | 1 | 1 | 303 | 0 | 19.0 | 0 | 0 | 14 | 30.0000 | 30 | 2 |
| 888 | 889 | 0 | 3 | 413 | 0 | 28.0 | 1 | 2 | 675 | 23.4500 | 47 | 2 |
| 889 | 890 | 1 | 1 | 81 | 1 | 26.0 | 0 | 0 | 8 | 30.0000 | 60 | 0 |
| 890 | 891 | 0 | 3 | 220 | 1 | 32.0 | 0 | 0 | 466 | 7.7500 | 47 | 1 |

891 rows × 12 columns
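
Comparing the two tables above, the missing `Age` in row 888 becomes 28.0, missing `Cabin` values all become the same code (47), and text columns such as `Sex` and `Embarked` are replaced by integers. The following is a minimal sketch of that kind of cleaning in plain Python, assuming median imputation for numeric columns, mode imputation for the rest, and integer codes assigned in sorted label order. It is not Datacleaner's implementation; the helper names `is_missing`, `clean_column`, and `clean_table`, and the tiny `sample` table, are hypothetical.

```python
from statistics import median, mode


def is_missing(value):
    """Treat None and float NaN as missing (NaN is the only value unequal to itself)."""
    return value is None or (isinstance(value, float) and value != value)


def clean_column(values):
    """Impute missing entries, then label-encode the column if it is non-numeric.

    Assumes the column has at least one non-missing value.
    """
    present = [v for v in values if not is_missing(v)]
    numeric = all(isinstance(v, (int, float)) for v in present)

    # Median for numeric columns, mode (most frequent value) for the others.
    fill = median(present) if numeric else mode(present)
    filled = [fill if is_missing(v) else v for v in values]
    if numeric:
        return filled

    # Map each distinct label to an integer code, in sorted label order.
    codes = {label: code for code, label in enumerate(sorted(set(filled)))}
    return [codes[v] for v in filled]


def clean_table(table):
    """Clean every column of a table given as {column name: list of values}."""
    return {name: clean_column(column) for name, column in table.items()}


# Hypothetical four-row sample using a few of the column names shown above.
sample = {
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, float("nan"), 26.0],
    "Embarked": ["S", "C", None, "S"],
}
cleaned = clean_table(sample)
print(cleaned)
```

With this sample, `female` and `male` are encoded as 0 and 1, matching the `Sex` column in the cleaned table above, and the missing `Age` is filled with the median of the three known ages.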
\n", "