"![](images/EscUpmPolit_p.gif \"UPM\")"
# Course Notes for Learning Intelligent Systems
Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © Carlos A. Iglesias
## [Introduction to Preprocessing](00_Intro_Preprocessing.ipynb)
# Missing values
There are two possible approaches to rows with missing values: **remove** them or **fill** them. The right choice depends on the case.
"import pandas as pd\n",
"import numpy as np"
"## Filling NaN values\n",
"If we need to fill errors or blanks, we can use the methods **fillna()** or **dropna()**.\n",
"* For **string** fields, we can fill NaN with **' '**.\n",
"* For **numbers**, we can fill with the **mean** or **median** value. \n"
"# Fill NaN with ' '\n",
"df['col'] = df['col'].fillna(' ')\n",
"# Fill NaN with 99\n",
"df['col'] = df['col'].fillna(99)\n",
"# Fill NaN with the mean of the column\n",
"df['col'] = df['col'].fillna(df['col'].mean())"
"## Propagate non-null values forward or backwards\n",
"You can also propagate non-null values forward or backwards by putting\n",
"method=pad as the method argument. It will fill the next value in the\n",
"dataframe with the previous non-NaN value. Maybe you just want to fill one\n",
"value ( limit=1 )or you want to fill all the values."
df = pd.DataFrame(data={'col1': [np.nan, np.nan, 2, 3, 4, np.nan, np.nan]})
# Forward-fill the value 4.0 into the next NaN only (limit=1)
# ffill() replaces fillna(method='pad'), which is deprecated since pandas 2.1
df.ffill(limit=1)
"df.fillna(method='pad', limit=1)"
We can also backfill with **bfill**.
# Fill the first two NaN values with the next available value (2.0)
# bfill() replaces fillna(method='bfill'), which is deprecated since pandas 2.1
df.bfill()
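To make the effect of `limit` and of the fill direction concrete, the following sketch reuses the same toy column. It assumes a pandas version that provides `DataFrame.ffill()` and `DataFrame.bfill()`, which behave like `fillna(method='pad')` and `fillna(method='bfill')`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'col1': [np.nan, np.nan, 2, 3, 4, np.nan, np.nan]})

# Forward fill: the leading NaNs have no previous value, so they stay NaN
forward_one = df.ffill(limit=1)   # only the first NaN after 4.0 is filled
forward_all = df.ffill()          # both trailing NaNs become 4.0

# Backward fill: the trailing NaNs have no next value, so they stay NaN
backward = df.bfill()             # the two leading NaNs become 2.0

# Chaining both directions fills every gap in this column
both = df.ffill().bfill()

print(forward_one, forward_all, backward, both, sep='\n\n')
```

Chaining the two directions is a design choice, not a rule: it suits ordered data where neighbouring values are a reasonable guess for the missing one.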
"## Removing NaN values\n",
"We can remove them by row or column."
"/# Drop any rows which have any nans\n",
"/# Drop columns that have any nans\n",
"/# Only drop columns which have at least 90% non-NaNs\n",
"df.dropna(thresh=int(df.shape[0] * .9), axis=1)"
"# References\n",
"* [Cleaning and Prepping Data with Python for Data Science — Best Practices and Helpful Packages](https://medium.com/@rrfd/cleaning-and-prepping-data-with-python-for-data-science-best-practices-and-helpful-packages-af1edfbe2a3), DeFilippi, 2019, \n",
"* [Data Preprocessing for Machine learning in Python, GeeksForGeeks](https://www.geeksforgeeks.org/data-preprocessing-machine-learning-python/)"
"## Licence\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
