mirror of
https://github.com/gsi-upm/sitc
synced 2025-08-24 10:32:20 +00:00
Added preprocessing notebooks
This commit is contained in:
committed by
GitHub
parent
1a3f618995
commit
86114b4a56
198
ml21/preprocessing/07_Binarize_Data.ipynb
Normal file
198
ml21/preprocessing/07_Binarize_Data.ipynb
Normal file
@@ -0,0 +1,198 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"slideshow": {
|
||||
"slide_type": "skip"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"slideshow": {
|
||||
"slide_type": "skip"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"# Course Notes for Learning Intelligent Systems"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"slideshow": {
|
||||
"slide_type": "skip"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © Carlos A. Iglesias"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"slideshow": {
|
||||
"slide_type": "skip"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"## [Introduction to Preprocessing](00_Intro_Preprocessing.ipynb)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"slideshow": {
|
||||
"slide_type": "slide"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"# Binarize Data\n",
|
||||
"* We can transform our data using a binary threshold. All values above the threshold are marked 1, and all values equal to or below are marked 0.\n",
|
||||
"* This is called binarizing your data or thresholding your data. \n",
|
||||
"\n",
|
||||
"* It can be helpful when you have probabilities that you want to make crisp values."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"slideshow": {
|
||||
"slide_type": "slide"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"## Binarize Data with Scikit-Learn\n",
|
||||
"We can create new binary attributes in Python using Scikit-learn with the Binarizer class.\n",
|
||||
"I"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"slideshow": {
|
||||
"slide_type": "fragment"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from sklearn.preprocessing import Binarizer\n",
|
||||
"\n",
|
||||
"X = [[ 1., -1., 2.],\n",
|
||||
" [ 2., 0., 0.],\n",
|
||||
" [ 0., 1.1, -1.]]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {
|
||||
"slideshow": {
|
||||
"slide_type": "fragment"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"transformer = Binarizer(threshold=1.0).fit(X) # threshold 1.0"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {
|
||||
"slideshow": {
|
||||
"slide_type": "fragment"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"array([[0., 0., 1.],\n",
|
||||
" [1., 0., 0.],\n",
|
||||
" [0., 1., 0.]])"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"transformer.transform(X)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"slideshow": {
|
||||
"slide_type": "skip"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"# References\n",
|
||||
"* [Cleaning and Prepping Data with Python for Data Science — Best Practices and Helpful Packages](https://medium.com/@rrfd/cleaning-and-prepping-data-with-python-for-data-science-best-practices-and-helpful-packages-af1edfbe2a3), DeFilippi, 2019, \n",
|
||||
"* [Binarizer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Binarizer.html), Scikit Learn"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"slideshow": {
|
||||
"slide_type": "skip"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"## Licence\n",
|
||||
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
|
||||
"\n",
|
||||
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"celltoolbar": "Slideshow",
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.13"
|
||||
},
|
||||
"latex_envs": {
|
||||
"LaTeX_envs_menu_present": true,
|
||||
"autocomplete": true,
|
||||
"bibliofile": "biblio.bib",
|
||||
"cite_by": "apalike",
|
||||
"current_citInitial": 1,
|
||||
"eqLabelWithNumbers": true,
|
||||
"eqNumInitial": 1,
|
||||
"hotkeys": {
|
||||
"equation": "Ctrl-E",
|
||||
"itemize": "Ctrl-I"
|
||||
},
|
||||
"labels_anchors": false,
|
||||
"latex_user_defs": false,
|
||||
"report_style_numbering": false,
|
||||
"user_envs_cfg": false
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
Reference in New Issue
Block a user