mirror of
https://github.com/gsi-upm/sitc
synced 2024-11-16 19:42:28 +00:00
199 lines
4.3 KiB
Plaintext
199 lines
4.3 KiB
Plaintext
|
{
|
||
|
"cells": [
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"![](images/EscUpmPolit_p.gif \"UPM\")"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"# Course Notes for Learning Intelligent Systems"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © Carlos A. Iglesias"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"## [Introduction to Preprocessing](00_Intro_Preprocessing.ipynb)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"# Binarize Data\n",
|
||
|
"* We can transform our data using a binary threshold. All values above the threshold are marked 1, and all values equal to or below are marked 0.\n",
|
||
|
"* This is called binarizing your data or thresholding your data. \n",
|
||
|
"\n",
|
||
|
"* It can be helpful when you have probabilities that you want to make crisp values."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "slide"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"## Binarize Data with Scikit-Learn\n",
|
||
|
"We can create new binary attributes in Python using Scikit-learn with the Binarizer class.\n",
|
||
|
"I"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 1,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"from sklearn.preprocessing import Binarizer\n",
|
||
|
"\n",
|
||
|
"X = [[ 1., -1., 2.],\n",
|
||
|
" [ 2., 0., 0.],\n",
|
||
|
" [ 0., 1.1, -1.]]"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 5,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"transformer = Binarizer(threshold=1.0).fit(X) # threshold 1.0"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 6,
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "fragment"
|
||
|
}
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"array([[0., 0., 1.],\n",
|
||
|
" [1., 0., 0.],\n",
|
||
|
" [0., 1., 0.]])"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 6,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"transformer.transform(X)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"# References\n",
|
||
|
"* [Cleaning and Prepping Data with Python for Data Science — Best Practices and Helpful Packages](https://medium.com/@rrfd/cleaning-and-prepping-data-with-python-for-data-science-best-practices-and-helpful-packages-af1edfbe2a3), DeFilippi, 2019, \n",
|
||
|
"* [Binarizer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Binarizer.html), Scikit Learn"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"slideshow": {
|
||
|
"slide_type": "skip"
|
||
|
}
|
||
|
},
|
||
|
"source": [
|
||
|
"## Licence\n",
|
||
|
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
|
||
|
"\n",
|
||
|
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"metadata": {
|
||
|
"celltoolbar": "Slideshow",
|
||
|
"kernelspec": {
|
||
|
"display_name": "Python 3 (ipykernel)",
|
||
|
"language": "python",
|
||
|
"name": "python3"
|
||
|
},
|
||
|
"language_info": {
|
||
|
"codemirror_mode": {
|
||
|
"name": "ipython",
|
||
|
"version": 3
|
||
|
},
|
||
|
"file_extension": ".py",
|
||
|
"mimetype": "text/x-python",
|
||
|
"name": "python",
|
||
|
"nbconvert_exporter": "python",
|
||
|
"pygments_lexer": "ipython3",
|
||
|
"version": "3.10.13"
|
||
|
},
|
||
|
"latex_envs": {
|
||
|
"LaTeX_envs_menu_present": true,
|
||
|
"autocomplete": true,
|
||
|
"bibliofile": "biblio.bib",
|
||
|
"cite_by": "apalike",
|
||
|
"current_citInitial": 1,
|
||
|
"eqLabelWithNumbers": true,
|
||
|
"eqNumInitial": 1,
|
||
|
"hotkeys": {
|
||
|
"equation": "Ctrl-E",
|
||
|
"itemize": "Ctrl-I"
|
||
|
},
|
||
|
"labels_anchors": false,
|
||
|
"latex_user_defs": false,
|
||
|
"report_style_numbering": false,
|
||
|
"user_envs_cfg": false
|
||
|
}
|
||
|
},
|
||
|
"nbformat": 4,
|
||
|
"nbformat_minor": 4
|
||
|
}
|