{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"![](images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © Carlos A. Iglesias"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## [Introduction to Preprocessing](00_Intro_Preprocessing.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Rescaling, Standardizing and Normalizing Data\n",
"\n",
"When our data comprises attributes with varying scales, many machine learning algorithms can benefit from rescaling the attributes to have the same scale. \n",
"\n",
"Many machine learning algorithms perform better or converge faster when features are on a relatively similar scale and/or close to normally distributed. \n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Examples of algorithms that benefit from feature scaling:\n",
"* linear and logistic regression\n",
"* nearest neighbors\n",
"* neural networks\n",
"* support vector machines with radial basis function (RBF) kernels\n",
"* principal component analysis (PCA)\n",
"* linear discriminant analysis"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"The three terms are frequently used interchangeably, but their meanings differ.\n",
"\n",
"* **Rescaling**: adding or subtracting a constant and then multiplying or dividing by another constant, so that the feature lies between given minimum and maximum values. Examples are converting Celsius to Fahrenheit or mapping a feature onto [0, 1].\n",
"\n",
"* **Standardizing**: subtracting the mean and dividing by the standard deviation, so that the feature has zero mean and unit variance. The result follows a standard normal distribution only if the original feature was normally distributed.\n",
"\n",
"* **Normalizing**: dividing each sample by a vector norm (for example, the L1 or L2 norm), so that every sample has unit norm and each of its components lies in [-1, 1]. This process can be helpful if you plan to use a quadratic form such as the dot product or any other kernel to quantify the similarity of any pair of samples."
]
},
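{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The three definitions above can be sketched with plain NumPy. This is only an illustrative sketch, not part of the original course material: the toy array `x` and all variable names are assumptions made for the example. Note that rescaling and standardizing operate on each column (feature), while normalizing operates on each row (sample)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Toy data (assumed for illustration): 3 samples, 2 features on different scales\n",
"x = np.array([[1.0, 10.0],\n",
"              [2.0, 20.0],\n",
"              [3.0, 60.0]])\n",
"\n",
"# Rescaling: shift and scale each column onto [0, 1]\n",
"x_rescaled = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))\n",
"\n",
"# Standardizing: zero mean and unit variance per column\n",
"x_standardized = (x - x.mean(axis=0)) / x.std(axis=0)\n",
"\n",
"# Normalizing: divide each row (sample) by its L2 norm\n",
"x_normalized = x / np.linalg.norm(x, axis=1, keepdims=True)\n",
"\n",
"# Column-wise range after rescaling\n",
"print(x_rescaled.min(axis=0), x_rescaled.max(axis=0))\n",
"# Column-wise mean and standard deviation after standardizing\n",
"print(x_standardized.mean(axis=0), x_standardized.std(axis=0))\n",
"# Row-wise L2 norm after normalizing\n",
"print(np.linalg.norm(x_normalized, axis=1))"
]
},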
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Rescaling with Scikit-Learn\n",
"Scikit-Learn provides several scaler classes in `sklearn.preprocessing`. The main ones are:\n",
"\n",
"* **MinMaxScaler**: subtracts the minimum value of the feature and divides by the range (maximum - minimum). The result is a value in [0, 1]. This method preserves the shape of the original distribution and does not reduce the importance of outliers.\n",
"* **MaxAbsScaler**: divides each feature by its maximum absolute value. The result is a value in [-1, 1]. This method preserves the shape of the original distribution and does not reduce the importance of outliers."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"* **RobustScaler**: subtracts the median and divides by the interquartile range (75th percentile - 25th percentile). It is used to reduce the effect of outliers. The result is not confined to a predefined interval.\n",
"* **StandardScaler**: subtracts the mean and divides by the standard deviation. The result has a mean of 0 and a variance of 1."
]
},
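{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The formulas above can be checked by hand. The following sketch, which assumes the default parameters of each scaler and uses a hypothetical one-column array with a single outlier, reproduces every transformation with NumPy and compares it with the Scikit-Learn result. Printing the arrays shows how the outlier compresses the remaining values under `MinMaxScaler` and `MaxAbsScaler`, while `RobustScaler` keeps them spread out."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn import preprocessing\n",
"\n",
"# Hypothetical feature column with one outlier (100)\n",
"x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])\n",
"\n",
"# MinMaxScaler: (x - min) / (max - min)\n",
"manual_minmax = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))\n",
"assert np.allclose(manual_minmax, preprocessing.MinMaxScaler().fit_transform(x))\n",
"\n",
"# MaxAbsScaler: x / max(|x|)\n",
"manual_maxabs = x / np.abs(x).max(axis=0)\n",
"assert np.allclose(manual_maxabs, preprocessing.MaxAbsScaler().fit_transform(x))\n",
"\n",
"# RobustScaler: (x - median) / (75th percentile - 25th percentile)\n",
"q25, q75 = np.percentile(x, [25, 75], axis=0)\n",
"manual_robust = (x - np.median(x, axis=0)) / (q75 - q25)\n",
"assert np.allclose(manual_robust, preprocessing.RobustScaler().fit_transform(x))\n",
"\n",
"# StandardScaler: (x - mean) / std, using the population standard deviation\n",
"manual_standard = (x - x.mean(axis=0)) / x.std(axis=0)\n",
"assert np.allclose(manual_standard, preprocessing.StandardScaler().fit_transform(x))\n",
"\n",
"# One column per scaler: MinMax, MaxAbs, Robust, Standard\n",
"print(np.hstack([manual_minmax, manual_maxabs, manual_robust, manual_standard]))"
]
},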
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"from sklearn import preprocessing\n",
"import pandas as pd\n",
"import numpy as np\n",
"import seaborn as sns\n",
"import matplotlib\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"matplotlib.style.use('ggplot')"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"X_train = np.array([[ 1., -1.,  2.],\n",
"                    [ 2.,  0.,  0.],\n",
"                    [ 0.,  1., -1.]])"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"# Select one scaler; uncomment a different line to try another\n",
"#scaler = preprocessing.MinMaxScaler()\n",
"#scaler = preprocessing.MaxAbsScaler()\n",
"#scaler = preprocessing.RobustScaler()\n",
"scaler = preprocessing.StandardScaler()"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"StandardScaler()"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"scaler"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"X_train_scaled = scaler.fit_transform(X_train)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 0. , -1.22474487, 1.33630621],\n",
" [ 1.22474487, 0. , -0.26726124],\n",
" [-1.22474487, 1.22474487, -1.06904497]])"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_train_scaled"
]
},
{
"cell_type": "raw",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Apply the same transformation to X_test\n",
"scaler.transform(X_test)"
]
},
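{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The raw cell above refers to an `X_test` array that the notebook never defines. As a sketch, assuming a hypothetical single test sample, the scaler is fitted on the training data only and the learned statistics (`mean_` and `scale_`) are then reused to transform the test data, so no information from the test set leaks into preprocessing."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn import preprocessing\n",
"\n",
"X_train = np.array([[ 1., -1.,  2.],\n",
"                    [ 2.,  0.,  0.],\n",
"                    [ 0.,  1., -1.]])\n",
"# Hypothetical test sample, not part of the original notebook\n",
"X_test = np.array([[-1., 1., 0.]])\n",
"\n",
"# Fit on the training data only, then reuse the fitted statistics on the test data\n",
"std_scaler = preprocessing.StandardScaler().fit(X_train)\n",
"X_test_scaled = std_scaler.transform(X_test)\n",
"\n",
"# transform() applies (x - mean_) / scale_ with the training mean and standard deviation\n",
"manual = (X_test - std_scaler.mean_) / std_scaler.scale_\n",
"assert np.allclose(manual, X_test_scaled)\n",
"print(X_test_scaled)"
]
},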
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
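},
"source": [
"We will now compare these transformations on two synthetic datasets: `df1`, whose features are normally distributed, and `df2`, whose features contain outliers.\n"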
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"# adapted from http://benalexkeen.com/feature-scaling-with-scikit-learn/\n",
"# Create several distributions\n",
"np.random.seed(1)\n",
"df1 = pd.DataFrame({\n",
" 'x1': np.random.normal(0, 2, 10000),\n",
" 'x2': np.random.normal(-5, 5, 10000)\n",
"})"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"df2 = pd.DataFrame({\n",
" # Distribution with lower outliers\n",
" 'x1': np.concatenate([np.random.normal(20, 1, 1000), np.random.normal(1, 1, 25)]),\n",
" # Distribution with higher outliers\n",
" 'x2': np.concatenate([np.random.normal(30, 1, 1000), np.random.normal(50, 1, 25)]),\n",
"})"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"# Helper: plot the density of x1 and x2 before and after applying a scaler\n",
"def draw_scaler(df, scaler, title):\n",
"    scaled_df = scaler.fit_transform(df)\n",
"    scaled_df = pd.DataFrame(scaled_df, columns=['x1', 'x2'])\n",
"    fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(6, 5))\n",
"    ax1.set_title('Before Scaling')\n",
"    sns.kdeplot(df['x1'], ax=ax1)\n",
"    sns.kdeplot(df['x2'], ax=ax1)\n",
"    ax2.set_title('After ' + title)\n",
"    sns.kdeplot(scaled_df['x1'], ax=ax2)\n",
"    sns.kdeplot(scaled_df['x2'], ax=ax2)\n",
"    plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 600x500 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"draw_scaler(df1, preprocessing.StandardScaler(), \"Standard Scaler\")"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 600x500 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"draw_scaler(df2, preprocessing.StandardScaler(), \"Standard Scaler\")"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAisAAAHYCAYAAACMQKgCAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/H5lhTAAAACXBIWXMAAA9hAAAPYQGoP6dpAAC84UlEQVR4nOyde3xT9f3/Xye3NmmbNL3faQttBRQBFRGnIEzQiSLq5m2bOPnpFJ3Or7rhZehU/DI30O/QORUF3FARREFUKqIoIIKoFAEBKYVeadMmvTfXz++Pk3OStGmbpElO0ryfjwcPmnM+55x30ubklfeVY4wxEARBEARBRCgyqQ0gCIIgCIIYCBIrBEEQBEFENCRWCIIgCIKIaEisEARBEAQR0ZBYIQiCIAgioiGxQhAEQRBERENihSAIgiCIiIbECkEQBEEQEQ2JFYIgCIIgIhoSK0RArF27Fueccw6SkpLAcRzuu+8+qU2KOgoLC1FYWOixbeXKleA4DitXrpTEJkJ6rFYr/vrXv6K0tBRxcXHgOA7vvfee1GYFnc8//xwcx+Hxxx+X2pSYJNruNSRWhhEcx/X5FxcXh8LCQtxyyy04fPhwUK6ze/du3HDDDWhtbcWdd96JRYsW4bLLLgvKuUPNO++8g8suuwwZGRlQKpVITU3FmDFj8Otf/xqrVq2S2jxiGPP000+L78sjR470u27ZsmVYtGgRsrOz8cADD2DRokU444wz8Pjjj4PjOHz++efhM7ofhA86juMwY8aMftd9/fXX4rq8vLyQ2VNYWChe58svv+x33eWXXy6ue/XVV0NmT3/U1dXhj3/8I8aMGQONRgO1Wo2CggJMnToVjzzyCI4fPx52m6IFhdQGEMFn0aJF4s+tra3Ys2cPVq9ejfXr12PHjh0YP378kM7/wQcfgDGG1atXY8qUKUO0NnzcfvvteOWVV6BWq3HFFVegqKgInZ2dOH78ODZs2IDPP/8ct9xyi6Q2zp07F5MnT0Z2drakdhDBhTGGFStWgOM4MMbw6quv4tlnn/W6duPGjUhMTMQnn3wClUoVZkv9Q6FQ4LPPPkNlZSWKi4v77H/11VehUChgs9n67Js0aRIOHz6MtLS0oNqzYsUKXHTRRX32VVdXo7y8vF97Qs2BAwcwbdo0tLS04KyzzsItt9wCnU6HU6dOYf/+/Vi8eDGKioowcuTIsNsWFTBi2ACA9fcrvfvuuxkAdssttwz5OrfeeisDwE6cODHkc4WLL7/8kgFgeXl5rLq6us/+jo4O9sEHH4TVphEjRrARI0aE9ZqENHz88ccMAPt//+//sYyMDJaens7MZrPXtUVFRV7/LhYtWsQAsM8++yy0xvrA66+/zgCwOXPmMADskUce6bOmo6ODJSUliWtyc3NDZs+IESMYAHbVVVcxjUbDWltb+6x5/PHHPWx+5ZVXQmaPN2bMmMEAsMcff9zr/oqKCnbo0KGw2SP8Dl9//fWwXXMoUBgoRpg5cyYAoKmpyev+N998E5dccgn0ej3i4+MxevRoPPXUUzCbzeIawfX7+uuvAwCKiopEl2pVVZW47ptvvsE111yDjIwMxMXFYcSIEbjzzjtRV1fX57rz5s0Dx3GorKzEc889h7POOgtqtRrTpk0T17S0tGDhwoUYPXo01Go1dDodZsyYgfLycp+f/86dOwEA1157rVd3dEJCAq644gqvx5aXl+PKK68Un09+fj7mzJmDrVu3imssFguWL1+OX/ziFxgxYgTi4uKg1+sxY8YMbN682Wc7+4sjC/ktXV1dePDBB1FQUIC4uDiMGjUK//u//wvmZXg6YwzPP/88xowZg/j4eOTm5uLuu+9Ga2ur13wZInS88sorAID58+fj5ptvRlNTU588FOG9cOLECZw8eVJ8bwm/qyeeeAIAcMkll3iEet3p6urCM888g/HjxyMhIQGJiYm44IIL8Oabb/axyT1nZPfu3bj88suh1+v7vJ8H4s
wzz8SkSZOwcuVK2O12j31vvfUW2tvbMX/+fK/H9pezMm3aNHAcB5vNhsWLF6OkpER83z344IMe96TezJ8/H11dXX2er8PhwOuvv45zzjmnX8/yvn37cO+99+Lss89GSkoK4uPjUVJSgvvvvx8tLS0ea41GIwoLCxEXF4d9+/b1uZbwHNztEO5B9957r9frn3XWWRg9enSf7S0tLXjkkUdw5plnQqPRQKfT4eyzz8af//xndHZ2BmT/YNTU1ODuu+9GcXEx4uLikJqaiquuugp79+7ts9Y9PLl69Wqcd955SEhICPr9hcJAMYLwwTpp0qQ++2677Ta89tpryM/Px7XXXgudTofdu3fjsccew6effory8nIolUqMHz8eixYtwnvvvYf9+/fj3nvvRXJyMgCI/7///vv45S9/CY7jcN1116GgoADffPMNXnrpJbz//vvYsWOHV3fxH/7wB+zYsQNXXHEFfvGLX0AulwMATp48iWnTpqGqqgoXX3wxLr/8cnR0dOCDDz7AZZddhpdeegm33377oM8/PT0dAHD06FG/XrdFixbhr3/9KxITE3H11VcjPz8fdXV12LlzJ/7zn//g5z//OQD+hnLvvfdiypQpuPTSS5Geno76+nq8//77mD17Nv7973/7ZOdAWK1WzJw5E3V1dbj88suhUCjw3nvvYeHCheju7hY/zAQWLFiAf/3rX8jJycHtt98OlUqFjRs3Ys+ePbBarVAqlUOyh/CN06dPY+PGjRg9ejQmTZoEtVqNZcuW4eWXX8avfvUrcd3VV1+NwsJCPPfccwAgJq0L76333nsP27dvxy233OL1g8BkMmH69On47rvvcM455+B3v/sdHA4HtmzZgptuugkHDx7EU0891ee4Xbt2YfHixbjoootw2223obGx0a/w0/z583H77bfj448/9hD8r7zyCnJzc3H55Zf7fC53brrpJnz55Ze4/PLLodVq8eGHH+Lvf/87Ghsb+80vmzVrFvLz8/Hqq6/ijjvuELeXl5fj5MmT+POf/4yGhgavx77yyivYsGEDpk6dip///Oew2+345ptvsGzZMnz44YfYu3cvkpKSAAB6vR5vvfUWLrroIlx//fX49ttvodVqAQBPPPEEtm/fjttuuw033nijeP709HRUV1fj6NGjXu/D3jhx4gQuueQSnDx5Eueccw7uvPNOOBwOHDlyBMuWLcPvf/97JCQk+G3/QHz77beYOXMmWlpaMGvWLFxzzTUwGAx477338LOf/QwbNmzAL37xiz7H/f3vf8fWrVtx5ZVXYvr06TCZTD49R5+R2rVDBA84w0CLFi0S//3xj39kP/vZzxjHcWzOnDmsvb3d4xjBFXjdddex7u5uj32C23nZsmUe22+55RavYaD29naWkpLC5HI527lzp8e+xYsXMwDs5z//uddz5eTksMrKyj7PaerUqYzjOLZ27VqP7UajkZ199tksPj6e1dfXD/ra1NbWsuTkZAaAzZ49m61evZodPnyY2e32fo/ZsmULA8CKi4tZTU2Nxz6Hw+ERTurp6fEaXmppaWGjR49mer2edXV1eezzFgbqzzUruLkvv/xyj/OcPn2a6XQ6ptVqmcViEbd/8cUXDAArLS1lRqNR3G42m9lFF13EAFAIKkw888wzDAD73//9X3HbhAkTGMdx7Pjx433W9xceHCwMJLyX/v73v3ts7+7uZrNmzWIcx7Fvv/1W3P7ZZ5+J94yXXnrJr+ck/J0+8sgjrL29nSUmJrK5c+eK+3/44QeP8BC8hIGE6y9atMhj+9SpUxkANnHiRNbc3Cxu7+joYCNHjmQymYzV1dV5HCO8P6xWK/vLX/7CALD9+/eL+6+99loxPCS8jr3DQFVVVcxms/V5ri+99BIDwJ555pk++5YsWcIAsBtuuIExxti2bduYTCZjY8aMYZ2dnR5rH3roIQaAZWRksEWLFrFt27Yxk8nU55zuTJkyhQFgixcv7rOvqanJ457tr/3e7j
VWq5WNHDmSxcfHsy+//NJjfW1tLcvJyWGZmZke1xVeT41G4/H3FWxIrAwjhBuPt39jxoxhb7zxRp9jxo8fz5RKpcc
"text/plain": [
"<Figure size 600x500 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"draw_scaler(df1, preprocessing.MinMaxScaler(), \"MinMaxScaler\")"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAiIAAAHYCAYAAABwVYPIAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/H5lhTAAAACXBIWXMAAA9hAAAPYQGoP6dpAACfeklEQVR4nO2deXwTZf7HP5Oz6V1oSwstUJBWUDk8EFkUBAVBVPBYFXfXqqwHuIu6Houu642L+hNd0fUAxWPxQkFAlAqoCyIiqICCgJQChZa29L5yPr8/JjNJ2iTNMVfS7/v16qvt5JmZJ5PJk0++J8cYYyAIgiAIglABndoTIAiCIAii+0JChCAIgiAI1SAhQhAEQRCEapAQIQiCIAhCNUiIEARBEAShGiRECIIgCIJQDRIiBEEQBEGoBgkRgiAIgiBUg4QIQRAEQRCqQUKE8MsHH3yAM844AykpKeA4DnfccYfaU4o5+vfvj/79+/tsW7JkCTiOw5IlS1SZE6E+drsdjz76KAoLC2E2m8FxHFasWKH2tCTnq6++AsdxePjhh9WeSrckltYaEiIxBMdxnX7MZjP69++P66+/Hnv27JHkPFu2bME111yDhoYG3HbbbXjooYdw0UUXSXJsufnwww9x0UUXITs7G0ajET179sSQIUPwhz/8AW+++aba0yPimCeeeEJ8X+7duzfguAULFuChhx5Cbm4u7r77bjz00EM4+eST8fDDD4PjOHz11VfKTToAwocYx3GYMGFCwHHfffedOC4vL0+2+fTv3188z8aNGwOOmzx5sjhu0aJFss0nEMeOHcOdd96JIUOGIDExERaLBX379sXYsWPxwAMP4MCBA4rPKRYwqD0BInweeugh8e+GhgZs3boVb731Fj766CNs2rQJw4cPj+r4q1evBmMMb731FkaPHh3lbJXj5ptvxmuvvQaLxYKLL74YBQUFaGlpwYEDB7B8+XJ89dVXuP7661Wd4/Tp0zFq1Cjk5uaqOg9CWhhjWLx4MTiOA2MMixYtwtNPP+137MqVK5GcnIwvvvgCJpNJ4ZmGh8FgwJdffonS0lIMGDCg0+OLFi2CwWCAw+Ho9NjIkSOxZ88eZGZmSjqfxYsX49xzz+302JEjR1BSUhJwPnKza9cujBs3DrW1tTjttNNw/fXXIy0tDYcPH8aOHTswb948FBQUYODAgYrPTfMwImYAwAK9ZLfffjsDwK6//vqoz3PDDTcwAOzgwYNRH0spNm7cyACwvLw8duTIkU6PNzc3s9WrVys6p379+rF+/fopek5CHT7//HMGgP35z39m2dnZLCsri1mtVr9jCwoK/N4XDz30EAPAvvzyS3knGwJvvPEGA8Auu+wyBoA98MADncY0NzezlJQUcUyfPn1km0+/fv0YAHbppZeyxMRE1tDQ0GnMww8/7DPn1157Tbb5+GPChAkMAHv44Yf9Pr5z5062e/duxeYjvIZvvPGGYueMFHLNxAkTJ04EAFRXV/t9/N1338X555+PjIwMJCQkYPDgwXj88cdhtVrFMYI59o033gAAFBQUiGbOsrIycdy2bdtw+eWXIzs7G2azGf369cNtt92GY8eOdTpvcXExOI5DaWkpnnvuOZx22mmwWCwYN26cOKa2thZz587F4MGDYbFYkJaWhgkTJqCkpCTk5//NN98AAK644gq/JuKkpCRcfPHFfvctKSnBJZdcIj6f/Px8XHbZZVi3bp04xmazYeHChZgyZQr69esHs9mMjIwMTJgwAZ9++mnI8wzktxXiSVpbW3HPPfegb9++MJvNOOmkk/Cvf/0LzE+TbMYYnn/+eQwZMgQJCQno06cPbr/9djQ0NPiNTyHk47XXXgMAzJw5E9dddx2qq6s7xX0I74WDBw/i0KFD4ntLeK0eeeQRAMD555/v4371prW1FU8++SSGDx+OpKQkJCcn45xzzsG7777baU7eMRpbtmzB5MmTkZGR0en9HIxTTz0VI0
eOxJIlS+B0On0ee++999DU1ISZM2f63TdQjMi4cePAcRwcDgfmzZuHQYMGie+7e+65x2dN6sjMmTPR2tra6fm6XC688cYbOOOMMwJahLdv3445c+Zg2LBh6NGjBxISEjBo0CDcddddqK2t9RlbV1eH/v37w2w2Y/v27Z3OJTwH73kIa9CcOXP8nv+0007D4MGDO22vra3FAw88gFNPPRWJiYlIS0vDsGHD8Pe//x0tLS0Rzb8rysvLcfvtt2PAgAEwm83o2bMnLr30Unz//fedxnq7DN966y2cddZZSEpKknR9IddMnCB8aI4cObLTYzfddBNef/115Ofn44orrkBaWhq2bNmCBx98EOvXr0dJSQmMRiOGDx+Ohx56CCtWrMCOHTswZ84cpKenA4D4+5NPPsFVV10FjuNw5ZVXom/fvti2bRtefvllfPLJJ9i0aZNfE+5f//pXbNq0CRdffDGmTJkCvV4PADh06BDGjRuHsrIynHfeeZg8eTKam5uxevVqXHTRRXj55Zdx8803d/n8s7KyAAD79u0L67o99NBDePTRR5GcnIxp06YhPz8fx44dwzfffIN33nkHF1xwAQB+sZgzZw5Gjx6NCy+8EFlZWaioqMAnn3yCqVOn4pVXXglpnsGw2+2YOHEijh07hsmTJ8NgMGDFihWYO3cu2traxA8qgdmzZ+M///kPevfujZtvvhkmkwkrV67E1q1bYbfbYTQao5oPERrHjx/HypUrMXjwYIwcORIWiwULFizAq6++it///vfiuGnTpqF///547rnnAEAMABfeWytWrMDXX3+N66+/3u8iX19fj/Hjx+PHH3/EGWecgRtvvBEulwtr167FjBkz8Msvv+Dxxx/vtN/mzZsxb948nHvuubjppptQVVUVlkto5syZuPnmm/H555/7iPnXXnsNffr0weTJk0M+ljczZszAxo0bMXnyZKSmpmLNmjV45plnUFVVFTCea9KkScjPz8eiRYtwyy23iNtLSkpw6NAh/P3vf0dlZaXffV977TUsX74cY8eOxQUXXACn04lt27ZhwYIFWLNmDb7//nukpKQAADIyMvDee+/h3HPPxdVXX40ffvgBqampAIBHHnkEX3/9NW666SZce+214vGzsrJw5MgR7Nu3z+867I+DBw/i/PPPx6FDh3DGGWfgtttug8vlwt69e7FgwQLceuutSEpKCnv+wfjhhx8wceJE1NbWYtKkSbj88stRU1ODFStWYMyYMVi+fDmmTJnSab9nnnkG69atwyWXXILx48ejvr4+pOcYEmqbZIjQgds189BDD4k/d955JxszZgzjOI5ddtllrKmpyWcfwTx35ZVXsra2Np/HBFPwggULfLZff/31fl0zTU1NrEePHkyv17NvvvnG57F58+YxAOyCCy7we6zevXuz0tLSTs9p7NixjOM49sEHH/hsr6urY8OGDWMJCQmsoqKiy2tz9OhRlp6ezgCwqVOnsrfeeovt2bOHOZ3OgPusXbuWAWADBgxg5eXlPo+5XC4fF097e7tfl09tbS0bPHgwy8jIYK2trT6P+XPNBDKXCqbnyZMn+xzn+PHjLC0tjaWmpjKbzSZu/9///scAsMLCQlZXVydut1qt7Nxzz2UAyC2kEE8++SQDwP71r3+J20aMGME4jmMHDhzoND6Qy64r14zwXnrmmWd8tre1tbFJkyYxjuPYDz/8IG7/8ssvxTXj5ZdfDus5CffpAw88wJqamlhycjKbPn26+PjPP//s47KBH9eMcP6HHnrIZ/vYsWMZAHb66aezEydOiNubm5vZwIEDmU6nY8eOHfPZR3h/2O129s9//pMBYDt27BAfv+KKK0SXjXAdO7pmysrKmMPh6PRcX375ZQaAPfnkk50emz9/PgPArrnmGsYYYxs2bGA6nY4NGTKEtbS0+Iy99957GQCWnZ3NHnroIbZhwwZWX1/f6ZjejB49mgFg8+bN6/RYdXW1z5od7vz9rTV2u50NHD
iQJSQksI0bN/qMP3r0KOvduzfr1auXz3mF65mYmOhzf0kJCZEYQlhU/P0MGTKEvf322532GT58ODMajT4fVgIOh4P
"text/plain": [
"<Figure size 600x500 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"draw_scaler(df2, preprocessing.MinMaxScaler(), \"MinMaxScaler\")"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAisAAAHYCAYAAACMQKgCAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/H5lhTAAAACXBIWXMAAA9hAAAPYQGoP6dpAACscklEQVR4nOzdeXxTVf4//tdNs3XL1qQbbaGFtoKCIAqIC5tYURRwmXH5zogzfsYFHZcZdXB00JkR5eeM6IiOjhsugyOCyOJCBRQFVBCVouyUAt1o0yZd0zTL+f1xc9OGpm2SJrk36fv5ePCgvbm59yRNbt45533eh2OMMRBCCCGESJRM7AYQQgghhPSFghVCCCGESBoFK4QQQgiRNApWCCGEECJpFKwQQgghRNIoWCGEEEKIpFGwQgghhBBJo2CFEEIIIZJGwQohhBBCJI2CFRKSlStXYvz48UhNTQXHcbj33nvFblLMGTZsGIYNG+azbfny5eA4DsuXLxelTUR6HA4H/vrXv6KoqAgqlQocx+HDDz8Uu1lR98UXX4DjODz22GNiNyUmxfq1hYKVOMJxXI9/KpUKw4YNw80334z9+/eH5TzffPMNrr/+ejQ1NeGOO+7AokWLcNlll4Xl2JH2/vvv47LLLkN6ejoUCgXS0tIwatQo/L//9//w5ptvit08Mog88cQT3vfpwYMHe91v6dKlWLRoEbKysvDHP/4RixYtwhlnnIHHHnsMHMfhiy++iF6jeyF8EPq79vzqV7/CTz/9JHYTwy6QD//q6mrcd999GDVqFJKSkpCYmIi8vDxMmTIFf/7zn3H06NHoNTjGycVuAAm/RYsWeX9uamrCzp078dZbb2H16tXYtm0bxo4dO6Djb9iwAYwxvPXWW5g8efIAWxs9v/vd7/DKK68gMTERV1xxBfLz89HW1oajR49izZo1+OKLL3DzzTeL2sZ58+Zh0qRJyMrKErUdJLIYY3jttdfAcRwYY3j11Vfx9NNP+9133bp1SElJwWeffQalUhnllgbn7LPPxty5cwEAzc3N2L59O9555x2sWrUKW7Zswfnnny9uA6No7969mDp1KhobGzF69GjcfPPN0Gq1OHHiBPbs2YPFixcjPz8fw4cPF7upMYGClTjkr5v07rvvxrJly/Dss88OuBuwuroaAJCdnT2g40TTtm3b8MorryAnJwdff/01cnJyfG5va2uTxDdUrVYLrVYrdjNIhJWWluLYsWP4v//7P6xduxZvvvkmnnjiCb/BSHV1NdLS0iQfqADA2LFje1x/br/9drz88sv485//jC1btojTMBHcd999aGxsxGOPPebzBVKwd+9eyOX0ERwoGgYaJC699FIAQH19vd/b3333XUybNg16vR5qtRojR47E3//+d9jtdu8+QrfnG2+8AQDIz8/3dvlWVFR49/vuu+9w9dVXIz09HSqVCkOHDsUdd9zhDXK6mz9/PjiOQ3l5OZ599lmMHj0aiYmJmDp1qnefxsZGLFy4ECNHjkRiYiK0Wi1mzJiB0tLSgB//9u3bAQDXXHNNj0AFAJKTk3HFFVf4vW9paSmuvPJK7+PJzc3FnDlzsGnTJu8+nZ2dWLZsGS6//HIMHToUKpUKer0eM2bMwEcffRRwO3vrWhbyW9rb2/HAAw8gLy8PKpUKI0aMwFNPPQV/i6czxvDcc89h1KhRUKvVGDJkCO666y40NTX5zZch0fPKK68AAG699VbcdNNNqK+v75GHIrw3jh07huPHj3vfa8Lf7vHHHwcATJs2zWf4pbv29nY8+eSTGDt2LJKTk5GSkoLzzz8f7777bo82dc8J+eabbzBr1izo9foe7+9g3XrrrQCAnTt39rjNarXiT3/6E4qKiqBWq6HX63HppZfis88+6/OYX3/9NS655BJotVqkpqaipKQE3333XY/9hOfQX/t7y4E5cuQIbr31VgwfPtzbppEjR+K2225DQ0MDAGDq1Km45ZZbAA
C33HKLz/MvnEu45txzzz1+H8Po0aMxcuTIHtsbGxvx5z//GWeddRaSkpKg1Wpx9tln409/+hPa2tq8++3evRv33HMPzj77bBgMBqjVahQWFuL+++9HY2Njn8/f6SorK3HXXXehoKAAKpUKaWlpuOqqq7Br164e+3Yffnzrrbdw3nnnITk5OeLXEwrrBgnhg3XChAk9bvvtb3+L119/Hbm5ubjmmmug1WrxzTff4NFHH8XmzZtRWloKhUKBsWPHYtGiRfjwww+xZ88e3HPPPdDpdADg/X/t2rW47rrrwHEcrr32WuTl5eG7777DSy+9hLVr12Lbtm0oKCjo0Ybf//732LZtG6644gpcfvnlSEhIAAAcP34cU6dORUVFBS6++GLMmjULra2t2LBhAy677DK89NJL+N3vftfv4zeZTACAQ4cOBfW8LVq0CH/961+RkpKCuXPnIjc3F9XV1d7u7UsuuQQAf4G55557MHnyZMycORMmkwk1NTVYu3YtZs+ejZdffjmgdvbF4XDg0ksvRXV1NWbNmgW5XI4PP/wQCxcuhM1m8354CRYsWIB///vfyM7Oxu9+9zsolUqsW7cOO3fuhMPhgEKhGFB7SGhOnTqFdevWYeTIkZgwYQISExOxdOlS/Oc//8EvfvEL735z587FsGHD8OyzzwKAN4ldeK99+OGH2Lp1K26++Wa/HxRWqxXTp0/HDz/8gPHjx+M3v/kN3G43Nm7ciBtvvBE///wz/v73v/e4344dO7B48WJcdNFF+O1vf4u6uroB9eq43W4A6NGLYLFYMHnyZBw4cAATJkzA1VdfDbPZjJUrV6KkpATLli3DnXfe2eN43377LZ588klccsklWLBgAY4cOYIPPvgAX375JUpLS3HRRReF3Nbq6mpMmDABLS0tuPzyy3Httdeio6MDx44dwzvvvIO7774baWlpmD9/PnQ6HdauXYs5c+b4DK0Lfx+TyYSTJ0/i0KFDfq+7/hw7dgzTpk3D8ePHMX78eNxxxx1wu904ePAgli5dittvvx3JyckA+IB3zZo1mDJlCi655BK4XC589913WLp0KT7++GPs2rULqamp/Z7z+++/x6WXXorGxkaUlJR4/w4ffvghLrzwQqxZswaXX355j/v94x//wKZNm3DllVdi+vTpsFqtAT3GkDESNwAwAGzRokXef/fddx+78MILGcdxbM6cOaylpcXnPm+88QYDwK699lpms9l8blu0aBEDwJYuXeqz/eabb2YA2LFjx3y2t7S0MIPBwBISEtj27dt9blu8eDEDwC655BK/x8rOzmbl5eU9HtOUKVMYx3Fs5cqVPtstFgs7++yzmVqtZjU1Nf0+N1VVVUyn0zEAbPbs2eytt95i+/fvZy6Xq9f7bNy4kQFgBQUFrLKy0uc2t9vNTp486f29o6PD53dBY2MjGzlyJNPr9ay9vd3ntqFDh7KhQ4f6bBP+Hm+88UaPfQGwWbNm+Rzn1KlTTKvVMo1Gwzo7O73bv/zySwaAFRUVMYvF4t1ut9vZRRddxAD0ODeJjieffJIBYE899ZR327hx4xjHcezo0aM99vf3OmGs6/35+eef+z2P8N76xz/+4bPdZrOxkpISxnEc+/77773bP//8c+815KWXXgrqMQmv25tvvrnHbb/73e8YAHbFFVf4bP+///s/BoDdcccdPtsPHDjAUlNTmUKh8LkmdG/f888/73OfDz/8kAFgI0aM8HlP93at6n68RYsWebc999xzfq95jDHW2trq897r7b0qePDBBxkAlp6ezhYtWsS2bNnCrFar330FkydPZgDY4sWLe9xWX1/vc42uqKhgTqezx34vvfQSA8CefPJJn+3+2utwONjw4cOZWq1mX331lc/+VVVVLDs7m2VkZPicV3jdJSUl+bx+Io2ClTgivJH9/Rs1ahR7++23e9xn7NixTKFQ+HygCZxOJ0tLS2Pnnnuuz/beLgBvv/02A8BuuummHsfq7Oz0fuBWVF
T0OJa/i8OPP/7IALDrrrvO7+MVLlDLli3ze/vpvvjiCzZixAif5yU1NZXNmjWLvfvuuz0Cl9mzZzMA7IMPPgjo+L3
"text/plain": [
"<Figure size 600x500 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"draw_scaler(df1, preprocessing.RobustScaler(), \"RobustScaler\")"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAiIAAAHYCAYAAABwVYPIAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/H5lhTAAAACXBIWXMAAA9hAAAPYQGoP6dpAACS70lEQVR4nO3deXwTdf4/8NckmaT3AS200AIt0FpUwAsBEZCrgijgfaxar10FXY/1WHR3QXdF+el3QcVbFI8FFRAERKkcHoiIeAAql0BVjgKF3kcyk3x+f0xmmjRJc81Mjr6fj4fSJpPpJ8dM3vP+vD+fD8cYYyCEEEIIiQBDpBtACCGEkI6LAhFCCCGERAwFIoQQQgiJGApECCGEEBIxFIgQQgghJGIoECGEEEJIxFAgQgghhJCIoUCEEEIIIRFDgQghhBBCIoYCEeLV+++/j7POOgupqangOA733HNPpJsUc3r16oVevXq53bZgwQJwHIcFCxZEpE0k+giCgMceewxFRUWwWCzgOA7Lly+PdLN099lnn4HjOMycOTPSTYlJsXxuoUAkhnAc5/GfxWJBr169cOONN2Lnzp2q/J3Nmzfj6quvRm1tLe644w7MmDEDF154oSr71trixYtx4YUXokuXLuB5Hp07d0a/fv3wpz/9CW+++Wakm0c6kMcff1w5Tnfv3u1zuzlz5mDGjBnIzc3F/fffjxkzZuCUU07BzJkzwXEcPvvsM/0a7YP8Jeft3HP99dfjp59+inQTVRfIF/vhw4dx7733ol+/fkhKSkJiYiJ69OiBESNG4JFHHsG+ffv0a3AMM0W6ASR4M2bMUH6ura3Fli1b8NZbb2Hp0qXYuHEjBg4cGNb+V61aBcYY3nrrLQwdOjTM1urnz3/+M1599VUkJibioosuQkFBARobG7Fv3z4sW7YMn332GW688caItnHKlCkYPHgwcnNzI9oOoi3GGObPnw+O48AYw2uvvYannnrK67YrVqxASkoKPv30U5jNZp1bGpwBAwZg8uTJAIC6ujp89dVXeOedd7BkyRKsX78eQ4YMiWwDdbRjxw6MHDkSJ0+exOmnn44bb7wR6enp+P3337Ft2zbMmjULBQUF6N27d6SbGvUoEIlB3lKXd911F+bNm4e5c+eGnZo7fPgwAKBbt25h7UdPGzduxKuvvoq8vDx8/fXXyMvLc7u/sbExKq4s09PTkZ6eHulmEI2Vl5fjwIEDuO222/Dhhx/izTffxOOPP+410Dh8+DA6d+4c9UEIAAwcONDj/HP77bfj5ZdfxiOPPIL169dHpmERcO+99+LkyZOYOXOm28WhbMeOHTCZ6Cs2ENQ1EyfGjRsHADh+/LjX+xctWoQLLrgAmZmZSEhIQElJCf7zn//AarUq28ipyDfeeAMAUFBQoKRhKyoqlO22bt2KSy+9FF26dIHFYkHPnj1xxx13KAGMq7KyMnAch/3792Pu3Lk4/fTTkZiYiJEjRyrbnDx5EtOnT0dJSQkSExORnp6O0aNHo7y8PODn/9VXXwEALrvsMo8gBACSk5Nx0UUXeX1seXk5Lr74YuX55OfnY9KkSVi7dq2yjc1mw7x58zBhwgT07NkTFosFmZmZGD16ND766KOA2+kr3SvXkzQ1NeGBBx5Ajx49YLFY0KdPHzz55JPwtkg2YwzPPPMM+vXrh4SEBHTv3h133nknamtrvdanEP28+uqrAIBbb70V1113HY4fP+5R9yEfGwcOHMBvv/2mHGvye/foo48CAC644AK3LhFXTU1NeOKJJzBw4EAkJycjJSUFQ4YMwaJFizza5FqDsXnzZowfPx6ZmZkex3ewbr31VgDAli1bPO6rqanB3//+dxQVFSEhIQGZmZkYN24cPv3003b3+fXXX2PMmDFIT09HamoqSktLsXXrVo/t5NfQW/t91Zz8+uuvuPXWW9G7d2+lTSUlJfjLX/6CEydOAABGjhyJm266CQBw0003ub3+8t
+Szzl333231+dw+umno6SkxOP2kydP4pFHHsFpp52GpKQkpKenY8CAAfj73/+OxsZGZbvvvvsOd999NwYMGIBOnTohISEBffv2xX333YeTJ0+2+/q1dfDgQdx5550oLCyExWJB586dcckll+Dbb7/12Na1S/Ctt97COeecg+TkZE3PJxSuxQn5S3PQoEEe991yyy14/fXXkZ+fj8suuwzp6enYvHkz/vnPf2LdunUoLy8Hz/MYOHAgZsyYgeXLl2Pbtm24++67kZGRAQDKvx9++CGuuOIKcByHyy+/HD169MDWrVvx0ksv4cMPP8TGjRtRWFjo0Ya//vWv2LhxIy666CJMmDABRqMRAPDbb79h5MiRqKiowPDhwzF+/Hg0NDRg1apVuPDCC/HSSy/hz3/+s9/nn52dDQDYs2dPUK/bjBkz8NhjjyElJQWTJ09Gfn4+Dh8+rKScx4wZA0A6edx9990YOnQoxo4di+zsbBw5cgQffvghJk6ciJdffjmgdrZHEASMGzcOhw8fxvjx42EymbB8+XJMnz4dzc3NyheTbNq0aXjxxRfRrVs3/PnPf4bZbMaKFSuwZcsWCIIAnufDag8JzdGjR7FixQqUlJRg0KBBSExMxJw5c/DKK6/gyiuvVLabPHkyevXqhblz5wKAUhAuH2vLly/H559/jhtvvNHrl0BNTQ1GjRqFH374AWeddRZuvvlmOBwOrFmzBtdeey1+/vln/Oc///F43KZNmzBr1iycf/75uOWWW3Ds2LGwsjEOhwMAPK7+q6urMXToUOzatQuDBg3CpZdeiqqqKrz//vsoLS3FvHnzMHXqVI/9ffPNN3jiiScwZswYTJs2Db/++is++OADfPHFFygvL8f5558fclsPHz6MQYMGob6+HhMmTMDll1+OlpYWHDhwAO+88w7uuusudO7cGWVlZcjIyMCHH36ISZMmuXV3y+9PdnY2/vjjD+zZs8fredebAwcO4IILLsBvv/2Gs846C3fccQccDgd2796NOXPm4Pbbb0dycjIAKZhdtmwZRowYgTFjxsBut2Pr1q2YM2cOVq9ejW+//Rapqal+/+b333+PcePG4eTJkygtLVXeh+XLl2PYsGFYtmwZJkyY4PG4p59+GmvXrsXFF1+MUaNGoaamJqDnGBJGYgYABoDNmDFD+e/ee+9lw4YNYxzHsUmTJrH6+nq3x7zxxhsMALv88stZc3Oz230zZsxgANicOXPcbr/xxhsZAHbgwAG32+vr61mnTp2Y0WhkX331ldt9s2bNYgDYmDFjvO6rW7dubP/+/R7PacSIEYzjOPb++++73V5dXc0GDBjAEhIS2JEjR/y+NocOHWIZGRkMAJs4cSJ766232M6dO5ndbvf5mDVr1jAArLCwkB08eNDtPofDwf744w/l95aWFrffZSdPnmQlJSUsMzOTNTU1ud3Xs2dP1rNnT7fb5PfjjTfe8NgWABs/frzbfo4ePcrS09NZWloas9lsyu1ffPEFA8CKiopYdXW1crvVamXnn38+A+Dxt4k+nnjiCQaAPfnkk8ptZ5xxBuM4ju3bt89je2+fE8Zaj88NGzZ4/TvysfX000+73d7c3MxKS0sZx3Hs+++/V27fsGGDcg556aWXgnpO8uf2xhtv9Ljvz3/+MwPALrroIrfbb7vtNgaA3XHHHW6379q1i6WmpjKe593OCa7te+6559wes3z5cgaA9enTx+2Y9nWuct3fjBkzlNueeeYZr+c8xhhraGhwO/Z8HauyBx98kAFgXbp0YTNmzGDr169nNTU1XreVDR06lAFgs2bN8rjv+PHjbufoiooKJoqix3YvvfQSA8CeeOIJt9u9tVcQBNa7d2+WkJDAvvzyS7ftDx06xLp168a6du3q9nflz11SUpLb50dLFIjEEPkg9fZfv3792Ntvv+3xmIEDBzKe592+rGSiKLLOnTuzs88+2+12Xwf322+/zQCw6667zmNfNptN+TKtqKjw2Je3A/
/HH39kANgVV1zh9fnKJ5958+Z5vb+tzz77jPXp08ftdUlNTWXjx49nixYt8ghKJk6cyACwDz74IKD9+/L0008zAOz
"text/plain": [
"<Figure size 600x500 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"draw_scaler(df2, preprocessing.RobustScaler(), \"RobustScaler\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"# References\n",
"* [Preprocessing Data](https://scikit-learn.org/stable/modules/preprocessing.html) Scikit-Learn\n",
"* [Cleaning and Prepping Data with Python for Data Science — Best Practices and Helpful Packages](https://medium.com/@rrfd/cleaning-and-prepping-data-with-python-for-data-science-best-practices-and-helpful-packages-af1edfbe2a3), DeFilippi, 2019, \n",
"* [Data Preprocessing for Machine learning in Python, GeeksForGeeks](https://www.geeksforgeeks.org/data-preprocessing-machine-learning-python/)\n",
"* [Scale, Standardize, or Normalize with Scikit-Learn](https://towardsdatascience.com/scale-standardize-or-normalize-with-scikit-learn-6ccc7d176a02), J. Hales.\n",
"* [Scale, Standardize, and Normalize Data\n",
"](http://blog.pengyifan.com/scale-standardize-and-normalize-data/), Y. Peng, 2013.\n",
"* [Feature Scaling with scikit-learn](http://benalexkeen.com/feature-scaling-with-scikit-learn/), A. Keen, 2017"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## Licence\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"datacleaner": {
"position": {
"top": "50px"
},
"python": {
"varRefreshCmd": "try:\n print(_datacleaner.dataframe_metadata())\nexcept:\n print([])"
},
"window_display": false
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autocomplete": true,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}