"This notebook introduces another plotting library, [**seaborn**](https://stanford.edu/~mwaskom/software/seaborn/), which provides advanced facilities for data visualization.\n",
"*Seaborn* is a library for making attractive and informative statistical graphics in Python. It is built on top of *matplotlib* and tightly integrated with the *PyData* stack, including support for *numpy* and *pandas* data structures and statistical routines from *scipy* and *statsmodels*.\n",
"You should install the SeaBorn package. Use `conda install seaborn`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Transform Data into Dataframe"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Seaborn* requires that data is represented as a *DataFrame* object from the library *pandas*. \n",
"\n",
"A *DataFrame* is a 2-dimensional labeled data structure with columns of potentially different types. We will not go into the details of DataFrames in this session."
"The following examples are taken from [a kaggle tutorial](https://www.kaggle.com/benhamner/d/uciml/iris/python-data-visualizations/notebook) and [the seaborn tutorial](https://stanford.edu/~mwaskom/software/seaborn/tutorial/axis_grids.html).\n",
"**PairGrid** allows you to quickly draw a grid of small subplots using the same plot type to visualize data in each. In a PairGrid, each row and column is assigned to a different variable, so the resulting plot shows each pairwise relationship in the dataset. This style of plot is sometimes called a “scatterplot matrix”, as this is the most common way to show each relationship"
"A very common way to use this plot colors the observations by a separate categorical variable. For example, the iris dataset has four measurements for each of the three different species of iris flowers.\n",
"PairGrid is flexible, but to take a quick look at a dataset, it may be easier to use pairplot(). This function uses scatterplots and histograms by default, although a few other kinds will be added (currently, you can also plot regression plots on the off-diagonals and KDEs on the diagonal)."
"[**Box plots** or **boxplot** ](https://en.wikipedia.org/wiki/Box_plot) (*diagramas de caja*) are a convenient way of graphically depicting groups of numerical data through their quartiles."
"[**Violin plots**](https://en.wikipedia.org/wiki/Violin_plot) (*diagramas de violín*) are a method of plotting numeric data. A violin plot is a box plot with a rotated kernel density plot on each side. A violin plot is just a histogram (or more often a smoothed variant like a kernel density) turned on its side and mirrored."
"Another useful representation is the [Kernel density estimation (KDE)](https://en.wikipedia.org/wiki/Kernel_density_estimation) plot. KDE is a non-parametric way to estimate the probability density function of a random variable. The kdeplot represents the shape of a distribution. Like the histogram, the KDE plots encodes the density of observations on one axis with height along the other axis:"
"Depending on the data, we can choose which visualisation suits better. the following [diagram](http://www.labnol.org/software/find-right-chart-type-for-your-data/6523/) guides this selection.\n",
"* [Tutorial plotting with Seaborn](https://stanford.edu/~mwaskom/software/seaborn/tutorial/axis_grids.html)\n",
"* [Choose the Right Chart Type for your Data](http://www.labnol.org/software/find-right-chart-type-for-your-data/6523/)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Licence\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",