![](files/images/EscUpmPolit_p.gif "UPM")

# Course Notes for Learning Intelligent Systems

Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, ©  Carlos A. Iglesias

## [Introduction to Machine Learning](2_0_0_Intro_ML.ipynb)

# Table of Contents
* [Reading Data](#Reading-Data)
* [Iris flower dataset](#Iris-flower-dataset)
* [References](#References)

# Reading Data

The goal of this notebook is to learn how to read and load a sample dataset.

Scikit-learn comes with some bundled [datasets](https://scikit-learn.org/stable/datasets.html): iris, digits, wine, etc.
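As a rough sketch, the loader functions available in the installed version can be listed by inspecting the `datasets` module. The variable name `loaders` is only illustrative, and the filter on the `load_` prefix is an assumption: it also picks up general helpers such as `load_files`, not only the bundled datasets.

```python
from sklearn import datasets

# Collect every public name in the module that starts with "load_".
# Note: this includes generic helpers, not only bundled toy datasets.
loaders = sorted(name for name in dir(datasets) if name.startswith("load_"))
print(loaders)
```

The exact list depends on the scikit-learn version installed.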

In this notebook we are going to use the Iris dataset.

## Iris flower dataset

The [Iris flower dataset](https://en.wikipedia.org/wiki/Iris_flower_data_set), available at the [UCI dataset repository](https://archive.ics.uci.edu/ml/datasets/Iris), is a classic dataset for classification.

The dataset consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimetres. Based on the combination of these four features, a machine learning model will learn to differentiate the species of Iris.

![Iris](files/images/iris-dataset.jpg)

To read the dataset, we import the `datasets` module from scikit-learn and then load the Iris dataset.

In [None]:
# import datasets from scikit-learn
from sklearn import datasets

# load iris dataset
iris = datasets.load_iris()

A dataset is a dictionary-like object that holds all the data and some metadata about the data. The data is stored in the `.data` member, which is a 2D array of shape (`n_samples`, `n_features`). In the case of a supervised problem, one or more response variables are stored in the `.target` member.

In [None]:
# the dataset is an object of type 'Bunch'
type(iris)
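
To illustrate what "dictionary-like" means here, the following minimal sketch shows that the same arrays can be reached both by key and by attribute. The names `same_data` and `same_target` are introduced only for this example.

```python
from sklearn import datasets

iris = datasets.load_iris()

# Dictionary-style view: list the keys stored in the Bunch.
print(list(iris.keys()))

# Key access and attribute access return the same underlying objects.
same_data = iris["data"] is iris.data
same_target = iris["target"] is iris.target
print(same_data, same_target)
```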

In [None]:
# print the description of the dataset
print(iris.DESCR)

In [None]:
# names of the features (attributes of the entities)
print(iris.feature_names)

In [None]:
# names of the targets (classes of the classifier)
print(iris.target_names)

In [None]:
# the data is stored as a NumPy array
type(iris.data)

Now we are going to inspect the dataset. You can consult the NumPy tutorial listed in the references.

In [None]:
# Data in the iris dataset: the feature values of every sample
print(iris.data)

In [None]:
# Target.  Category of every sample
print(iris.target)
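
The targets are integer codes, so it can help to translate them back into species names and to check how many samples each class has. This is a small sketch using NumPy indexing and `np.bincount`. The variable names `species` and `counts` are illustrative.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# Index the array of class names with the integer labels:
# this yields one species name per sample.
species = iris.target_names[iris.target]
print(species[:3])

# Count how many samples belong to each class.
counts = np.bincount(iris.target)
for name, count in zip(iris.target_names, counts):
    print(name, count)
```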

In [None]:
# Iris data is a numpy array
# We can inspect its shape (rows, columns). In our case, (n_samples, n_features)
print(iris.data.shape)

In [None]:
# Using NumPy, I can print the number of dimensions (here we are working with a 2D matrix)
print(iris.data.ndim)

In [None]:
# I can print n_samples
print(iris.data.shape[0])

In [None]:
# ... n_features
print(iris.data.shape[1])
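
Since `iris.data` is a regular NumPy array, it can also be sliced by rows and columns. The sketch below shows a few common selections. The names `X`, `first_rows`, `sepal_length` and `feature_means` are assumptions made for this example, and column 0 is taken to be the first entry of `iris.feature_names`.

```python
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data

first_rows = X[:5]              # the first five samples, all features
sepal_length = X[:, 0]          # the first feature for every sample
feature_means = X.mean(axis=0)  # one mean value per feature

print(first_rows.shape)
print(sepal_length.shape)
for name, mean in zip(iris.feature_names, feature_means):
    print(f"{name}: {mean:.2f}")
```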

In [None]:
# names of the features
print(iris.feature_names)

In the following sessions we will learn how to load a dataset from a file (CSV, Excel, ...) using the pandas library.
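
As a preview only, and assuming pandas is installed, the sketch below writes the Iris data to CSV text held in memory and reads it back with `pd.read_csv`. The in-memory `buffer` stands in for a real file, and the column name `species` is a choice made for this example.

```python
import io

import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()

# Build a table with one column per feature plus the species name.
frame = pd.DataFrame(iris.data, columns=iris.feature_names)
frame["species"] = iris.target_names[iris.target]

# Write the table as CSV text into an in-memory buffer (stand-in for a file).
buffer = io.StringIO()
frame.to_csv(buffer, index=False)
buffer.seek(0)

# Read the CSV text back into a new DataFrame.
loaded = pd.read_csv(buffer)
print(loaded.shape)
print(loaded.head())
```

With a real file, the buffer would be replaced by a path passed to `pd.read_csv`.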

## References

* [Iris flower data set](https://en.wikipedia.org/wiki/Iris_flower_data_set)
* [How to load an example dataset with scikit-learn](http://scikit-learn.org/stable/tutorial/basic/tutorial.html#loading-example-dataset)
* [Dataset loading utilities in scikit-learn](http://scikit-learn.org/stable/datasets/)
* [How to plot the Iris dataset](http://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html)
* [An introduction to NumPy and Scipy](http://www.engr.ucsb.edu/~shell/che210d/numpy.pdf)
* [NumPy tutorial](https://docs.scipy.org/doc/numpy-dev/user/quickstart.html)

## Licence

The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/).

©  Carlos A. Iglesias, Universidad Politécnica de Madrid.