mirror of
https://github.com/gsi-upm/sitc
synced 2024-11-05 07:31:41 +00:00
634 lines
18 KiB
Plaintext
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![](files/images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © 2016 Carlos A. Iglesias"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## [Introduction to Machine Learning](2_0_0_Intro_ML.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Table of Contents\n",
"* [Reading Data](#Reading-Data)\n",
"* [Iris flower dataset](#Iris-flower-dataset)\n",
"* [References](#References)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Reading Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The goal of this notebook is to learn how to read and load a sample dataset.\n",
"\n",
"Scikit-learn comes with some bundled [datasets](http://scikit-learn.org/stable/datasets/), such as iris, digits and boston (the Boston housing dataset was removed in scikit-learn 1.2).\n",
"\n",
"In this notebook we are going to use the Iris dataset."
]
},
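{
"cell_type": "markdown",
"metadata": {},
"source": [
"Every bundled dataset is loaded with its own `load_*` function from `sklearn.datasets`, and the loaders return the same kind of object. The cell below is a minimal sketch, assuming a recent scikit-learn version (0.20 or later, where `load_wine` is available); it loads two other bundled datasets, digits and wine, and prints the shapes of their `data` and `target` members."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from sklearn import datasets\n",
"\n",
"# each bundled dataset has its own loader function\n",
"digits = datasets.load_digits()\n",
"wine = datasets.load_wine()\n",
"\n",
"# all loaders return the same kind of object, with .data and .target members\n",
"for name, ds in [('digits', digits), ('wine', wine)]:\n",
"    print(name, ds.data.shape, ds.target.shape)\n",
"\n",
"# expected output:\n",
"# digits (1797, 64) (1797,)\n",
"# wine (178, 13) (178,)"
]
},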
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Iris flower dataset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The [Iris flower dataset](https://en.wikipedia.org/wiki/Iris_flower_data_set), available at the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Iris), is a classic dataset for classification.\n",
"\n",
"The dataset consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimetres. Based on the combination of these four features, Fisher developed a linear discriminant model to distinguish the species from each other.\n",
"\n",
"![Iris](files/images/iris-dataset.jpg)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to read the dataset, we import the bundled datasets module and then load the Iris dataset."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# import datasets from scikit-learn\n",
"from sklearn import datasets\n",
"\n",
"# load iris dataset\n",
"iris = datasets.load_iris()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A dataset is a dictionary-like object that holds all the data and some metadata about the data. The data is stored in the `.data` member, a 2D array of shape (`n_samples`, `n_features`). In supervised problems, one or more response variables are stored in the `.target` member."
]
},
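{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because a Bunch is dictionary-like, its members can be read either as attributes or as dictionary items. The cell below is a minimal sketch; the `return_X_y=True` option, which returns the `(data, target)` pair directly instead of a Bunch, assumes scikit-learn 0.18 or later. The exact list of keys varies between scikit-learn versions, so no expected output is given for it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from sklearn import datasets\n",
"\n",
"iris = datasets.load_iris()\n",
"\n",
"# a Bunch is dictionary-like: list its keys (the exact set depends on the version)\n",
"print(sorted(iris.keys()))\n",
"\n",
"# item access and attribute access return the same object\n",
"print(iris['data'] is iris.data)  # True\n",
"\n",
"# for supervised learning, features X and target y can be unpacked directly\n",
"X, y = datasets.load_iris(return_X_y=True)\n",
"print(X.shape, y.shape)  # (150, 4) (150,)"
]
},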
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"sklearn.datasets.base.Bunch"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# a dataset is a Bunch object\n",
"type(iris)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Iris Plants Database\n",
"\n",
"Notes\n",
"-----\n",
"Data Set Characteristics:\n",
" :Number of Instances: 150 (50 in each of three classes)\n",
" :Number of Attributes: 4 numeric, predictive attributes and the class\n",
" :Attribute Information:\n",
" - sepal length in cm\n",
" - sepal width in cm\n",
" - petal length in cm\n",
" - petal width in cm\n",
" - class:\n",
" - Iris-Setosa\n",
" - Iris-Versicolour\n",
" - Iris-Virginica\n",
" :Summary Statistics:\n",
"\n",
" ============== ==== ==== ======= ===== ====================\n",
" Min Max Mean SD Class Correlation\n",
" ============== ==== ==== ======= ===== ====================\n",
" sepal length: 4.3 7.9 5.84 0.83 0.7826\n",
" sepal width: 2.0 4.4 3.05 0.43 -0.4194\n",
" petal length: 1.0 6.9 3.76 1.76 0.9490 (high!)\n",
" petal width: 0.1 2.5 1.20 0.76 0.9565 (high!)\n",
" ============== ==== ==== ======= ===== ====================\n",
"\n",
" :Missing Attribute Values: None\n",
" :Class Distribution: 33.3% for each of 3 classes.\n",
" :Creator: R.A. Fisher\n",
" :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)\n",
" :Date: July, 1988\n",
"\n",
"This is a copy of UCI ML iris datasets.\n",
"http://archive.ics.uci.edu/ml/datasets/Iris\n",
"\n",
"The famous Iris database, first used by Sir R.A Fisher\n",
"\n",
"This is perhaps the best known database to be found in the\n",
"pattern recognition literature. Fisher's paper is a classic in the field and\n",
"is referenced frequently to this day. (See Duda & Hart, for example.) The\n",
"data set contains 3 classes of 50 instances each, where each class refers to a\n",
"type of iris plant. One class is linearly separable from the other 2; the\n",
"latter are NOT linearly separable from each other.\n",
"\n",
"References\n",
"----------\n",
" - Fisher,R.A. \"The use of multiple measurements in taxonomic problems\"\n",
" Annual Eugenics, 7, Part II, 179-188 (1936); also in \"Contributions to\n",
" Mathematical Statistics\" (John Wiley, NY, 1950).\n",
" - Duda,R.O., & Hart,P.E. (1973) Pattern Classification and Scene Analysis.\n",
" (Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218.\n",
" - Dasarathy, B.V. (1980) \"Nosing Around the Neighborhood: A New System\n",
" Structure and Classification Rule for Recognition in Partially Exposed\n",
" Environments\". IEEE Transactions on Pattern Analysis and Machine\n",
" Intelligence, Vol. PAMI-2, No. 1, 67-71.\n",
" - Gates, G.W. (1972) \"The Reduced Nearest Neighbor Rule\". IEEE Transactions\n",
" on Information Theory, May 1972, 431-433.\n",
" - See also: 1988 MLC Proceedings, 54-64. Cheeseman et al\"s AUTOCLASS II\n",
" conceptual clustering system finds 3 classes in the data.\n",
" - Many, many more ...\n",
"\n"
]
}
],
"source": [
"# print the description of the dataset\n",
"print(iris.DESCR)"
]
},
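{
"cell_type": "markdown",
"metadata": {},
"source": [
"The summary statistics in the description can be recomputed from `iris.data` with NumPy reductions along `axis=0`, which give one value per feature column. The cell below is a minimal sketch; `std` uses NumPy's default `ddof=0`, and small differences in the last decimal place with respect to the table are possible depending on the scikit-learn version."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from sklearn import datasets\n",
"\n",
"iris = datasets.load_iris()\n",
"X = iris.data\n",
"\n",
"# axis=0 aggregates over the samples (rows), giving one value per feature\n",
"print('min: ', X.min(axis=0))\n",
"print('max: ', X.max(axis=0))\n",
"print('mean:', X.mean(axis=0).round(2))\n",
"print('SD:  ', X.std(axis=0).round(2))"
]
},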
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']\n"
]
}
],
"source": [
"# names of the features (attributes of the entities)\n",
"print(iris.feature_names)"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['setosa' 'versicolor' 'virginica']\n"
]
}
],
"source": [
"# names of the targets (classes of the classifier)\n",
"print(iris.target_names)"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"numpy.ndarray"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# the data is stored as a NumPy array\n",
"type(iris.data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we are going to inspect the dataset. For more on NumPy arrays, you can consult the NumPy tutorial listed in the references."
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 5.1 3.5 1.4 0.2]\n",
" [ 4.9 3. 1.4 0.2]\n",
" [ 4.7 3.2 1.3 0.2]\n",
" [ 4.6 3.1 1.5 0.2]\n",
" [ 5. 3.6 1.4 0.2]\n",
" [ 5.4 3.9 1.7 0.4]\n",
" [ 4.6 3.4 1.4 0.3]\n",
" [ 5. 3.4 1.5 0.2]\n",
" [ 4.4 2.9 1.4 0.2]\n",
" [ 4.9 3.1 1.5 0.1]\n",
" [ 5.4 3.7 1.5 0.2]\n",
" [ 4.8 3.4 1.6 0.2]\n",
" [ 4.8 3. 1.4 0.1]\n",
" [ 4.3 3. 1.1 0.1]\n",
" [ 5.8 4. 1.2 0.2]\n",
" [ 5.7 4.4 1.5 0.4]\n",
" [ 5.4 3.9 1.3 0.4]\n",
" [ 5.1 3.5 1.4 0.3]\n",
" [ 5.7 3.8 1.7 0.3]\n",
" [ 5.1 3.8 1.5 0.3]\n",
" [ 5.4 3.4 1.7 0.2]\n",
" [ 5.1 3.7 1.5 0.4]\n",
" [ 4.6 3.6 1. 0.2]\n",
" [ 5.1 3.3 1.7 0.5]\n",
" [ 4.8 3.4 1.9 0.2]\n",
" [ 5. 3. 1.6 0.2]\n",
" [ 5. 3.4 1.6 0.4]\n",
" [ 5.2 3.5 1.5 0.2]\n",
" [ 5.2 3.4 1.4 0.2]\n",
" [ 4.7 3.2 1.6 0.2]\n",
" [ 4.8 3.1 1.6 0.2]\n",
" [ 5.4 3.4 1.5 0.4]\n",
" [ 5.2 4.1 1.5 0.1]\n",
" [ 5.5 4.2 1.4 0.2]\n",
" [ 4.9 3.1 1.5 0.1]\n",
" [ 5. 3.2 1.2 0.2]\n",
" [ 5.5 3.5 1.3 0.2]\n",
" [ 4.9 3.1 1.5 0.1]\n",
" [ 4.4 3. 1.3 0.2]\n",
" [ 5.1 3.4 1.5 0.2]\n",
" [ 5. 3.5 1.3 0.3]\n",
" [ 4.5 2.3 1.3 0.3]\n",
" [ 4.4 3.2 1.3 0.2]\n",
" [ 5. 3.5 1.6 0.6]\n",
" [ 5.1 3.8 1.9 0.4]\n",
" [ 4.8 3. 1.4 0.3]\n",
" [ 5.1 3.8 1.6 0.2]\n",
" [ 4.6 3.2 1.4 0.2]\n",
" [ 5.3 3.7 1.5 0.2]\n",
" [ 5. 3.3 1.4 0.2]\n",
" [ 7. 3.2 4.7 1.4]\n",
" [ 6.4 3.2 4.5 1.5]\n",
" [ 6.9 3.1 4.9 1.5]\n",
" [ 5.5 2.3 4. 1.3]\n",
" [ 6.5 2.8 4.6 1.5]\n",
" [ 5.7 2.8 4.5 1.3]\n",
" [ 6.3 3.3 4.7 1.6]\n",
" [ 4.9 2.4 3.3 1. ]\n",
" [ 6.6 2.9 4.6 1.3]\n",
" [ 5.2 2.7 3.9 1.4]\n",
" [ 5. 2. 3.5 1. ]\n",
" [ 5.9 3. 4.2 1.5]\n",
" [ 6. 2.2 4. 1. ]\n",
" [ 6.1 2.9 4.7 1.4]\n",
" [ 5.6 2.9 3.6 1.3]\n",
" [ 6.7 3.1 4.4 1.4]\n",
" [ 5.6 3. 4.5 1.5]\n",
" [ 5.8 2.7 4.1 1. ]\n",
" [ 6.2 2.2 4.5 1.5]\n",
" [ 5.6 2.5 3.9 1.1]\n",
" [ 5.9 3.2 4.8 1.8]\n",
" [ 6.1 2.8 4. 1.3]\n",
" [ 6.3 2.5 4.9 1.5]\n",
" [ 6.1 2.8 4.7 1.2]\n",
" [ 6.4 2.9 4.3 1.3]\n",
" [ 6.6 3. 4.4 1.4]\n",
" [ 6.8 2.8 4.8 1.4]\n",
" [ 6.7 3. 5. 1.7]\n",
" [ 6. 2.9 4.5 1.5]\n",
" [ 5.7 2.6 3.5 1. ]\n",
" [ 5.5 2.4 3.8 1.1]\n",
" [ 5.5 2.4 3.7 1. ]\n",
" [ 5.8 2.7 3.9 1.2]\n",
" [ 6. 2.7 5.1 1.6]\n",
" [ 5.4 3. 4.5 1.5]\n",
" [ 6. 3.4 4.5 1.6]\n",
" [ 6.7 3.1 4.7 1.5]\n",
" [ 6.3 2.3 4.4 1.3]\n",
" [ 5.6 3. 4.1 1.3]\n",
" [ 5.5 2.5 4. 1.3]\n",
" [ 5.5 2.6 4.4 1.2]\n",
" [ 6.1 3. 4.6 1.4]\n",
" [ 5.8 2.6 4. 1.2]\n",
" [ 5. 2.3 3.3 1. ]\n",
" [ 5.6 2.7 4.2 1.3]\n",
" [ 5.7 3. 4.2 1.2]\n",
" [ 5.7 2.9 4.2 1.3]\n",
" [ 6.2 2.9 4.3 1.3]\n",
" [ 5.1 2.5 3. 1.1]\n",
" [ 5.7 2.8 4.1 1.3]\n",
" [ 6.3 3.3 6. 2.5]\n",
" [ 5.8 2.7 5.1 1.9]\n",
" [ 7.1 3. 5.9 2.1]\n",
" [ 6.3 2.9 5.6 1.8]\n",
" [ 6.5 3. 5.8 2.2]\n",
" [ 7.6 3. 6.6 2.1]\n",
" [ 4.9 2.5 4.5 1.7]\n",
" [ 7.3 2.9 6.3 1.8]\n",
" [ 6.7 2.5 5.8 1.8]\n",
" [ 7.2 3.6 6.1 2.5]\n",
" [ 6.5 3.2 5.1 2. ]\n",
" [ 6.4 2.7 5.3 1.9]\n",
" [ 6.8 3. 5.5 2.1]\n",
" [ 5.7 2.5 5. 2. ]\n",
" [ 5.8 2.8 5.1 2.4]\n",
" [ 6.4 3.2 5.3 2.3]\n",
" [ 6.5 3. 5.5 1.8]\n",
" [ 7.7 3.8 6.7 2.2]\n",
" [ 7.7 2.6 6.9 2.3]\n",
" [ 6. 2.2 5. 1.5]\n",
" [ 6.9 3.2 5.7 2.3]\n",
" [ 5.6 2.8 4.9 2. ]\n",
" [ 7.7 2.8 6.7 2. ]\n",
" [ 6.3 2.7 4.9 1.8]\n",
" [ 6.7 3.3 5.7 2.1]\n",
" [ 7.2 3.2 6. 1.8]\n",
" [ 6.2 2.8 4.8 1.8]\n",
" [ 6.1 3. 4.9 1.8]\n",
" [ 6.4 2.8 5.6 2.1]\n",
" [ 7.2 3. 5.8 1.6]\n",
" [ 7.4 2.8 6.1 1.9]\n",
" [ 7.9 3.8 6.4 2. ]\n",
" [ 6.4 2.8 5.6 2.2]\n",
" [ 6.3 2.8 5.1 1.5]\n",
" [ 6.1 2.6 5.6 1.4]\n",
" [ 7.7 3. 6.1 2.3]\n",
" [ 6.3 3.4 5.6 2.4]\n",
" [ 6.4 3.1 5.5 1.8]\n",
" [ 6. 3. 4.8 1.8]\n",
" [ 6.9 3.1 5.4 2.1]\n",
" [ 6.7 3.1 5.6 2.4]\n",
" [ 6.9 3.1 5.1 2.3]\n",
" [ 5.8 2.7 5.1 1.9]\n",
" [ 6.8 3.2 5.9 2.3]\n",
" [ 6.7 3.3 5.7 2.5]\n",
" [ 6.7 3. 5.2 2.3]\n",
" [ 6.3 2.5 5. 1.9]\n",
" [ 6.5 3. 5.2 2. ]\n",
" [ 6.2 3.4 5.4 2.3]\n",
" [ 5.9 3. 5.1 1.8]]\n"
]
}
],
"source": [
"# data in the iris dataset: the feature values of every sample\n",
"print(iris.data)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
" 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n",
" 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2\n",
" 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2\n",
" 2 2]\n"
]
}
],
"source": [
"# target: the class of every sample (0 = setosa, 1 = versicolor, 2 = virginica)\n",
"print(iris.target)"
]
},
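{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since each target value is an integer index into `target_names`, NumPy fancy indexing converts the targets into species names, and `np.bincount` counts the samples in each class. A minimal sketch:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn import datasets\n",
"\n",
"iris = datasets.load_iris()\n",
"\n",
"# each target value is an index into target_names\n",
"labels = iris.target_names[iris.target]\n",
"print(labels[[0, 50, 100]])  # ['setosa' 'versicolor' 'virginica']\n",
"\n",
"# number of samples in each class (classes 0, 1 and 2)\n",
"print(np.bincount(iris.target))  # [50 50 50]"
]
},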
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(150, 4)\n"
]
}
],
"source": [
"# iris.data is a NumPy array\n",
"# we can inspect its shape (rows, columns), in our case (n_samples, n_features)\n",
"print(iris.data.shape)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2\n"
]
}
],
"source": [
"# number of dimensions of the array (a 2D matrix here)\n",
"print(iris.data.ndim)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"150\n"
]
}
],
"source": [
"# number of samples (n_samples)\n",
"print(iris.data.shape[0])"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"4\n"
]
}
],
"source": [
"# number of features (n_features)\n",
"print(iris.data.shape[1])"
]
},
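{
"cell_type": "markdown",
"metadata": {},
"source": [
"Besides its shape, we can select parts of the array: `X[i]` is the i-th sample (row) and `X[:, j]` is the j-th feature (column). The cell below is a minimal sketch; looking up the column index with `feature_names.index(...)` is just one convenient way to avoid hard-coding it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from sklearn import datasets\n",
"\n",
"iris = datasets.load_iris()\n",
"X = iris.data\n",
"\n",
"# rows are samples and columns are features: X[row, column]\n",
"first_sample = X[0]      # the four features of the first flower\n",
"petal_length = X[:, 2]   # the third column, petal length (cm)\n",
"print(first_sample)\n",
"print(petal_length.shape)  # (150,)\n",
"\n",
"# find a column by its feature name instead of a hard-coded index\n",
"col = iris.feature_names.index('petal width (cm)')\n",
"print(X[:, col].max())   # 2.5"
]
},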
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']\n"
]
}
],
"source": [
"# names of the features\n",
"print(iris.feature_names)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In another session, we will learn how to load a dataset from a file (csv, excel, ...). We will use the pandas library for this purpose."
]
},
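{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a preview of working with pandas, recent scikit-learn versions (0.23 or later, with pandas installed) can return a bundled dataset as a pandas DataFrame through the `as_frame=True` option of `load_iris`. The cell below is a minimal sketch; it does not read any file, so it complements rather than replaces loading data from csv or excel files."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from sklearn import datasets\n",
"\n",
"# as_frame=True returns pandas objects instead of NumPy arrays\n",
"iris = datasets.load_iris(as_frame=True)\n",
"df = iris.frame  # the four feature columns plus a 'target' column\n",
"\n",
"print(df.shape)  # (150, 5)\n",
"print(df.head())\n",
"\n",
"# mean of every feature per class (0 = setosa, 1 = versicolor, 2 = virginica)\n",
"print(df.groupby('target').mean())"
]
},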
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## References"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* [Iris flower data set](https://en.wikipedia.org/wiki/Iris_flower_data_set)\n",
"* [How to load an example dataset with scikit-learn](http://scikit-learn.org/stable/tutorial/basic/tutorial.html#loading-example-dataset)\n",
"* [Dataset loading utilities in scikit-learn](http://scikit-learn.org/stable/datasets/)\n",
"* [How to plot the Iris dataset](http://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html)\n",
"* [An introduction to NumPy and Scipy](http://www.engr.ucsb.edu/~shell/che210d/numpy.pdf)\n",
"* [NumPy tutorial](https://docs.scipy.org/doc/numpy-dev/user/quickstart.html)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Licence\n",
"\n",
"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/).\n",
"\n",
"© 2016 Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
}
},
"nbformat": 4,
"nbformat_minor": 0
}