mirror of
				https://github.com/gsi-upm/sitc
				synced 2025-10-30 15:08:19 +00:00 
			
		
		
		
	
		
			
				
	
	
		
| {
 | |
|  "cells": [
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "\n",
 | |
|     "\n",
 | |
|     "# Course Notes for Learning Intelligent Systems\n",
 | |
|     "\n",
 | |
|     "Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, ©  Carlos A. Iglesias\n",
 | |
|     "\n",
 | |
|     "## [Introduction to Machine Learning](2_0_0_Intro_ML.ipynb)"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "# Table of Contents\n",
 | |
|     "* [Reading Data](#Reading-Data)\n",
 | |
|     "* [Iris flower dataset](#Iris-flower-dataset)\n",
 | |
|     "* [References](#References)"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "# Reading Data"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "This notebook shows how to read and load a sample dataset.\n",
 | |
|     "\n",
 | |
|     "Scikit-learn comes with some bundled [datasets](https://scikit-learn.org/stable/datasets.html): iris, digits, wine, etc.\n",
 | |
|     "\n",
 | |
|     "In this notebook, we will use the Iris dataset."
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "## Iris flower dataset"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "The [Iris flower dataset](https://en.wikipedia.org/wiki/Iris_flower_data_set), available at the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Iris), is a classic dataset for classification.\n",
 | |
|     "\n",
 | |
|     "The dataset consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica, and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimetres. Based on the combination of these four features, a machine learning model will learn to differentiate the species of Iris.\n",
 | |
|     "\n",
 | |
|      ""
 | |
|    ]
 | |
|   },
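The class balance and feature count described above can be checked directly once the data is loaded. The following is a minimal sketch, assuming scikit-learn and NumPy are installed; the variable name `counts` is chosen here for illustration and is not part of the original notebook.

```python
import numpy as np
from sklearn import datasets

# Load the bundled copy of the Iris dataset
iris = datasets.load_iris()

# iris.target holds one integer label per sample;
# np.bincount counts how many times each label occurs
counts = np.bincount(iris.target)

# Pair each species name with its number of samples
for name, count in zip(iris.target_names, counts):
    print(name, count)

# Number of measured features per sample
print(len(iris.feature_names))
```

Each species should appear with the same number of samples, matching the description of the dataset.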
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "Here you can see the species and the features.\n",
 | |
|     "\n",
 | |
|     ""
 | |
|    ]
 | |
|   },
 | |
|    {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "To read the dataset, we import the `datasets` module and then load the Iris dataset."
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": null,
 | |
|    "metadata": {},
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "# import datasets from scikit-learn\n",
 | |
|     "from sklearn import datasets\n",
 | |
|     "\n",
 | |
|     "# load iris dataset\n",
 | |
|     "iris = datasets.load_iris()"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "A dataset is a dictionary-like object that holds all the data and some metadata about the data. The data is stored in the `.data` member, which is a 2D array of shape (`n_samples`, `n_features`). In the case of a supervised problem, one or more response variables are stored in the `.target` member."
 | |
|    ]
 | |
|   },
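To make the "dictionary-like" behaviour concrete, here is a small sketch, assuming a standard scikit-learn installation. It illustrates that the same array can be reached both as an attribute and as a dictionary key; the names `same_data`, `n_samples` and `n_features` are illustrative only.

```python
from sklearn import datasets

iris = datasets.load_iris()

# The returned object can be queried like a dictionary
print(sorted(iris.keys()))

# Attribute access and key access give the same underlying array
same_data = iris.data is iris["data"]
print(same_data)

# .data has one row per sample and one column per feature;
# .target has one label per sample
n_samples, n_features = iris.data.shape
print(n_samples, n_features, iris.target.shape)
```

The exact set of keys may vary between scikit-learn versions, so the sketch prints them rather than assuming a fixed list.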
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": null,
 | |
|    "metadata": {},
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "# the dataset is returned as a 'Bunch' object\n",
 | |
|     "type(iris)"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": null,
 | |
|    "metadata": {},
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "# print the description of the dataset\n",
 | |
|     "print(iris.DESCR)"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": null,
 | |
|    "metadata": {},
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "# names of the features (attributes of the entities)\n",
 | |
|     "print(iris.feature_names)"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": null,
 | |
|    "metadata": {},
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "# names of the targets (classes of the classifier)\n",
 | |
|     "print(iris.target_names)"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": null,
 | |
|    "metadata": {},
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "# the data is stored as a NumPy array\n",
 | |
|     "type(iris.data)"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "Now we are going to inspect the dataset. You can consult the NumPy tutorial listed in the references."
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": null,
 | |
|    "metadata": {},
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "# Data in the iris dataset: the feature values of every sample\n",
 | |
|     "print(iris.data)"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": null,
 | |
|    "metadata": {},
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "# Target: the category (class label) of every sample\n",
 | |
|     "print(iris.target)"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": null,
 | |
|    "metadata": {},
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "# Iris data is a numpy array\n",
 | |
|     "# We can inspect its shape (rows, columns). In our case, (n_samples, n_features)\n",
 | |
|     "print(iris.data.shape)"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": null,
 | |
|    "metadata": {},
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "# ndim gives the number of array dimensions (here 2, since the data is a 2D matrix)\n",
 | |
|     "print(iris.data.ndim)"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": null,
 | |
|    "metadata": {},
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "# The first element of the shape is n_samples\n",
 | |
|     "print(iris.data.shape[0])"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": null,
 | |
|    "metadata": {},
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "# ... and the second element is n_features\n",
 | |
|     "print(iris.data.shape[1])"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "code",
 | |
|    "execution_count": null,
 | |
|    "metadata": {},
 | |
|    "outputs": [],
 | |
|    "source": [
 | |
|     "# names of the features\n",
 | |
|     "print(iris.feature_names)"
 | |
|    ]
 | |
|   },
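Beyond `shape` and `ndim`, NumPy indexing is often used to look at parts of the data. The sketch below is one possible way to do this, assuming scikit-learn and NumPy are installed; the variable names (`first_rows`, `first_feature`, `mask`, `class0`) are hypothetical and not taken from the original notebook.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# First five samples (rows), all features (columns)
first_rows = iris.data[:5]

# All samples, first feature only (first entry of iris.feature_names)
first_feature = iris.data[:, 0]

# Boolean mask selecting the samples labelled 0
# (the first entry of iris.target_names)
mask = iris.target == 0
class0 = iris.data[mask]

print(first_rows.shape, first_feature.shape, class0.shape)

# Mean of each feature over the selected samples
print(class0.mean(axis=0))
```

Slicing with `:` selects along rows or columns, while a boolean mask keeps only the rows where the condition holds.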
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "In the following sessions, we will learn how to load a dataset from a file (CSV, Excel, ...) using the pandas library."
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "## References"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "* [Iris flower data set](https://en.wikipedia.org/wiki/Iris_flower_data_set)\n",
 | |
|     "* [How to load an example dataset with scikit-learn](http://scikit-learn.org/stable/tutorial/basic/tutorial.html#loading-example-dataset)\n",
 | |
|     "* [Dataset loading utilities in scikit-learn](http://scikit-learn.org/stable/datasets/)\n",
 | |
|     "* [How to plot the Iris dataset](http://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html)\n",
 | |
|     "* [An introduction to NumPy and Scipy](http://www.engr.ucsb.edu/~shell/che210d/numpy.pdf)\n",
 | |
|     "* [NumPy tutorial](https://docs.scipy.org/doc/numpy-dev/user/quickstart.html)"
 | |
|    ]
 | |
|   },
 | |
|   {
 | |
|    "cell_type": "markdown",
 | |
|    "metadata": {},
 | |
|    "source": [
 | |
|     "## Licence\n",
 | |
|     "\n",
 | |
|     "The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/).  \n",
 | |
|     "\n",
 | |
|     "©  Carlos A. Iglesias, Universidad Politécnica de Madrid."
 | |
|    ]
 | |
|   }
 | |
|  ],
 | |
|  "metadata": {
 | |
|   "kernelspec": {
 | |
|    "display_name": "Python 3",
 | |
|    "language": "python",
 | |
|    "name": "python3"
 | |
|   },
 | |
|   "language_info": {
 | |
|    "codemirror_mode": {
 | |
|     "name": "ipython",
 | |
|     "version": 3
 | |
|    },
 | |
|    "file_extension": ".py",
 | |
|    "mimetype": "text/x-python",
 | |
|    "name": "python",
 | |
|    "nbconvert_exporter": "python",
 | |
|    "pygments_lexer": "ipython3",
 | |
|    "version": "3.5.5"
 | |
|   },
 | |
|   "latex_envs": {
 | |
|    "LaTeX_envs_menu_present": true,
 | |
|    "autocomplete": true,
 | |
|    "bibliofile": "biblio.bib",
 | |
|    "cite_by": "apalike",
 | |
|    "current_citInitial": 1,
 | |
|    "eqLabelWithNumbers": true,
 | |
|    "eqNumInitial": 1,
 | |
|    "hotkeys": {
 | |
|     "equation": "Ctrl-E",
 | |
|     "itemize": "Ctrl-I"
 | |
|    },
 | |
|    "labels_anchors": false,
 | |
|    "latex_user_defs": false,
 | |
|    "report_style_numbering": false,
 | |
|    "user_envs_cfg": false
 | |
|   }
 | |
|  },
 | |
|  "nbformat": 4,
 | |
|  "nbformat_minor": 1
 | |
| }
 |