You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
sitc/ml1/2_7_Model_Persistence.ipynb

192 lines
4.7 KiB
Plaintext

This file contains ambiguous Unicode characters!

This file contains ambiguous Unicode characters that may be confused with others in your current locale. If your use case is intentional and legitimate, you can safely ignore this warning. Use the Escape button to highlight these characters.

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![](files/images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © Carlos A. Iglesias"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## [Introduction to Machine Learning](2_0_0_Intro_ML.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Table of Contents\n",
"* [Model Persistence](#Model-Persistence)\n",
"* [References](#References)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Model Persistence"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The goal of this notebook is to learn how to save a model in the the scikit by using Pythons built-in persistence model, namely pickle\n",
"\n",
"First we recap the previous tasks: load data, preprocess and train the model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# load iris\n",
"from sklearn import datasets\n",
"iris = datasets.load_iris()\n",
"\n",
"# Training and test spliting\n",
"from sklearn.model_selection import train_test_split\n",
"x_iris, y_iris = iris.data, iris.target\n",
"# Test set will be the 25% taken randomly\n",
"x_train, x_test, y_train, y_test = train_test_split(x_iris, y_iris, test_size=0.25, random_state=33)\n",
"\n",
"# Create the model using the pipeline\n",
"from sklearn.pipeline import Pipeline\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"\n",
"# create a composite estimator made by a pipeline of preprocessing and the KNN model\n",
"model = Pipeline([\n",
" ('scaler', StandardScaler()),\n",
" ('KNN', KNeighborsClassifier())\n",
"])\n",
"\n",
"# Train the model\n",
"model.fit(x_train, y_train) \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we are going to save the model to a data structure called *pickle*. A pickle is a dictionary and can be used as a file or a string."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pickle\n",
"s = pickle.dumps(model)\n",
"model2 = pickle.loads(s)\n",
"model2.predict(x_iris[0:1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A more efficient alternative to pickle is joblib, especially for big data problems. In this case the model can only be saved to a file and not to a string."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# save model\n",
"from sklearn.externals import joblib\n",
"joblib.dump(model, 'filename.pkl') \n",
"\n",
"#load model\n",
"model2 = joblib.load('filename.pkl') "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## References"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* [Tutorial scikit-learn](http://scikit-learn.org/stable/tutorial/basic/tutorial.html)\n",
"* [Model persistence in scikit-learn](http://scikit-learn.org/stable/modules/model_persistence.html#model-persistence)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Licence\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autocomplete": true,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
}
},
"nbformat": 4,
"nbformat_minor": 1
}