1
0
mirror of https://github.com/gsi-upm/sitc synced 2024-11-05 07:31:41 +00:00
sitc/ml1/2_7_Model_Persistence.ipynb

201 lines
5.0 KiB
Plaintext
Raw Normal View History

2016-03-15 12:55:14 +00:00
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![](files/images/EscUpmPolit_p.gif \"UPM\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Course Notes for Learning Intelligent Systems"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © 2016 Carlos A. Iglesias"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## [Introduction to Machine Learning](2_0_0_Intro_ML.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Table of Contents\n",
"* [Model Persistence](#Model-Persistence)\n",
"* [References](#References)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Model Persistence"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The goal of this notebook is to learn how to save a model in the the scikit by using Pythons built-in persistence model, namely pickle\n",
"\n",
"First we recap the previous tasks: load data, preprocess and train the model."
]
},
{
"cell_type": "code",
2016-03-28 10:26:20 +00:00
"execution_count": 1,
"metadata": {},
2016-03-15 12:55:14 +00:00
"outputs": [
{
"data": {
"text/plain": [
"Pipeline(steps=[('scaler', StandardScaler(copy=True, with_mean=True, with_std=True)), ('KNN', KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',\n",
" metric_params=None, n_jobs=1, n_neighbors=5, p=2,\n",
" weights='uniform'))])"
]
},
2016-03-28 10:26:20 +00:00
"execution_count": 1,
2016-03-15 12:55:14 +00:00
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# load iris\n",
"from sklearn import datasets\n",
"iris = datasets.load_iris()\n",
"\n",
"# Training and test spliting\n",
"from sklearn.model_selection import train_test_split\n",
2016-03-15 12:55:14 +00:00
"x_iris, y_iris = iris.data, iris.target\n",
"# Test set will be the 25% taken randomly\n",
"x_train, x_test, y_train, y_test = train_test_split(x_iris, y_iris, test_size=0.25, random_state=33)\n",
"\n",
"# Create the model using the pipeline\n",
"from sklearn.pipeline import Pipeline\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"\n",
"# create a composite estimator made by a pipeline of preprocessing and the KNN model\n",
"model = Pipeline([\n",
" ('scaler', StandardScaler()),\n",
" ('KNN', KNeighborsClassifier())\n",
"])\n",
"\n",
"# Train the model\n",
"model.fit(x_train, y_train) \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we are going to save the model to a data structure called *pickle*. A pickle is a dictionary and can be used as a file or a string."
]
},
{
"cell_type": "code",
2016-03-28 10:26:20 +00:00
"execution_count": 2,
"metadata": {},
2016-03-15 12:55:14 +00:00
"outputs": [
{
"data": {
"text/plain": [
"array([0])"
]
},
2016-03-28 10:26:20 +00:00
"execution_count": 2,
2016-03-15 12:55:14 +00:00
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pickle\n",
"s = pickle.dumps(model)\n",
"model2 = pickle.loads(s)\n",
"model2.predict(x_iris[0:1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A more efficient alternative to pickle is joblib, especially for big data problems. In this case the model can only be saved to a file and not to a string."
]
},
{
"cell_type": "code",
2016-03-28 10:26:20 +00:00
"execution_count": 3,
2016-03-15 12:55:14 +00:00
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# save model\n",
"from sklearn.externals import joblib\n",
"joblib.dump(model, 'filename.pkl') \n",
"\n",
"#load model\n",
"model2 = joblib.load('filename.pkl') "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## References"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* [Tutorial scikit-learn](http://scikit-learn.org/stable/tutorial/basic/tutorial.html)\n",
"* [Model persistence in scikit-learn](http://scikit-learn.org/stable/modules/model_persistence.html#model-persistence)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Licence\n",
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
"\n",
"© 2016 Carlos A. Iglesias, Universidad Politécnica de Madrid."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
2016-03-15 12:55:14 +00:00
}
},
"nbformat": 4,
"nbformat_minor": 1
2016-03-15 12:55:14 +00:00
}