mirror of
https://github.com/gsi-upm/sitc
synced 2024-12-22 19:58:12 +00:00
203 lines
5.3 KiB
Plaintext
203 lines
5.3 KiB
Plaintext
{
|
||
"cells": [
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"![](files/images/EscUpmPolit_p.gif \"UPM\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"# Course Notes for Learning Intelligent Systems"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Department of Telematic Engineering Systems, Universidad Politécnica de Madrid, © Carlos A. Iglesias"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## [Introduction to Machine Learning](2_0_0_Intro_ML.ipynb)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"# Table of Contents\n",
|
||
"* [Model Persistence](#Model-Persistence)\n",
|
||
"* [References](#References)\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"# Model Persistence"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"The goal of this notebook is to learn how to save a model in the the scikit by using Python’s built-in persistence model, namely pickle\n",
|
||
"\n",
|
||
"First we recap the previous tasks: load data, preprocess and train the model."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# load iris\n",
|
||
"from sklearn import datasets\n",
|
||
"iris = datasets.load_iris()\n",
|
||
"\n",
|
||
"# Training and test spliting\n",
|
||
"from sklearn.model_selection import train_test_split\n",
|
||
"x_iris, y_iris = iris.data, iris.target\n",
|
||
"# Test set will be the 25% taken randomly\n",
|
||
"x_train, x_test, y_train, y_test = train_test_split(x_iris, y_iris, test_size=0.25, random_state=33)\n",
|
||
"\n",
|
||
"# Create the model using the pipeline\n",
|
||
"from sklearn.pipeline import Pipeline\n",
|
||
"from sklearn.preprocessing import StandardScaler\n",
|
||
"from sklearn.neighbors import KNeighborsClassifier\n",
|
||
"\n",
|
||
"# create a composite estimator made by a pipeline of preprocessing and the KNN model\n",
|
||
"model = Pipeline([\n",
|
||
" ('scaler', StandardScaler()),\n",
|
||
" ('KNN', KNeighborsClassifier())\n",
|
||
"])\n",
|
||
"\n",
|
||
"# Train the model\n",
|
||
"model.fit(x_train, y_train) \n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Now we are going to save the model to a data structure called *pickle*. A pickle is a dictionary and can be used as a file or a string."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"import pickle\n",
|
||
"s = pickle.dumps(model)\n",
|
||
"model2 = pickle.loads(s)\n",
|
||
"model2.predict(x_iris[0:1])"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"A more efficient alternative to pickle is joblib, especially for big data problems. In this case the model can only be saved to a file and not to a string."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# save model\n",
|
||
"import joblib\n",
|
||
"joblib.dump(model, 'filename.pkl') \n",
|
||
"\n",
|
||
"#load model\n",
|
||
"model2 = joblib.load('filename.pkl') "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## References"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"* [Tutorial scikit-learn](http://scikit-learn.org/stable/tutorial/basic/tutorial.html)\n",
|
||
"* [Model persistence in scikit-learn](http://scikit-learn.org/stable/modules/model_persistence.html#model-persistence)\n",
|
||
"* [scikit-learn : Machine Learning Simplified](https://learning.oreilly.com/library/view/scikit-learn-machine/9781788833479/), Raúl Garreta; Guillermo Moncecchi, Packt Publishing, 2017.\n",
|
||
"* [Python Machine Learning](https://learning.oreilly.com/library/view/python-machine-learning/9781789955750/), Sebastian Raschka, Packt Publishing, 2019."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Licence\n",
|
||
"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
|
||
"\n",
|
||
"© Carlos A. Iglesias, Universidad Politécnica de Madrid."
|
||
]
|
||
}
|
||
],
|
||
"metadata": {
|
||
"datacleaner": {
|
||
"position": {
|
||
"top": "50px"
|
||
},
|
||
"python": {
|
||
"varRefreshCmd": "try:\n print(_datacleaner.dataframe_metadata())\nexcept:\n print([])"
|
||
},
|
||
"window_display": false
|
||
},
|
||
"kernelspec": {
|
||
"display_name": "Python 3 (ipykernel)",
|
||
"language": "python",
|
||
"name": "python3"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 3
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython3",
|
||
"version": "3.8.12"
|
||
},
|
||
"latex_envs": {
|
||
"LaTeX_envs_menu_present": true,
|
||
"autocomplete": true,
|
||
"bibliofile": "biblio.bib",
|
||
"cite_by": "apalike",
|
||
"current_citInitial": 1,
|
||
"eqLabelWithNumbers": true,
|
||
"eqNumInitial": 1,
|
||
"hotkeys": {
|
||
"equation": "Ctrl-E",
|
||
"itemize": "Ctrl-I"
|
||
},
|
||
"labels_anchors": false,
|
||
"latex_user_defs": false,
|
||
"report_style_numbering": false,
|
||
"user_envs_cfg": false
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 1
|
||
}
|