mirror of
https://github.com/gsi-upm/sitc
synced 2026-04-30 14:44:36 +00:00
Compare commits: b83bcf5c2b...master (8 commits)

Commits:
- c361e23c8f
- 7d473dcdf2
- 7562b18968
- d1374320f0
- 1e8dbe70a3
- b3c799e564
- 59badc1df2
- 77ed6c91be
@@ -197,7 +197,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The features are simply the position of each point in the 2 dimension plane.\n",
+"The features are simply the position of each point in the 2-dimensional plane.\n",
 "\n",
 "In other words, a point $\mathbf{x}$ is represented by its values $x_1$ and $x_2$:\n",
 "\n",
@@ -208,14 +208,14 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Perform the classification task on several classifiers"
+"## Perform the classification task on several classifiers."
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Following, the classification on the spiral is done with several classifiers. We can see the performance on each class (each spiral), and their decision surfaces."
+"Following the classification on the spiral is done with several classifiers. We can see the performance on each class (each spiral), and their decision surfaces."
 ]
 },
 {
@@ -266,7 +266,7 @@
 "source": [
 "from sklearn.linear_model import LogisticRegression\n",
 "\n",
-"lr = LogisticRegression(n_jobs=-1)\n",
+"lr = LogisticRegression()\n",
 "lr.fit(X,y)\n",
 "\n",
 "lr_preds = lr.predict(X_test)\n",
@@ -275,8 +275,8 @@
 "print(classification_report(y_test, lr_preds))\n",
 "\n",
 "plt.figure(figsize=(10,7))\n",
-"# This methods outputs a visualization\n",
-"# the h parameter adjusts the precision of the visualization\n",
+"# This method outputs a visualization\n",
+"# The h parameter adjusts the precision of the visualization\n",
 "# if you find memory errors, set h to a higher value (e.g., h=0.1)\n",
 "plot_decision_surface(X, y, lr, h=0.02) "
 ]
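A minimal sketch of the fit-and-report pattern the updated cell uses, assuming scikit-learn is installed. The notebook's spiral data and `plot_decision_surface` helper are not part of this diff, so two Gaussian blobs from `make_blobs` stand in for `X`, `y` and the plotting step is omitted:

```python
# Sketch only: toy blobs stand in for the notebook's spiral dataset.
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

X, y = make_blobs(n_samples=200, centers=2, random_state=42)

# Default constructor, as in the updated cell; n_jobs was dropped because
# it has no effect on a single binary logistic-regression fit.
lr = LogisticRegression()
lr.fit(X, y)

preds = lr.predict(X)
print(classification_report(y, preds))
```

On linearly separable blobs a logistic regression fits cleanly; the spiral in the notebook is precisely the case where it cannot.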
@@ -535,11 +535,11 @@
 "collapsed": true
 },
 "source": [
-"We see that some classifiers (kNN, SVM) successfully learn the spiral problem. They can classify correctly in any part of the plane.\n",
+"We see that some classifiers (kNN, SVM) successfully learn the spiral problem. They can classify correctly at any point in the plane.\n",
 "\n",
 "Nevertheless, some classifiers (Logistic Regression, Gaussian Naive Bayes) are not able to learn the spiral pattern with their default configurations.\n",
 "\n",
-"In particular, the MLP performs very bad: it is not able to learn the spiral function. Nevertheless, it should be able to."
+"In particular, the MLP performs very badly: it is not able to learn the spiral function. Nevertheless, it should be able to."
 ]
 },
 {
@@ -578,7 +578,7 @@
 "- regularization of the network\n",
 "- new features that are passed to the network\n",
 "\n",
-"You can search inspiration on [this playground](http://playground.tensorflow.org)."
+"You can search for inspiration on [this playground](http://playground.tensorflow.org)."
 ]
 },
 {
@@ -621,7 +621,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
+"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
 "\n",
 "© Óscar Araque, Universidad Politécnica de Madrid."
 ]
File diff suppressed because one or more lines are too long
@@ -239,7 +239,7 @@
 "name": "stderr",
 "output_type": "stream",
 "text": [
-"/home/cif/anaconda3/lib/python3.10/site-packages/sklearn/utils/deprecation.py:87: FutureWarning: Function get_feature_names is deprecated; get_feature_names is deprecated in 1.0 and will be removed in 1.2. Please use get_feature_names_out instead.\n",
+"--",
 " warnings.warn(msg, category=FutureWarning)\n"
 ]
 },
@@ -331,7 +331,7 @@
 "source": [
 "vectorizer = CountVectorizer(analyzer=\"word\", stop_words='english', binary=True) \n",
 "vectors = vectorizer.fit_transform(documents)\n",
-"vectorizer.get_feature_names()"
+"vectorizer.get_feature_names_out()"
 ]
 },
 {
@@ -363,9 +363,9 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"vectorizer = CountVectorizer(analyzer=\"word\", stop_words='english', ngram_range=[2,2]) \n",
+"vectorizer = CountVectorizer(analyzer=\"word\", stop_words='english', ngram_range=(2,2)) \n",
 "vectors = vectorizer.fit_transform(documents)\n",
-"vectorizer.get_feature_names()"
+"vectorizer.get_feature_names_out()"
 ]
 },
 {
@@ -401,7 +401,7 @@
 "\n",
 "vectorizer = TfidfVectorizer(analyzer=\"word\", stop_words='english')\n",
 "vectors = vectorizer.fit_transform(documents)\n",
-"vectorizer.get_feature_names()"
+"vectorizer.get_feature_names_out()"
 ]
 },
 {
@@ -429,9 +429,9 @@
 "train = [doc1, doc2, doc3]\n",
 "vectorizer = TfidfVectorizer(analyzer=\"word\", stop_words='english')\n",
 "\n",
-"# We learn the vocabulary (fit) and tranform the docs into vectors\n",
+"# We learn the vocabulary (fit) and transform the docs into vectors\n",
 "vectors = vectorizer.fit_transform(train)\n",
-"vectorizer.get_feature_names()"
+"vectorizer.get_feature_names_out()"
 ]
 },
 {
@@ -51,7 +51,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"In this session we provide a quick overview of the semantic models presented during the classes. In this case, we will use a real corpus so that we can extract meaningful patterns.\n",
+"In this session, we provide a quick overview of the semantic models presented during the classes. In this case, we will use a real corpus so that we can extract meaningful patterns.\n",
 "\n",
 "The main objectives of this session are:\n",
 "* Understand the models and their differences\n",
@@ -69,9 +69,9 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We are going to use on of the corpus that come prepackaged with Scikit-learn: the [20 newsgroup datase](http://qwone.com/~jason/20Newsgroups/). The 20 newsgroup dataset contains 20k documents that belong to 20 topics.\n",
+"We are going to use one of the corpora that come prepackaged with Scikit-learn: the [20 newsgroup dataset](http://qwone.com/~jason/20Newsgroups/). The 20 newsgroup dataset contains 20k documents that belong to 20 topics.\n",
 "\n",
-"We inspect now the corpus using the facilities from Scikit-learn, as explain in [scikit-learn](http://scikit-learn.org/stable/datasets/twenty_newsgroups.html#newsgroups)"
+"We inspect now the corpus using the facilities from Scikit-learn, as explained in [scikit-learn](http://scikit-learn.org/stable/datasets/twenty_newsgroups.html#newsgroups)"
 ]
 },
 {
@@ -117,19 +117,19 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"# Converting Scikit-learn to gensim"
+"# Converting Scikit-learn to gensim."
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Although scikit-learn provides an LDA implementation, it is more popular the package *gensim*, which also provides an LSI implementation, as well as other functionalities. Fortunately, scikit-learn sparse matrices can be used in Gensim using the function *matutils.Sparse2Corpus()*. Anyway, if you are using intensively LDA,it can be convenient to create the corpus with their functions.\n",
+"Although scikit-learn provides an LDA implementation, it is more popular than the package *gensim*, which also provides an LSI implementation, as well as other functionalities. Fortunately, scikit-learn sparse matrices can be used in Gensim using the function *matutils.Sparse2Corpus()*. Anyway, if you are using intensively LDA,it can be convenient to create the corpus with their functions.\n",
 "\n",
 "You should install first:\n",
 "\n",
 "* *gensim*. Run 'conda install gensim' in a terminal.\n",
-"* *python-Levenshtein*. Run 'conda install python-Levenshtein' in a terminal"
+"* *python-Levenshtein*. Run 'conda install python-Levenshtein' in a terminal."
 ]
 },
 {
@@ -183,7 +183,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Although scikit-learn provides an LDA implementation, it is more popular the package *gensim*, which also provides an LSI implementation, as well as other functionalities. Fortunately, scikit-learn sparse matrices can be used in Gensim using the function *matutils.Sparse2Corpus()*."
+"Although scikit-learn provides an LDA implementation, it is more popular than the package *gensim*, which also provides an LSI implementation, as well as other functionalities. Fortunately, scikit-learn sparse matrices can be used in Gensim using the function *matutils.Sparse2Corpus()*."
 ]
 },
 {
@@ -51,7 +51,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Here we propose several exercises, it is recommended to work only in one of them."
+"Here we propose several exercises; it is recommended to work only in one of them."
 ]
 },
 {
@@ -65,8 +65,8 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"You can try the exercise Exercise 2: Sentiment Analysis on movie reviews of Scikit-Learn https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html. \n",
-"Previously you should follow the installation instructions in the section Tutorial Setup."
+"You can try the exercise Exercise 2: Sentiment Analysis on movie reviews of Scikit-Learn https://scikit-learn.org/1.4/tutorial/text_analytics/working_with_text_data.html. \n",
+"Previously, you should follow the installation instructions in the section Tutorial Setup."
 ]
 },
 {