mirror of https://github.com/gsi-upm/sitc synced 2026-04-30 14:44:36 +00:00

Compare commits

8 Commits

Author               SHA1         Message                                                                                            Date
cif                  c361e23c8f   Updated LLM                                                                                        2026-04-21 14:46:37 +02:00
cif                  7d473dcdf2   Updated LLM - compability problems with v5                                                         2026-04-21 14:45:18 +02:00
Carlos A. Iglesias   7562b18968   Fix punctuation and update Scikit-Learn link                                                       2026-04-16 18:42:12 +02:00
Carlos A. Iglesias   d1374320f0   Update 4_5_Semantic_Models.ipynb (Minor typos.)                                                    2026-04-16 16:34:36 +02:00
Carlos A. Iglesias   1e8dbe70a3   Update 4_3_Vector_Representation.ipynb (Updated ngram_range to tuple)                             2026-04-16 16:27:23 +02:00
Carlos A. Iglesias   b3c799e564   Update 4_3_Vector_Representation.ipynb (Changed get_feature_names() with get_feature_names_out()) 2026-04-16 16:24:45 +02:00
Carlos A. Iglesias   59badc1df2   Fix markdown formatting in Exercise notebook                                                       2026-04-09 11:57:03 +02:00
Carlos A. Iglesias   77ed6c91be   Fix typos and improve clarity in markdown cells                                                    2026-04-09 11:51:47 +02:00
5 changed files with 275 additions and 198 deletions

View File

@@ -197,7 +197,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The features are simply the position of each point in the 2 dimension plane.\n",
+"The features are simply the position of each point in the 2-dimensional plane.\n",
 "\n",
 "In other words, a point $\\mathbf{x}$ is represented by its values $x_1$ and $x_2$:\n",
 "\n",
@@ -208,14 +208,14 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Perform the classification task on several classifiers"
+"## Perform the classification task on several classifiers."
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Following, the classification on the spiral is done with several classifiers. We can see the performance on each class (each spiral), and their decision surfaces."
+"Following the classification on the spiral is done with several classifiers. We can see the performance on each class (each spiral), and their decision surfaces."
 ]
 },
 {
@@ -266,7 +266,7 @@
 "source": [
 "from sklearn.linear_model import LogisticRegression\n",
 "\n",
-"lr = LogisticRegression(n_jobs=-1)\n",
+"lr = LogisticRegression()\n",
 "lr.fit(X,y)\n",
 "\n",
 "lr_preds = lr.predict(X_test)\n",
@@ -275,8 +275,8 @@
 "print(classification_report(y_test, lr_preds))\n",
 "\n",
 "plt.figure(figsize=(10,7))\n",
-"# This methods outputs a visualization\n",
-"# the h parameter adjusts the precision of the visualization\n",
+"# This method outputs a visualization\n",
+"# The h parameter adjusts the precision of the visualization\n",
 "# if you find memory errors, set h to a higher value (e.g., h=0.1)\n",
 "plot_decision_surface(X, y, lr, h=0.02) "
 ]
@@ -535,11 +535,11 @@
 "collapsed": true
 },
 "source": [
-"We see that some classifiers (kNN, SVM) successfully learn the spiral problem. They can classify correctly in any part of the plane.\n",
+"We see that some classifiers (kNN, SVM) successfully learn the spiral problem. They can classify correctly at any point in the plane.\n",
 "\n",
 "Nevertheless, some classifiers (Logistic Regression, Gaussian Naive Bayes) are not able to learn the spiral pattern with their default configurations.\n",
 "\n",
-"In particular, the MLP performs very bad: it is not able to learn the spiral function. Nevertheless, it should be able to."
+"In particular, the MLP performs very badly: it is not able to learn the spiral function. Nevertheless, it should be able to."
 ]
 },
 {
@@ -578,7 +578,7 @@
 "- regularization of the network\n",
 "- new features that are passed to the network\n",
 "\n",
-"You can search inspiration on [this playground](http://playground.tensorflow.org)."
+"You can search for inspiration on [this playground](http://playground.tensorflow.org)."
 ]
 },
 {
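
The cell above suggests tuning the architecture, regularization, or input features of the MLP. One possible direction is illustrated below: adding radius and angle as extra features before fitting an `MLPClassifier`. This is only a sketch under assumptions — `make_spirals` is a hypothetical stand-in generator, not the notebook's dataset, and the hyperparameters are illustrative, not a verified fix:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical two-spirals generator, standing in for the notebook's data
def make_spirals(n=200, noise=0.2, seed=0):
    rng = np.random.default_rng(seed)
    t = np.sqrt(rng.uniform(0.25, 1.0, n)) * 3 * np.pi
    arm = np.stack([t * np.cos(t), t * np.sin(t)], axis=1)
    X = np.vstack([arm, -arm]) + rng.normal(scale=noise, size=(2 * n, 2))
    y = np.hstack([np.zeros(n, dtype=int), np.ones(n, dtype=int)])
    return X, y

X, y = make_spirals()

# New features passed to the network: radius and angle of each point
r = np.linalg.norm(X, axis=1, keepdims=True)
theta = np.arctan2(X[:, 1], X[:, 0]).reshape(-1, 1)
X_aug = np.hstack([X, r, theta])

mlp = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=1000, random_state=0)
mlp.fit(X_aug, y)
print("training accuracy:", mlp.score(X_aug, y))
```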
@@ -621,7 +621,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
+"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
 "\n",
 "© Óscar Araque, Universidad Politécnica de Madrid."
 ]

File diff suppressed because one or more lines are too long

View File

@@ -239,7 +239,7 @@
 "name": "stderr",
 "output_type": "stream",
 "text": [
-"/home/cif/anaconda3/lib/python3.10/site-packages/sklearn/utils/deprecation.py:87: FutureWarning: Function get_feature_names is deprecated; get_feature_names is deprecated in 1.0 and will be removed in 1.2. Please use get_feature_names_out instead.\n",
+"--",
 " warnings.warn(msg, category=FutureWarning)\n"
 ]
 },
@@ -331,7 +331,7 @@
 "source": [
 "vectorizer = CountVectorizer(analyzer=\"word\", stop_words='english', binary=True) \n",
 "vectors = vectorizer.fit_transform(documents)\n",
-"vectorizer.get_feature_names()"
+"vectorizer.get_feature_names_out()"
 ]
 },
 {
@@ -363,9 +363,9 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"vectorizer = CountVectorizer(analyzer=\"word\", stop_words='english', ngram_range=[2,2]) \n",
+"vectorizer = CountVectorizer(analyzer=\"word\", stop_words='english', ngram_range=(2,2)) \n",
 "vectors = vectorizer.fit_transform(documents)\n",
-"vectorizer.get_feature_names()"
+"vectorizer.get_feature_names_out()"
 ]
 },
 {
@@ -401,7 +401,7 @@
 "\n",
 "vectorizer = TfidfVectorizer(analyzer=\"word\", stop_words='english')\n",
 "vectors = vectorizer.fit_transform(documents)\n",
-"vectorizer.get_feature_names()"
+"vectorizer.get_feature_names_out()"
 ]
 },
 {
@@ -429,9 +429,9 @@
 "train = [doc1, doc2, doc3]\n",
 "vectorizer = TfidfVectorizer(analyzer=\"word\", stop_words='english')\n",
 "\n",
-"# We learn the vocabulary (fit) and tranform the docs into vectors\n",
+"# We learn the vocabulary (fit) and transform the docs into vectors\n",
 "vectors = vectorizer.fit_transform(train)\n",
-"vectorizer.get_feature_names()"
+"vectorizer.get_feature_names_out()"
 ]
 },
 {

View File

@@ -51,7 +51,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"In this session we provide a quick overview of the semantic models presented during the classes. In this case, we will use a real corpus so that we can extract meaningful patterns.\n",
+"In this session, we provide a quick overview of the semantic models presented during the classes. In this case, we will use a real corpus so that we can extract meaningful patterns.\n",
 "\n",
 "The main objectives of this session are:\n",
 "* Understand the models and their differences\n",
@@ -69,9 +69,9 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We are going to use on of the corpus that come prepackaged with Scikit-learn: the [20 newsgroup datase](http://qwone.com/~jason/20Newsgroups/). The 20 newsgroup dataset contains 20k documents that belong to 20 topics.\n",
+"We are going to use one of the corpora that come prepackaged with Scikit-learn: the [20 newsgroup dataset](http://qwone.com/~jason/20Newsgroups/). The 20 newsgroup dataset contains 20k documents that belong to 20 topics.\n",
 "\n",
-"We inspect now the corpus using the facilities from Scikit-learn, as explain in [scikit-learn](http://scikit-learn.org/stable/datasets/twenty_newsgroups.html#newsgroups)"
+"We inspect now the corpus using the facilities from Scikit-learn, as explained in [scikit-learn](http://scikit-learn.org/stable/datasets/twenty_newsgroups.html#newsgroups)"
 ]
 },
 {
@@ -117,19 +117,19 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"# Converting Scikit-learn to gensim"
+"# Converting Scikit-learn to gensim."
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Although scikit-learn provides an LDA implementation, it is more popular the package *gensim*, which also provides an LSI implementation, as well as other functionalities. Fortunately, scikit-learn sparse matrices can be used in Gensim using the function *matutils.Sparse2Corpus()*. Anyway, if you are using intensively LDA,it can be convenient to create the corpus with their functions.\n",
+"Although scikit-learn provides an LDA implementation, it is more popular than the package *gensim*, which also provides an LSI implementation, as well as other functionalities. Fortunately, scikit-learn sparse matrices can be used in Gensim using the function *matutils.Sparse2Corpus()*. Anyway, if you are using intensively LDA,it can be convenient to create the corpus with their functions.\n",
 "\n",
 "You should install first:\n",
 "\n",
 "* *gensim*. Run 'conda install gensim' in a terminal.\n",
-"* *python-Levenshtein*. Run 'conda install python-Levenshtein' in a terminal"
+"* *python-Levenshtein*. Run 'conda install python-Levenshtein' in a terminal."
 ]
 },
 {
@@ -183,7 +183,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Although scikit-learn provides an LDA implementation, it is more popular the package *gensim*, which also provides an LSI implementation, as well as other functionalities. Fortunately, scikit-learn sparse matrices can be used in Gensim using the function *matutils.Sparse2Corpus()*."
+"Although scikit-learn provides an LDA implementation, it is more popular than the package *gensim*, which also provides an LSI implementation, as well as other functionalities. Fortunately, scikit-learn sparse matrices can be used in Gensim using the function *matutils.Sparse2Corpus()*."
 ]
 },
 {

View File

@@ -51,7 +51,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Here we propose several exercises, it is recommended to work only in one of them."
+"Here we propose several exercises; it is recommended to work only in one of them."
 ]
 },
 {
@@ -65,8 +65,8 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"You can try the exercise Exercise 2: Sentiment Analysis on movie reviews of Scikit-Learn https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html. \n",
+"You can try the exercise Exercise 2: Sentiment Analysis on movie reviews of Scikit-Learn https://scikit-learn.org/1.4/tutorial/text_analytics/working_with_text_data.html. \n",
-"Previously you should follow the installation instructions in the section Tutorial Setup."
+"Previously, you should follow the installation instructions in the section Tutorial Setup."
 ]
 },
 {