Updated LLM

Updated LLM - compability problems with v5
Fix punctuation and update Scikit-Learn link
2026-04-30 14:44:36 +00:00 · 2026-04-21 14:46:37 +02:00 · 2026-04-21 14:45:18 +02:00 · 2026-04-16 18:42:12 +02:00 · 2026-04-16 16:34:36 +02:00 · 2026-04-16 16:27:23 +02:00
4 changed files with 265 additions and 188 deletions
--- a/nlp/0_1_LLM.ipynb
+++ b/nlp/0_1_LLM.ipynb
--- a/nlp/4_3_Vector_Representation.ipynb
+++ b/nlp/4_3_Vector_Representation.ipynb
@@ -239,7 +239,7 @@
     "name": "stderr",
     "output_type": "stream",
     "text": [
-      "/home/cif/anaconda3/lib/python3.10/site-packages/sklearn/utils/deprecation.py:87: FutureWarning: Function get_feature_names is deprecated; get_feature_names is deprecated in 1.0 and will be removed in 1.2. Please use get_feature_names_out instead.\n",
+      "--",
      "  warnings.warn(msg, category=FutureWarning)\n"
     ]
    },
@@ -331,7 +331,7 @@
   "source": [
    "vectorizer = CountVectorizer(analyzer=\"word\", stop_words='english', binary=True) \n",
    "vectors = vectorizer.fit_transform(documents)\n",
-    "vectorizer.get_feature_names()"
+    "vectorizer.get_feature_names_out()"
   ]
  },
  {
@@ -363,9 +363,9 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "vectorizer = CountVectorizer(analyzer=\"word\", stop_words='english', ngram_range=[2,2]) \n",
+    "vectorizer = CountVectorizer(analyzer=\"word\", stop_words='english', ngram_range=(2,2)) \n",
    "vectors = vectorizer.fit_transform(documents)\n",
-    "vectorizer.get_feature_names()"
+    "vectorizer.get_feature_names_out()"
   ]
  },
  {
@@ -401,7 +401,7 @@
    "\n",
    "vectorizer = TfidfVectorizer(analyzer=\"word\", stop_words='english')\n",
    "vectors = vectorizer.fit_transform(documents)\n",
-    "vectorizer.get_feature_names()"
+    "vectorizer.get_feature_names_out()"
   ]
  },
  {
@@ -429,9 +429,9 @@
    "train = [doc1, doc2, doc3]\n",
    "vectorizer = TfidfVectorizer(analyzer=\"word\", stop_words='english')\n",
    "\n",
-    "# We learn the vocabulary (fit) and tranform the docs into vectors\n",
+    "# We learn the vocabulary (fit) and transform the docs into vectors\n",
    "vectors = vectorizer.fit_transform(train)\n",
-    "vectorizer.get_feature_names()"
+    "vectorizer.get_feature_names_out()"
   ]
  },
  {
--- a/nlp/4_5_Semantic_Models.ipynb
+++ b/nlp/4_5_Semantic_Models.ipynb
@@ -51,7 +51,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "In this session we provide a quick overview of the semantic models presented during the classes. In this case, we will use a real corpus so that we can extract meaningful patterns.\n",
+    "In this session, we provide a quick overview of the semantic models presented during the classes. In this case, we will use a real corpus so that we can extract meaningful patterns.\n",
    "\n",
    "The main objectives of this session are:\n",
    "* Understand the models and their differences\n",
@@ -69,9 +69,9 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "We are going to use on of the corpus that come prepackaged with Scikit-learn: the [20 newsgroup datase](http://qwone.com/~jason/20Newsgroups/). The 20  newsgroup dataset contains 20k documents that belong to 20 topics.\n",
+    "We are going to use one of the corpora that come prepackaged with Scikit-learn: the [20 newsgroup dataset](http://qwone.com/~jason/20Newsgroups/). The 20  newsgroup dataset contains 20k documents that belong to 20 topics.\n",
    "\n",
-    "We inspect now the corpus using the facilities from Scikit-learn, as explain in [scikit-learn](http://scikit-learn.org/stable/datasets/twenty_newsgroups.html#newsgroups)"
+    "We now inspect the corpus using the facilities from Scikit-learn, as explained in [scikit-learn](http://scikit-learn.org/stable/datasets/twenty_newsgroups.html#newsgroups)"
   ]
  },
  {
@@ -117,19 +117,19 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "# Converting Scikit-learn to gensim"
+    "# Converting Scikit-learn to gensim."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "Although scikit-learn provides an LDA implementation, it is more popular the package *gensim*, which also provides an LSI implementation, as well as other functionalities. Fortunately, scikit-learn sparse matrices can be used in Gensim using the function *matutils.Sparse2Corpus()*. Anyway, if you are using intensively LDA,it can be convenient to create the corpus with their functions.\n",
+    "Although scikit-learn provides an LDA implementation, the package *gensim* is more popular; it also provides an LSI implementation, as well as other functionality. Fortunately, scikit-learn sparse matrices can be used in Gensim through the function *matutils.Sparse2Corpus()*. If you use LDA intensively, it can be convenient to create the corpus with Gensim's own functions.\n",
    "\n",
    "You should install first:\n",
    "\n",
    "* *gensim*. Run 'conda install gensim' in a terminal.\n",
-    "* *python-Levenshtein*. Run 'conda install python-Levenshtein' in a terminal"
+    "* *python-Levenshtein*. Run 'conda install python-Levenshtein' in a terminal."
   ]
  },
  {
@@ -183,7 +183,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "Although scikit-learn provides an LDA implementation, it is more popular the package *gensim*, which also provides an LSI implementation, as well as other functionalities. Fortunately, scikit-learn sparse matrices can be used in Gensim using the function *matutils.Sparse2Corpus()*."
+    "Although scikit-learn provides an LDA implementation, the package *gensim* is more popular; it also provides an LSI implementation, as well as other functionality. Fortunately, scikit-learn sparse matrices can be used in Gensim through the function *matutils.Sparse2Corpus()*."
   ]
  },
  {
--- a/nlp/4_7_Exercises.ipynb
+++ b/nlp/4_7_Exercises.ipynb
@@ -51,7 +51,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "Here we propose several exercises, it is recommended to work only in one of them."
+    "Here we propose several exercises; it is recommended to work on only one of them."
   ]
  },
  {
@@ -65,8 +65,8 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "You can try the exercise Exercise 2: Sentiment Analysis on movie reviews of Scikit-Learn https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html. \n",
+    "You can try Exercise 2: Sentiment Analysis on movie reviews from the Scikit-Learn tutorial: https://scikit-learn.org/1.4/tutorial/text_analytics/working_with_text_data.html. \n",
-    "Previously you should follow the installation instructions in the section Tutorial Setup."
+    "Beforehand, you should follow the installation instructions in the section Tutorial Setup."
    ]
  },
  {
Author	SHA1	Message	Date
cif	c361e23c8f	Updated LLM	2026-04-21 14:46:37 +02:00
cif	7d473dcdf2	Updated LLM - compability problems with v5	2026-04-21 14:45:18 +02:00
Carlos A. Iglesias	7562b18968	Fix punctuation and update Scikit-Learn link	2026-04-16 18:42:12 +02:00
Carlos A. Iglesias	d1374320f0	Update 4_5_Semantic_Models.ipynb Minor typos.	2026-04-16 16:34:36 +02:00
Carlos A. Iglesias	1e8dbe70a3	Update 4_3_Vector_Representation.ipynb Updated ngram_range to tuple	2026-04-16 16:27:23 +02:00
Carlos A. Iglesias	b3c799e564	Update 4_3_Vector_Representation.ipynb Changed get_feature_names() with get_feature_names_out()	2026-04-16 16:24:45 +02:00
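
Review note: the two API changes in 4_3_Vector_Representation.ipynb (get_feature_names() replaced by get_feature_names_out(), and ngram_range given as a tuple) could also be wrapped so the cells run on both older and newer scikit-learn installs. The sketch below is only an illustration under stated assumptions: the helper names feature_names and as_ngram_range are hypothetical and not part of the notebooks or of scikit-learn, and it relies only on the two method names quoted in the FutureWarning above. That newer releases reject a list for ngram_range is an assumption suggested by the commit message, not something this diff shows.

```python
def feature_names(vectorizer):
    """Return the vocabulary terms of a fitted vectorizer as a plain list of str.

    Hypothetical helper: prefers get_feature_names_out() and falls back to
    the deprecated get_feature_names() when the newer method is missing.
    """
    getter = getattr(vectorizer, "get_feature_names_out", None)
    if getter is None:
        # Older installs only expose the deprecated accessor.
        getter = vectorizer.get_feature_names
    # get_feature_names_out() returns an array; convert it to a list of str
    # so the notebook output looks the same on either version.
    return [str(name) for name in getter()]


def as_ngram_range(bounds):
    """Normalize an (min_n, max_n) pair such as [2, 2] to the tuple (2, 2)."""
    low, high = bounds
    return (int(low), int(high))
```

With these helpers, a cell could end in feature_names(vectorizer) instead of calling either accessor directly, and pass ngram_range=as_ngram_range([2, 2]) to CountVectorizer. Calling get_feature_names_out() directly, as the diff does, is simpler when only scikit-learn 1.0 or later has to be supported.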