mirror of https://github.com/gsi-upm/sitc synced 2026-02-08 23:58:17 +00:00

Update 2_6_1_Q-Learning_Visualization.ipynb

This commit is contained in:
Carlos A. Iglesias
2026-02-04 18:57:29 +01:00
committed by GitHub
parent 921eda4c9f
commit d062777922


@@ -39,7 +39,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"In this section we are going to visualize Q-Learning based on this [link](https://gymnasium.farama.org/tutorials/training_agents/FrozenLake_tuto/#sphx-glr-tutorials-training-agents-frozenlake-tuto-py). The code has been ported to the last version of Gymnasium.\n",
+"In this section, we are going to visualize Q-Learning based on this [link](https://gymnasium.farama.org/tutorials/training_agents/FrozenLake_tuto/#sphx-glr-tutorials-training-agents-frozenlake-tuto-py). The code has been ported to the last version of Gymnasium.\n",
 "\n",
 "First, we are going to define a class *Params* for the Q-Learning parameters and the environment based on these values."
 ]
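The *Params* class itself is not shown in this hunk. As a rough illustration of what such a container of Q-Learning parameters could look like, here is a minimal sketch using a dataclass; every field name and default value below is an assumption for illustration, not taken from the notebook:

```python
from dataclasses import dataclass


@dataclass
class Params:
    """Hypothetical sketch of a Q-Learning parameter container.

    Field names and defaults are illustrative assumptions; the
    notebook's actual Params class may differ.
    """
    total_episodes: int = 2000   # number of training episodes
    learning_rate: float = 0.8   # alpha in the Q-Learning update rule
    gamma: float = 0.95          # discount factor for future rewards
    epsilon: float = 0.1         # exploration probability (epsilon-greedy)
    map_size: int = 4            # side length of the FrozenLake grid
    seed: int = 123              # RNG seed for reproducibility


params = Params()
```

A dataclass keeps all tunables in one place, so the environment and the agent can both be built from the same `params` object.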
@@ -129,7 +129,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Running the environment"
+"## Running the environment."
 ]
 },
 {
@@ -161,7 +161,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We want to plot the policy the agent has learned in the end. To do that the function *qtable_directions_map* perform these actions: 1. extract the best Q-values from the Q-table for each state, 2. get the corresponding best action for those Q-values, 3. map each action to an arrow so we can visualize it."
+"We want to plot the policy the agent has learned in the end. To do that, the function *qtable_directions_map* performs these actions: 1. extract the best Q-values from the Q-table for each state, 2. get the corresponding best action for those Q-values, 3. map each action to an arrow so we can visualize it."
 ]
 },
 {
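The body of *qtable_directions_map* is not included in this hunk. The three steps it describes could be sketched roughly as follows; the variable names and the blank-cell handling are assumptions, not the notebook's actual implementation (the action-to-arrow order follows Gymnasium's FrozenLake encoding: 0=left, 1=down, 2=right, 3=up):

```python
import numpy as np


def qtable_directions_map(qtable, map_size):
    """Sketch: best Q-value and an arrow per state, for plotting the policy.

    Assumes `qtable` has shape (map_size * map_size, n_actions):
    one row per state, one column per action.
    """
    # 1. Extract the best Q-value from the Q-table for each state.
    qtable_val_max = qtable.max(axis=1).reshape(map_size, map_size)
    # 2. Get the corresponding best action for those Q-values.
    qtable_best_action = np.argmax(qtable, axis=1)
    # 3. Map each action to an arrow (FrozenLake: 0=left, 1=down, 2=right, 3=up).
    directions = {0: "←", 1: "↓", 2: "→", 3: "↑"}
    arrows = np.full(qtable_best_action.shape, "", dtype="<U2")
    for idx, action in enumerate(qtable_best_action):
        # Leave states whose Q-values were never updated (still ~0) blank,
        # so the plot only shows arrows where the agent actually learned.
        if qtable_val_max.flatten()[idx] > np.finfo(float).eps:
            arrows[idx] = directions[int(action)]
    return qtable_val_max, arrows.reshape(map_size, map_size)
```

The returned pair can then be fed to a heatmap, with the arrows used as cell annotations over the Q-value colors.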
@@ -182,7 +182,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Now we'll be running our agent on a few increasing maps sizes: \n",
+"Now we'll be running our agent on a few increasing map sizes: \n",
 "- 4x4\n",
 "- 7x7\n",
 "- 9x9\n",
@@ -312,7 +312,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The notebook is freely licensed under under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
+"The notebook is freely licensed under the [Creative Commons Attribution Share-Alike license](https://creativecommons.org/licenses/by/2.0/). \n",
 "\n",
 "© Carlos Á. Iglesias, Universidad Politécnica de Madrid."
 ]