mirror of
https://github.com/gsi-upm/senpy
synced 2024-11-24 17:12:29 +00:00
593 lines
30 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Evaluating Services"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"Sentiment analysis plugins can also be evaluated on a series of pre-defined datasets.\n",
"This can be done in three ways: through the Web UI (playground), through the web API, and programmatically.\n",
"\n",
"Regardless of how you perform the evaluation, you need to specify the plugin (service) that you want to evaluate and the datasets on which it should be evaluated.\n",
"\n",
"To evaluate a plugin on a dataset, senpy uses the plugin to predict the sentiment of each entry in the dataset.\n",
"These predictions are compared with the expected values to produce several metrics, such as accuracy, precision and F1-score.\n",
"\n",
"**Note**: the evaluation process might take a long time for plugins that use external services, such as `sentiment140`.\n",
"\n",
"**Note**: plugins are assumed to be pre-trained and invariant, i.e., the prediction for an entry should "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Web UI (Playground)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"The playground should contain a tab for Evaluation, where you can select any plugin that can be evaluated, and the set of datasets that you want to test the plugin on.\n",
|
|
"\n",
|
|
"For example, the image below shows the results of the `sentiment-vader` plugin on the `vader` and `sts` datasets:\n",
|
|
"\n",
|
|
"\n",
|
|
"![](eval_table.png)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Web API"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"The API exposes an endpoint (`/evaluate`), which accepts the plugin and the set of datasets on which it should be evaluated."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"The following code is not necessary, but it will display the results better:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Here is a simple call using the requests library:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<style>.output_html .hll { background-color: #ffffcc }\n",
|
|
".output_html { background: #f8f8f8; }\n",
|
|
".output_html .c { color: #408080; font-style: italic } /* Comment */\n",
|
|
".output_html .err { border: 1px solid #FF0000 } /* Error */\n",
|
|
".output_html .k { color: #008000; font-weight: bold } /* Keyword */\n",
|
|
".output_html .o { color: #666666 } /* Operator */\n",
|
|
".output_html .ch { color: #408080; font-style: italic } /* Comment.Hashbang */\n",
|
|
".output_html .cm { color: #408080; font-style: italic } /* Comment.Multiline */\n",
|
|
".output_html .cp { color: #BC7A00 } /* Comment.Preproc */\n",
|
|
".output_html .cpf { color: #408080; font-style: italic } /* Comment.PreprocFile */\n",
|
|
".output_html .c1 { color: #408080; font-style: italic } /* Comment.Single */\n",
|
|
".output_html .cs { color: #408080; font-style: italic } /* Comment.Special */\n",
|
|
".output_html .gd { color: #A00000 } /* Generic.Deleted */\n",
|
|
".output_html .ge { font-style: italic } /* Generic.Emph */\n",
|
|
".output_html .gr { color: #FF0000 } /* Generic.Error */\n",
|
|
".output_html .gh { color: #000080; font-weight: bold } /* Generic.Heading */\n",
|
|
".output_html .gi { color: #00A000 } /* Generic.Inserted */\n",
|
|
".output_html .go { color: #888888 } /* Generic.Output */\n",
|
|
".output_html .gp { color: #000080; font-weight: bold } /* Generic.Prompt */\n",
|
|
".output_html .gs { font-weight: bold } /* Generic.Strong */\n",
|
|
".output_html .gu { color: #800080; font-weight: bold } /* Generic.Subheading */\n",
|
|
".output_html .gt { color: #0044DD } /* Generic.Traceback */\n",
|
|
".output_html .kc { color: #008000; font-weight: bold } /* Keyword.Constant */\n",
|
|
".output_html .kd { color: #008000; font-weight: bold } /* Keyword.Declaration */\n",
|
|
".output_html .kn { color: #008000; font-weight: bold } /* Keyword.Namespace */\n",
|
|
".output_html .kp { color: #008000 } /* Keyword.Pseudo */\n",
|
|
".output_html .kr { color: #008000; font-weight: bold } /* Keyword.Reserved */\n",
|
|
".output_html .kt { color: #B00040 } /* Keyword.Type */\n",
|
|
".output_html .m { color: #666666 } /* Literal.Number */\n",
|
|
".output_html .s { color: #BA2121 } /* Literal.String */\n",
|
|
".output_html .na { color: #7D9029 } /* Name.Attribute */\n",
|
|
".output_html .nb { color: #008000 } /* Name.Builtin */\n",
|
|
".output_html .nc { color: #0000FF; font-weight: bold } /* Name.Class */\n",
|
|
".output_html .no { color: #880000 } /* Name.Constant */\n",
|
|
".output_html .nd { color: #AA22FF } /* Name.Decorator */\n",
|
|
".output_html .ni { color: #999999; font-weight: bold } /* Name.Entity */\n",
|
|
".output_html .ne { color: #D2413A; font-weight: bold } /* Name.Exception */\n",
|
|
".output_html .nf { color: #0000FF } /* Name.Function */\n",
|
|
".output_html .nl { color: #A0A000 } /* Name.Label */\n",
|
|
".output_html .nn { color: #0000FF; font-weight: bold } /* Name.Namespace */\n",
|
|
".output_html .nt { color: #008000; font-weight: bold } /* Name.Tag */\n",
|
|
".output_html .nv { color: #19177C } /* Name.Variable */\n",
|
|
".output_html .ow { color: #AA22FF; font-weight: bold } /* Operator.Word */\n",
|
|
".output_html .w { color: #bbbbbb } /* Text.Whitespace */\n",
|
|
".output_html .mb { color: #666666 } /* Literal.Number.Bin */\n",
|
|
".output_html .mf { color: #666666 } /* Literal.Number.Float */\n",
|
|
".output_html .mh { color: #666666 } /* Literal.Number.Hex */\n",
|
|
".output_html .mi { color: #666666 } /* Literal.Number.Integer */\n",
|
|
".output_html .mo { color: #666666 } /* Literal.Number.Oct */\n",
|
|
".output_html .sa { color: #BA2121 } /* Literal.String.Affix */\n",
|
|
".output_html .sb { color: #BA2121 } /* Literal.String.Backtick */\n",
|
|
".output_html .sc { color: #BA2121 } /* Literal.String.Char */\n",
|
|
".output_html .dl { color: #BA2121 } /* Literal.String.Delimiter */\n",
|
|
".output_html .sd { color: #BA2121; font-style: italic } /* Literal.String.Doc */\n",
|
|
".output_html .s2 { color: #BA2121 } /* Literal.String.Double */\n",
|
|
".output_html .se { color: #BB6622; font-weight: bold } /* Literal.String.Escape */\n",
|
|
".output_html .sh { color: #BA2121 } /* Literal.String.Heredoc */\n",
|
|
".output_html .si { color: #BB6688; font-weight: bold } /* Literal.String.Interpol */\n",
|
|
".output_html .sx { color: #008000 } /* Literal.String.Other */\n",
|
|
".output_html .sr { color: #BB6688 } /* Literal.String.Regex */\n",
|
|
".output_html .s1 { color: #BA2121 } /* Literal.String.Single */\n",
|
|
".output_html .ss { color: #19177C } /* Literal.String.Symbol */\n",
|
|
".output_html .bp { color: #008000 } /* Name.Builtin.Pseudo */\n",
|
|
".output_html .fm { color: #0000FF } /* Name.Function.Magic */\n",
|
|
".output_html .vc { color: #19177C } /* Name.Variable.Class */\n",
|
|
".output_html .vg { color: #19177C } /* Name.Variable.Global */\n",
|
|
".output_html .vi { color: #19177C } /* Name.Variable.Instance */\n",
|
|
".output_html .vm { color: #19177C } /* Name.Variable.Magic */\n",
|
|
".output_html .il { color: #666666 } /* Literal.Number.Integer.Long */</style><div class=\"highlight\"><pre><span></span><span class=\"p\">{</span>\n",
|
|
" <span class=\"nt\">"@context"</span><span class=\"p\">:</span> <span class=\"s2\">"http://senpy.gsi.upm.es/api/contexts/YXBpL2V2YWx1YXRlLz9hbGdvPXNlbnRpbWVudC12YWRlciZkYXRhc2V0PXZhZGVyJTJDc3RzJm91dGZvcm1hdD1qc29uLWxkIw%3D%3D"</span><span class=\"p\">,</span>\n",
|
|
" <span class=\"nt\">"@type"</span><span class=\"p\">:</span> <span class=\"s2\">"AggregatedEvaluation"</span><span class=\"p\">,</span>\n",
|
|
" <span class=\"nt\">"senpy:evaluations"</span><span class=\"p\">:</span> <span class=\"p\">[</span>\n",
|
|
" <span class=\"p\">{</span>\n",
|
|
" <span class=\"nt\">"@type"</span><span class=\"p\">:</span> <span class=\"s2\">"Evaluation"</span><span class=\"p\">,</span>\n",
|
|
" <span class=\"nt\">"evaluates"</span><span class=\"p\">:</span> <span class=\"s2\">"endpoint:plugins/sentiment-vader_0.1.1__vader"</span><span class=\"p\">,</span>\n",
|
|
" <span class=\"nt\">"evaluatesOn"</span><span class=\"p\">:</span> <span class=\"s2\">"vader"</span><span class=\"p\">,</span>\n",
|
|
" <span class=\"nt\">"metrics"</span><span class=\"p\">:</span> <span class=\"p\">[</span>\n",
|
|
" <span class=\"p\">{</span>\n",
|
|
" <span class=\"nt\">"@type"</span><span class=\"p\">:</span> <span class=\"s2\">"Accuracy"</span><span class=\"p\">,</span>\n",
|
|
" <span class=\"nt\">"value"</span><span class=\"p\">:</span> <span class=\"mf\">0.6907142857142857</span>\n",
|
|
" <span class=\"p\">},</span>\n",
|
|
" <span class=\"p\">{</span>\n",
|
|
" <span class=\"nt\">"@type"</span><span class=\"p\">:</span> <span class=\"s2\">"Precision_macro"</span><span class=\"p\">,</span>\n",
|
|
" <span class=\"nt\">"value"</span><span class=\"p\">:</span> <span class=\"mf\">0.34535714285714286</span>\n",
|
|
" <span class=\"p\">},</span>\n",
|
|
" <span class=\"p\">{</span>\n",
|
|
" <span class=\"nt\">"@type"</span><span class=\"p\">:</span> <span class=\"s2\">"Recall_macro"</span><span class=\"p\">,</span>\n",
|
|
" <span class=\"nt\">"value"</span><span class=\"p\">:</span> <span class=\"mf\">0.5</span>\n",
|
|
" <span class=\"p\">},</span>\n",
|
|
" <span class=\"p\">{</span>\n",
|
|
" <span class=\"nt\">"@type"</span><span class=\"p\">:</span> <span class=\"s2\">"F1_macro"</span><span class=\"p\">,</span>\n",
|
|
" <span class=\"nt\">"value"</span><span class=\"p\">:</span> <span class=\"mf\">0.40853400929446554</span>\n",
|
|
" <span class=\"p\">},</span>\n",
|
|
" <span class=\"p\">{</span>\n",
|
|
" <span class=\"nt\">"@type"</span><span class=\"p\">:</span> <span class=\"s2\">"F1_weighted"</span><span class=\"p\">,</span>\n",
|
|
" <span class=\"nt\">"value"</span><span class=\"p\">:</span> <span class=\"mf\">0.5643605528396403</span>\n",
|
|
" <span class=\"p\">},</span>\n",
|
|
" <span class=\"p\">{</span>\n",
|
|
" <span class=\"nt\">"@type"</span><span class=\"p\">:</span> <span class=\"s2\">"F1_micro"</span><span class=\"p\">,</span>\n",
|
|
" <span class=\"nt\">"value"</span><span class=\"p\">:</span> <span class=\"mf\">0.6907142857142857</span>\n",
|
|
" <span class=\"p\">},</span>\n",
|
|
" <span class=\"p\">{</span>\n",
|
|
" <span class=\"nt\">"@type"</span><span class=\"p\">:</span> <span class=\"s2\">"F1_macro"</span><span class=\"p\">,</span>\n",
|
|
" <span class=\"nt\">"value"</span><span class=\"p\">:</span> <span class=\"mf\">0.40853400929446554</span>\n",
|
|
" <span class=\"p\">}</span>\n",
|
|
" <span class=\"p\">]</span>\n",
|
|
" <span class=\"p\">},</span>\n",
|
|
" <span class=\"p\">{</span>\n",
|
|
" <span class=\"nt\">"@type"</span><span class=\"p\">:</span> <span class=\"s2\">"Evaluation"</span><span class=\"p\">,</span>\n",
|
|
" <span class=\"nt\">"evaluates"</span><span class=\"p\">:</span> <span class=\"s2\">"endpoint:plugins/sentiment-vader_0.1.1__sts"</span><span class=\"p\">,</span>\n",
|
|
" <span class=\"nt\">"evaluatesOn"</span><span class=\"p\">:</span> <span class=\"s2\">"sts"</span><span class=\"p\">,</span>\n",
|
|
" <span class=\"nt\">"metrics"</span><span class=\"p\">:</span> <span class=\"p\">[</span>\n",
|
|
" <span class=\"p\">{</span>\n",
|
|
" <span class=\"nt\">"@type"</span><span class=\"p\">:</span> <span class=\"s2\">"Accuracy"</span><span class=\"p\">,</span>\n",
|
|
" <span class=\"nt\">"value"</span><span class=\"p\">:</span> <span class=\"mf\">0.3107177974434612</span>\n",
|
|
" <span class=\"p\">},</span>\n",
|
|
" <span class=\"p\">{</span>\n",
|
|
" <span class=\"nt\">"@type"</span><span class=\"p\">:</span> <span class=\"s2\">"Precision_macro"</span><span class=\"p\">,</span>\n",
|
|
" <span class=\"nt\">"value"</span><span class=\"p\">:</span> <span class=\"mf\">0.1553588987217306</span>\n",
|
|
" <span class=\"p\">},</span>\n",
|
|
" <span class=\"p\">{</span>\n",
|
|
" <span class=\"nt\">"@type"</span><span class=\"p\">:</span> <span class=\"s2\">"Recall_macro"</span><span class=\"p\">,</span>\n",
|
|
" <span class=\"nt\">"value"</span><span class=\"p\">:</span> <span class=\"mf\">0.5</span>\n",
|
|
" <span class=\"p\">},</span>\n",
|
|
" <span class=\"p\">{</span>\n",
|
|
" <span class=\"nt\">"@type"</span><span class=\"p\">:</span> <span class=\"s2\">"F1_macro"</span><span class=\"p\">,</span>\n",
|
|
" <span class=\"nt\">"value"</span><span class=\"p\">:</span> <span class=\"mf\">0.23705926481620407</span>\n",
|
|
" <span class=\"p\">},</span>\n",
|
|
" <span class=\"p\">{</span>\n",
|
|
" <span class=\"nt\">"@type"</span><span class=\"p\">:</span> <span class=\"s2\">"F1_weighted"</span><span class=\"p\">,</span>\n",
|
|
" <span class=\"nt\">"value"</span><span class=\"p\">:</span> <span class=\"mf\">0.14731706525451424</span>\n",
|
|
" <span class=\"p\">},</span>\n",
|
|
" <span class=\"p\">{</span>\n",
|
|
" <span class=\"nt\">"@type"</span><span class=\"p\">:</span> <span class=\"s2\">"F1_micro"</span><span class=\"p\">,</span>\n",
|
|
" <span class=\"nt\">"value"</span><span class=\"p\">:</span> <span class=\"mf\">0.3107177974434612</span>\n",
|
|
" <span class=\"p\">},</span>\n",
|
|
" <span class=\"p\">{</span>\n",
|
|
" <span class=\"nt\">"@type"</span><span class=\"p\">:</span> <span class=\"s2\">"F1_macro"</span><span class=\"p\">,</span>\n",
|
|
" <span class=\"nt\">"value"</span><span class=\"p\">:</span> <span class=\"mf\">0.23705926481620407</span>\n",
|
|
" <span class=\"p\">}</span>\n",
|
|
" <span class=\"p\">]</span>\n",
|
|
" <span class=\"p\">}</span>\n",
|
|
" <span class=\"p\">]</span>\n",
|
|
"<span class=\"p\">}</span>\n",
|
|
"</pre></div>\n"
|
|
],
|
|
"text/latex": [
|
|
"\\begin{Verbatim}[commandchars=\\\\\\{\\}]\n",
|
|
"\\PY{p}{\\PYZob{}}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}@context\\PYZdq{}}\\PY{p}{:} \\PY{l+s+s2}{\\PYZdq{}http://senpy.gsi.upm.es/api/contexts/YXBpL2V2YWx1YXRlLz9hbGdvPXNlbnRpbWVudC12YWRlciZkYXRhc2V0PXZhZGVyJTJDc3RzJm91dGZvcm1hdD1qc29uLWxkIw\\PYZpc{}3D\\PYZpc{}3D\\PYZdq{}}\\PY{p}{,}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}@type\\PYZdq{}}\\PY{p}{:} \\PY{l+s+s2}{\\PYZdq{}AggregatedEvaluation\\PYZdq{}}\\PY{p}{,}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}senpy:evaluations\\PYZdq{}}\\PY{p}{:} \\PY{p}{[}\n",
|
|
" \\PY{p}{\\PYZob{}}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}@type\\PYZdq{}}\\PY{p}{:} \\PY{l+s+s2}{\\PYZdq{}Evaluation\\PYZdq{}}\\PY{p}{,}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}evaluates\\PYZdq{}}\\PY{p}{:} \\PY{l+s+s2}{\\PYZdq{}endpoint:plugins/sentiment\\PYZhy{}vader\\PYZus{}0.1.1\\PYZus{}\\PYZus{}vader\\PYZdq{}}\\PY{p}{,}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}evaluatesOn\\PYZdq{}}\\PY{p}{:} \\PY{l+s+s2}{\\PYZdq{}vader\\PYZdq{}}\\PY{p}{,}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}metrics\\PYZdq{}}\\PY{p}{:} \\PY{p}{[}\n",
|
|
" \\PY{p}{\\PYZob{}}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}@type\\PYZdq{}}\\PY{p}{:} \\PY{l+s+s2}{\\PYZdq{}Accuracy\\PYZdq{}}\\PY{p}{,}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}value\\PYZdq{}}\\PY{p}{:} \\PY{l+m+mf}{0.6907142857142857}\n",
|
|
" \\PY{p}{\\PYZcb{}}\\PY{p}{,}\n",
|
|
" \\PY{p}{\\PYZob{}}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}@type\\PYZdq{}}\\PY{p}{:} \\PY{l+s+s2}{\\PYZdq{}Precision\\PYZus{}macro\\PYZdq{}}\\PY{p}{,}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}value\\PYZdq{}}\\PY{p}{:} \\PY{l+m+mf}{0.34535714285714286}\n",
|
|
" \\PY{p}{\\PYZcb{}}\\PY{p}{,}\n",
|
|
" \\PY{p}{\\PYZob{}}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}@type\\PYZdq{}}\\PY{p}{:} \\PY{l+s+s2}{\\PYZdq{}Recall\\PYZus{}macro\\PYZdq{}}\\PY{p}{,}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}value\\PYZdq{}}\\PY{p}{:} \\PY{l+m+mf}{0.5}\n",
|
|
" \\PY{p}{\\PYZcb{}}\\PY{p}{,}\n",
|
|
" \\PY{p}{\\PYZob{}}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}@type\\PYZdq{}}\\PY{p}{:} \\PY{l+s+s2}{\\PYZdq{}F1\\PYZus{}macro\\PYZdq{}}\\PY{p}{,}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}value\\PYZdq{}}\\PY{p}{:} \\PY{l+m+mf}{0.40853400929446554}\n",
|
|
" \\PY{p}{\\PYZcb{}}\\PY{p}{,}\n",
|
|
" \\PY{p}{\\PYZob{}}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}@type\\PYZdq{}}\\PY{p}{:} \\PY{l+s+s2}{\\PYZdq{}F1\\PYZus{}weighted\\PYZdq{}}\\PY{p}{,}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}value\\PYZdq{}}\\PY{p}{:} \\PY{l+m+mf}{0.5643605528396403}\n",
|
|
" \\PY{p}{\\PYZcb{}}\\PY{p}{,}\n",
|
|
" \\PY{p}{\\PYZob{}}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}@type\\PYZdq{}}\\PY{p}{:} \\PY{l+s+s2}{\\PYZdq{}F1\\PYZus{}micro\\PYZdq{}}\\PY{p}{,}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}value\\PYZdq{}}\\PY{p}{:} \\PY{l+m+mf}{0.6907142857142857}\n",
|
|
" \\PY{p}{\\PYZcb{}}\\PY{p}{,}\n",
|
|
" \\PY{p}{\\PYZob{}}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}@type\\PYZdq{}}\\PY{p}{:} \\PY{l+s+s2}{\\PYZdq{}F1\\PYZus{}macro\\PYZdq{}}\\PY{p}{,}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}value\\PYZdq{}}\\PY{p}{:} \\PY{l+m+mf}{0.40853400929446554}\n",
|
|
" \\PY{p}{\\PYZcb{}}\n",
|
|
" \\PY{p}{]}\n",
|
|
" \\PY{p}{\\PYZcb{}}\\PY{p}{,}\n",
|
|
" \\PY{p}{\\PYZob{}}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}@type\\PYZdq{}}\\PY{p}{:} \\PY{l+s+s2}{\\PYZdq{}Evaluation\\PYZdq{}}\\PY{p}{,}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}evaluates\\PYZdq{}}\\PY{p}{:} \\PY{l+s+s2}{\\PYZdq{}endpoint:plugins/sentiment\\PYZhy{}vader\\PYZus{}0.1.1\\PYZus{}\\PYZus{}sts\\PYZdq{}}\\PY{p}{,}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}evaluatesOn\\PYZdq{}}\\PY{p}{:} \\PY{l+s+s2}{\\PYZdq{}sts\\PYZdq{}}\\PY{p}{,}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}metrics\\PYZdq{}}\\PY{p}{:} \\PY{p}{[}\n",
|
|
" \\PY{p}{\\PYZob{}}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}@type\\PYZdq{}}\\PY{p}{:} \\PY{l+s+s2}{\\PYZdq{}Accuracy\\PYZdq{}}\\PY{p}{,}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}value\\PYZdq{}}\\PY{p}{:} \\PY{l+m+mf}{0.3107177974434612}\n",
|
|
" \\PY{p}{\\PYZcb{}}\\PY{p}{,}\n",
|
|
" \\PY{p}{\\PYZob{}}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}@type\\PYZdq{}}\\PY{p}{:} \\PY{l+s+s2}{\\PYZdq{}Precision\\PYZus{}macro\\PYZdq{}}\\PY{p}{,}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}value\\PYZdq{}}\\PY{p}{:} \\PY{l+m+mf}{0.1553588987217306}\n",
|
|
" \\PY{p}{\\PYZcb{}}\\PY{p}{,}\n",
|
|
" \\PY{p}{\\PYZob{}}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}@type\\PYZdq{}}\\PY{p}{:} \\PY{l+s+s2}{\\PYZdq{}Recall\\PYZus{}macro\\PYZdq{}}\\PY{p}{,}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}value\\PYZdq{}}\\PY{p}{:} \\PY{l+m+mf}{0.5}\n",
|
|
" \\PY{p}{\\PYZcb{}}\\PY{p}{,}\n",
|
|
" \\PY{p}{\\PYZob{}}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}@type\\PYZdq{}}\\PY{p}{:} \\PY{l+s+s2}{\\PYZdq{}F1\\PYZus{}macro\\PYZdq{}}\\PY{p}{,}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}value\\PYZdq{}}\\PY{p}{:} \\PY{l+m+mf}{0.23705926481620407}\n",
|
|
" \\PY{p}{\\PYZcb{}}\\PY{p}{,}\n",
|
|
" \\PY{p}{\\PYZob{}}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}@type\\PYZdq{}}\\PY{p}{:} \\PY{l+s+s2}{\\PYZdq{}F1\\PYZus{}weighted\\PYZdq{}}\\PY{p}{,}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}value\\PYZdq{}}\\PY{p}{:} \\PY{l+m+mf}{0.14731706525451424}\n",
|
|
" \\PY{p}{\\PYZcb{}}\\PY{p}{,}\n",
|
|
" \\PY{p}{\\PYZob{}}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}@type\\PYZdq{}}\\PY{p}{:} \\PY{l+s+s2}{\\PYZdq{}F1\\PYZus{}micro\\PYZdq{}}\\PY{p}{,}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}value\\PYZdq{}}\\PY{p}{:} \\PY{l+m+mf}{0.3107177974434612}\n",
|
|
" \\PY{p}{\\PYZcb{}}\\PY{p}{,}\n",
|
|
" \\PY{p}{\\PYZob{}}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}@type\\PYZdq{}}\\PY{p}{:} \\PY{l+s+s2}{\\PYZdq{}F1\\PYZus{}macro\\PYZdq{}}\\PY{p}{,}\n",
|
|
" \\PY{n+nt}{\\PYZdq{}value\\PYZdq{}}\\PY{p}{:} \\PY{l+m+mf}{0.23705926481620407}\n",
|
|
" \\PY{p}{\\PYZcb{}}\n",
|
|
" \\PY{p}{]}\n",
|
|
" \\PY{p}{\\PYZcb{}}\n",
|
|
" \\PY{p}{]}\n",
|
|
"\\PY{p}{\\PYZcb{}}\n",
|
|
"\\end{Verbatim}\n"
|
|
],
|
|
"text/plain": [
|
|
"{\n",
|
|
" \"@context\": \"http://senpy.gsi.upm.es/api/contexts/YXBpL2V2YWx1YXRlLz9hbGdvPXNlbnRpbWVudC12YWRlciZkYXRhc2V0PXZhZGVyJTJDc3RzJm91dGZvcm1hdD1qc29uLWxkIw%3D%3D\",\n",
|
|
" \"@type\": \"AggregatedEvaluation\",\n",
|
|
" \"senpy:evaluations\": [\n",
|
|
" {\n",
|
|
" \"@type\": \"Evaluation\",\n",
|
|
" \"evaluates\": \"endpoint:plugins/sentiment-vader_0.1.1__vader\",\n",
|
|
" \"evaluatesOn\": \"vader\",\n",
|
|
" \"metrics\": [\n",
|
|
" {\n",
|
|
" \"@type\": \"Accuracy\",\n",
|
|
" \"value\": 0.6907142857142857\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"@type\": \"Precision_macro\",\n",
|
|
" \"value\": 0.34535714285714286\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"@type\": \"Recall_macro\",\n",
|
|
" \"value\": 0.5\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"@type\": \"F1_macro\",\n",
|
|
" \"value\": 0.40853400929446554\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"@type\": \"F1_weighted\",\n",
|
|
" \"value\": 0.5643605528396403\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"@type\": \"F1_micro\",\n",
|
|
" \"value\": 0.6907142857142857\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"@type\": \"F1_macro\",\n",
|
|
" \"value\": 0.40853400929446554\n",
|
|
" }\n",
|
|
" ]\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"@type\": \"Evaluation\",\n",
|
|
" \"evaluates\": \"endpoint:plugins/sentiment-vader_0.1.1__sts\",\n",
|
|
" \"evaluatesOn\": \"sts\",\n",
|
|
" \"metrics\": [\n",
|
|
" {\n",
|
|
" \"@type\": \"Accuracy\",\n",
|
|
" \"value\": 0.3107177974434612\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"@type\": \"Precision_macro\",\n",
|
|
" \"value\": 0.1553588987217306\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"@type\": \"Recall_macro\",\n",
|
|
" \"value\": 0.5\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"@type\": \"F1_macro\",\n",
|
|
" \"value\": 0.23705926481620407\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"@type\": \"F1_weighted\",\n",
|
|
" \"value\": 0.14731706525451424\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"@type\": \"F1_micro\",\n",
|
|
" \"value\": 0.3107177974434612\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"@type\": \"F1_macro\",\n",
|
|
" \"value\": 0.23705926481620407\n",
|
|
" }\n",
|
|
" ]\n",
|
|
" }\n",
|
|
" ]\n",
|
|
"}"
|
|
]
|
|
},
|
|
"execution_count": 2,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
"import requests\n",
"from IPython.display import Code\n",
"\n",
"# Base URL of the senpy API\n",
"endpoint = 'http://senpy.gsi.upm.es/api'\n",
"# Evaluate the sentiment-vader plugin on the vader and sts datasets\n",
"res = requests.get(f'{endpoint}/evaluate',\n",
"                   params={'algo': 'sentiment-vader',\n",
"                           'dataset': 'vader,sts',\n",
"                           'outformat': 'json-ld'})\n",
"# Render the JSON-LD response with syntax highlighting\n",
"Code(res.text, language='json')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Programmatically (expert)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"A third option is to evaluate plugins manually, without launching the server.\n",
"\n",
"This option is particularly useful for advanced users who want faster iterations and evaluation results, and for automation.\n",
"\n",
"First, we need an instance of a plugin.\n",
"In this example, we will use the Sentiment140 plugin, which is included in every senpy installation:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 22,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from senpy.plugins.sentiment import sentiment140_plugin"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 23,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"s140 = sentiment140_plugin.Sentiment140()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"Then, we need to know which datasets are available.\n",
"We can list all datasets and their basic stats (e.g., number of instances and labels used) like this:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 32,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"vader {'instances': 4200, 'labels': [1, -1]}\n",
|
|
"sts {'instances': 4200, 'labels': [1, -1]}\n",
|
|
"imdb_unsup {'instances': 50000, 'labels': [1, -1]}\n",
|
|
"imdb {'instances': 50000, 'labels': [1, -1]}\n",
|
|
"sst {'instances': 11855, 'labels': [1, -1]}\n",
|
|
"multidomain {'instances': 38548, 'labels': [1, -1]}\n",
|
|
"sentiment140 {'instances': 1600000, 'labels': [1, -1]}\n",
|
|
"semeval07 {'instances': 'None', 'labels': [1, -1]}\n",
|
|
"semeval14 {'instances': 7838, 'labels': [1, -1]}\n",
|
|
"pl04 {'instances': 4000, 'labels': [1, -1]}\n",
|
|
"pl05 {'instances': 10662, 'labels': [1, -1]}\n",
|
|
"semeval13 {'instances': 6259, 'labels': [1, -1]}\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"from senpy.gsitk_compat import datasets\n",
|
|
"for k, d in datasets.items():\n",
|
|
" print(k, d['stats'])"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"Now, we will evaluate our plugin on one of the smallest datasets, `sts`:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 37,
|
|
"metadata": {
|
|
"scrolled": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"[{\n",
|
|
" \"@type\": \"Evaluation\",\n",
|
|
" \"evaluates\": \"endpoint:plugins/sentiment140_0.2\",\n",
|
|
" \"evaluatesOn\": \"sts\",\n",
|
|
" \"metrics\": [\n",
|
|
" {\n",
|
|
" \"@type\": \"Accuracy\",\n",
|
|
" \"value\": 0.872173058013766\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"@type\": \"Precision_macro\",\n",
|
|
" \"value\": 0.9035254323131467\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"@type\": \"Recall_macro\",\n",
|
|
" \"value\": 0.8021249029415483\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"@type\": \"F1_macro\",\n",
|
|
" \"value\": 0.8320673712021136\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"@type\": \"F1_weighted\",\n",
|
|
" \"value\": 0.8631351567604358\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"@type\": \"F1_micro\",\n",
|
|
" \"value\": 0.872173058013766\n",
|
|
" },\n",
|
|
" {\n",
|
|
" \"@type\": \"F1_macro\",\n",
|
|
" \"value\": 0.8320673712021136\n",
|
|
" }\n",
|
|
" ]\n",
|
|
" }]"
|
|
]
|
|
},
|
|
"execution_count": 37,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
"s140.evaluate(['sts'])"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": []
|
|
}
|
|
],
|
|
"metadata": {
|
|
"anaconda-cloud": {},
|
|
"kernelspec": {
|
|
"display_name": "Python 3",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.7.3"
|
|
},
|
|
"toc": {
|
|
"colors": {
|
|
"hover_highlight": "#DAA520",
|
|
"running_highlight": "#FF0000",
|
|
"selected_highlight": "#FFD700"
|
|
},
|
|
"moveMenuLeft": true,
|
|
"nav_menu": {
|
|
"height": "68px",
|
|
"width": "252px"
|
|
},
|
|
"navigate_menu": true,
|
|
"number_sections": true,
|
|
"sideBar": true,
|
|
"threshold": 4,
|
|
"toc_cell": false,
|
|
"toc_section_display": "block",
|
|
"toc_window_display": false
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 1
|
|
}
|