mirror of
https://github.com/gsi-upm/senpy
synced 2025-08-23 18:12:20 +00:00
Macro commit

* Fixed Options for extra_params in UI
* Enhanced meta-programming for models
* Plugins can be imported from a python file if they're named `senpy_<whatever>.py` (no need for `.senpy` anymore!)
* Add docstrings and tests to most plugins
* Read plugin description from the docstring
* Refactor code to get rid of unnecessary `.senpy`s
* Load models, plugins and utils into the main namespace (see __init__.py)
* Enhanced plugin development/experience with utils (easy_test, easy_serve)
* Fix bug in check_template that wouldn't check objects
* Make model defaults a private variable
* Add option to list loaded plugins in CLI
* Update docs

docs/plugins-definition.rst

@@ -0,0 +1,113 @@

Advanced plugin definition
--------------------------

In addition to finding plugins defined in source code files, senpy can also load a special type of definition file (`.senpy` files).
This used to be the only mechanism for loading plugins in earlier versions of senpy.

The definition file contains basic information about the plugin, such as its name, the module that contains its code, and its version.

Lastly, it is also possible to add new plugins programmatically.

.. contents:: :local:

What is a plugin?
=================

A plugin is a program that, given a text, will add annotations to it.
In practice, a plugin consists of at least two files:

- Definition file: a `.senpy` file that describes the plugin (e.g. what input parameters it accepts, what emotion model it uses).
- Python module: the actual code that will add annotations to each input.

This separation allows us to deploy plugins that use the same code but employ different parameters.
For instance, one could use the same classifier and processing in several plugins, but train with different datasets.
This scenario is particularly useful for evaluation purposes.

The only limitation is that the name of each plugin needs to be unique.
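
The separation between definition and code can be sketched in plain Python. The example below is hypothetical and is not senpy's loader: two definitions point at the same ``helloworld`` module with different thresholds, and a ``register`` helper rejects any definition whose name is already taken.

.. code:: python

    # Sketch of reusing one module in several plugins (not senpy's loader).
    definitions = [
        {"name": "helloworld-short", "module": "helloworld", "threshold": 10},
        {"name": "helloworld-long", "module": "helloworld", "threshold": 50},
    ]


    def register(registry, definition):
        """Add a definition to the registry, refusing duplicate plugin names."""
        name = definition["name"]
        if name in registry:
            raise ValueError("Duplicate plugin name: {}".format(name))
        registry[name] = definition
        return registry


    registry = {}
    for definition in definitions:
        register(registry, definition)

    # Both plugins share the same module but keep their own threshold.
    print(sorted(registry))  # ['helloworld-long', 'helloworld-short']

Registering a third definition named ``helloworld-short`` would raise a ``ValueError``, which mirrors the uniqueness requirement above.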

Definition files
================

The definition file complements and overrides the attributes provided by the plugin.
It can be written in YAML or JSON.
The most important attributes are:

* **name**: unique name that senpy will use internally to identify the plugin.
* **module**: indicates the module that contains the plugin code, which will be automatically loaded by senpy.
* **version**: the version of the plugin.
* **extra_params**: adds parameters to the senpy API when this plugin is requested. These parameters may be required and may have aliased names. For instance:

.. code:: yaml

    extra_params:
      hello_param:
        aliases: # required
          - hello_param
          - hello
        required: true
        default: Hi you
        options:
          - Hi you
          - Hello y'all
          - Howdy

A complete example:

.. code:: yaml

    name: <Name of the plugin>
    module: <Python file>
    version: '0.1'

And the JSON equivalent:

.. code:: json

    {
      "name": "<Name of the plugin>",
      "module": "<Python file>",
      "version": "0.1"
    }
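
To make the idea of complementing and overriding attributes concrete, here is a minimal sketch. It assumes a hypothetical ``HelloWorld`` class whose attributes act as defaults, and a hypothetical ``apply_definition`` helper that copies every field of a parsed JSON definition onto a plugin instance; senpy's actual loading logic may differ.

.. code:: python

    import json


    class HelloWorld:
        # Attributes declared in the plugin code (hypothetical defaults).
        name = "helloworld"
        version = "0.0"
        threshold = 5


    definition_text = """
    {
      "name": "helloworld",
      "version": "0.1",
      "threshold": 10,
      "description": "Hello World"
    }
    """


    def apply_definition(plugin, definition):
        """Set every field of the definition on the plugin instance.

        Fields already defined by the class are overridden on the instance,
        and new ones (such as "description") are added.
        """
        for key, value in definition.items():
            setattr(plugin, key, value)
        return plugin


    plugin = apply_definition(HelloWorld(), json.loads(definition_text))
    print(plugin.version, plugin.threshold, plugin.description)  # 0.1 10 Hello World

Because the values are set on the instance, the class defaults (e.g. ``HelloWorld.threshold``) stay untouched, so the same code can be loaded with several definition files.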
Example plugin with a definition file
=====================================

In this section, we will implement a basic sentiment analysis plugin.
To determine the polarity of each entry, the plugin will compare the length of the string to a threshold.
This threshold will be included in the definition file.

The definition file would look like this:

.. code:: yaml

    name: helloworld
    module: helloworld
    version: 0.0
    threshold: 10
    description: Hello World

Now, in a file named ``helloworld.py``:

.. code:: python

    #!/usr/bin/env python
    # helloworld.py

    from senpy import AnalysisPlugin
    from senpy import Sentiment


    class HelloWorld(AnalysisPlugin):

        def analyse_entry(self, entry, params):
            '''Label short entries as positive and long ones as negative.'''

            sentiment = Sentiment()
            # self.threshold is read from the definition file
            if len(entry.text) < self.threshold:
                sentiment['marl:hasPolarity'] = 'marl:Positive'
            else:
                sentiment['marl:hasPolarity'] = 'marl:Negative'
            entry.sentiments.append(sentiment)
            yield entry

The complete code of the example plugin is available `here <https://lab.cluster.gsi.dit.upm.es/senpy/plugin-prueba>`__.
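
If you want to check the threshold logic without a running senpy server, the sketch below reproduces it with minimal stand-in classes. ``FakeEntry`` and the plain dict used as a sentiment are hypothetical placeholders for senpy's ``Entry`` and ``Sentiment`` models; only the comparison against ``threshold`` mirrors the plugin above.

.. code:: python

    class FakeEntry:
        """Hypothetical stand-in for senpy's Entry: just text and sentiments."""

        def __init__(self, text):
            self.text = text
            self.sentiments = []


    class HelloWorldSketch:
        threshold = 10  # value that would come from the definition file

        def analyse_entry(self, entry, params):
            # Short texts are labelled positive, longer ones negative.
            if len(entry.text) < self.threshold:
                polarity = 'marl:Positive'
            else:
                polarity = 'marl:Negative'
            entry.sentiments.append({'marl:hasPolarity': polarity})
            yield entry


    plugin = HelloWorldSketch()
    short, = plugin.analyse_entry(FakeEntry("Hi!"), {})
    long_, = plugin.analyse_entry(FakeEntry("This sentence is rather long"), {})
    print(short.sentiments[0]['marl:hasPolarity'])  # marl:Positive
    print(long_.sentiments[0]['marl:hasPolarity'])  # marl:Negative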

docs/plugins.rst

@@ -1,6 +1,8 @@

Developing new plugins
----------------------

This document contains the minimum to get you started with developing new analysis plugins.
For an example of conversion plugins, see :doc:`conversion`.
For a description of definition files, see :doc:`plugins-definition`.

A more step-by-step tutorial with slides is available `here <https://lab.cluster.gsi.dit.upm.es/senpy/senpy-tutorial>`__.

@@ -9,83 +11,29 @@ A more step-by-step tutorial with slides is available `here <https://lab.cluster
What is a plugin?
=================

A plugin is a python object that can process entries. Given an entry, it will modify it, add annotations to it, or generate new entries.

What is an entry?
=================

Entries are objects that can be annotated.
In general, they will be a piece of text.
By default, entries are `NIF contexts <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/nif-core.html>`_ represented in JSON-LD format.
It is a dictionary/JSON object that looks like this:

.. code:: python

    {
      "@id": "<unique identifier or blank node name>",
      "nif:isString": "input text",
      "sentiments": [ {
          ...
        }
      ],
      ...
    }

Annotations are added to the object like this:

.. code:: python

@@ -100,96 +48,111 @@ The value may be any valid JSON-LD dictionary.

For simplicity, senpy includes a series of models by default in the ``senpy.models`` module.
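
The Sentiment140 example in the F.A.Q. builds annotations with keyword arguments such as ``marl__hasPolarity``, which suggests that a double underscore stands in for the ``:`` of a prefixed name. The sketch below illustrates that idea with a hypothetical ``Annotation`` dict subclass; it is not the implementation in ``senpy.models``.

.. code:: python

    class Annotation(dict):
        """Hypothetical dict whose attributes map `prefix__name` to `prefix:name`."""

        @staticmethod
        def _key(name):
            # Replace only the first double underscore:
            # marl__hasPolarity -> marl:hasPolarity
            return name.replace('__', ':', 1)

        def __init__(self, **kwargs):
            super().__init__()
            for name, value in kwargs.items():
                self[self._key(name)] = value

        def __getattr__(self, name):
            try:
                return self[self._key(name)]
            except KeyError:
                raise AttributeError(name)

        def __setattr__(self, name, value):
            self[self._key(name)] = value


    sentiment = Annotation(marl__hasPolarity='marl:Positive')
    sentiment.marl__polarityValue = 0.8
    print(sentiment)  # {'marl:hasPolarity': 'marl:Positive', 'marl:polarityValue': 0.8}

Since the object is still a plain dictionary, it serializes directly to the JSON-LD shape shown for entries.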
What are annotations?
=====================

They are objects just like entries.
Senpy ships with several default annotations, including ``Sentiment``, ``Emotion`` and ``EmotionSet``.

What's a plugin made of?
========================

When receiving a query, senpy selects what plugin or plugins should process each entry, and in what order.
It also makes sure that every entry and the parameters provided by the user meet the plugin requirements.

Hence, two parts are necessary: 1) the code that will process the entry, and 2) some attributes and metadata that will tell senpy how to interact with the plugin.

In practice, this is what a plugin looks like, tests included:

.. literalinclude:: ../senpy/plugins/example/rand_plugin.py
    :emphasize-lines: 5-11
    :language: python

The highlighted lines contain some information about the plugin.
In particular, the following information is mandatory:

* A unique name for the class. In our example, Rand.
* The subclass/type of plugin. This is typically either `SentimentPlugin` or `EmotionPlugin`. However, new types of plugin can be created for different annotations. The only requirement is that these new types inherit from `senpy.Analysis`.
* A description of the plugin. This can be done simply by adding a docstring to the class.
* A version, which should be updated whenever the plugin changes.
* An author name.
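
The mandatory information above can be gathered from a plugin class with standard Python introspection. The sketch below is a hypothetical helper, not senpy's own loader: ``Rand`` here is a stand-in that only carries the metadata (it does not subclass ``SentimentPlugin``, so the type check is left out), and the description is read from the class docstring with ``inspect.getdoc``.

.. code:: python

    import inspect


    class Rand:
        '''Assign a random polarity to each entry (placeholder description).'''
        version = "0.1"
        author = "<author name>"


    REQUIRED = ("version", "author")


    def describe(plugin_cls):
        """Collect the mandatory metadata of a plugin class (sketch)."""
        missing = [attr for attr in REQUIRED if not hasattr(plugin_cls, attr)]
        # The docstring doubles as the plugin description.
        description = inspect.getdoc(plugin_cls)
        if not description:
            missing.append("description")
        if missing:
            raise ValueError("Missing plugin attributes: " + ", ".join(missing))
        return {
            "name": plugin_cls.__name__,
            "description": description,
            "version": plugin_cls.version,
            "author": plugin_cls.author,
        }


    print(describe(Rand)["name"])  # Rand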
Plugins Code
============

The basic methods in a plugin are:

* analyse_entry: called on every user request. It takes two parameters: ``entry``, the entry object, and ``params``, the parameters supplied by the user. It should yield one or more ``Entry`` objects.
* activate: used to load memory-hungry resources. For instance, to train a classifier.
* deactivate: used to free up resources when the plugin is no longer needed.

Plugins are loaded asynchronously, so don't worry if the activate method takes too long. The plugin will be marked as activated once it has finished executing the method.
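
The lifecycle described above can be sketched in plain Python. The ``LexiconPlugin`` class below is hypothetical and does not use senpy: ``activate`` loads a toy resource, ``analyse_entry`` is a generator that yields entries, and a background thread stands in for the asynchronous activation. The ``is_activated`` flag is illustrative, not a senpy attribute.

.. code:: python

    import threading


    class LexiconPlugin:
        """Hypothetical plugin with the three basic methods."""

        def __init__(self):
            self.is_activated = False
            self._lexicon = None

        def activate(self):
            # Load a memory-hungry resource (here, a tiny in-memory lexicon).
            self._lexicon = {"good": "marl:Positive", "bad": "marl:Negative"}
            self.is_activated = True

        def deactivate(self):
            # Free the resource when the plugin is no longer needed.
            self._lexicon = None
            self.is_activated = False

        def analyse_entry(self, entry, params):
            # Annotate the entry with the polarity of the first known word.
            for word in entry["nif:isString"].lower().split():
                if word in self._lexicon:
                    entry.setdefault("sentiments", []).append(
                        {"marl:hasPolarity": self._lexicon[word]})
                    break
            yield entry


    plugin = LexiconPlugin()
    loader = threading.Thread(target=plugin.activate)
    loader.start()  # activation runs in the background...
    loader.join()   # ...and the plugin is ready once it has finished
    entries = list(plugin.analyse_entry({"nif:isString": "A good day"}, {}))
    print(entries[0]["sentiments"])  # [{'marl:hasPolarity': 'marl:Positive'}]
    plugin.deactivate()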
How does senpy find modules?
============================

Senpy looks for files of two types:

* Python files of the form `senpy_<NAME>.py` or `<NAME>_plugin.py`. In these files, it will look for 1) instances of `senpy.Plugin` or its subclasses, and 2) subclasses of `senpy.Plugin` that can be initialized without a configuration file, i.e. classes that contain all the attributes required for a plugin.
* Plugin definition files (see :doc:`plugins-definition`).
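
The naming rule above can be expressed with ``fnmatch`` patterns. The sketch below is a hypothetical helper, not senpy's own discovery code: it walks a folder recursively (as the server does with its plugins folder) and keeps Python files named ``senpy_<NAME>.py`` or ``<NAME>_plugin.py``, plus ``.senpy`` definition files.

.. code:: python

    import fnmatch
    import os

    PATTERNS = ("senpy_*.py", "*_plugin.py", "*.senpy")


    def is_plugin_file(filename):
        """Return True if the file name follows one of the plugin naming rules."""
        return any(fnmatch.fnmatch(filename, pattern) for pattern in PATTERNS)


    def find_plugin_files(folder):
        """Recursively list candidate plugin files under `folder`."""
        found = []
        for root, _dirs, files in os.walk(folder):
            for filename in files:
                if is_plugin_file(filename):
                    found.append(os.path.join(root, filename))
        return sorted(found)


    names = ["senpy_rand.py", "rand_plugin.py", "utils.py", "rand.senpy"]
    print([name for name in names if is_plugin_file(name)])
    # ['senpy_rand.py', 'rand_plugin.py', 'rand.senpy']

Only the file names are checked here; senpy still has to import each module and look for plugin classes or instances inside it.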
Defining additional parameters
==============================

Your plugin may ask for additional parameters from the users of the service by using the attribute ``extra_params`` in your plugin definition.
It takes a dictionary, where the keys are the names of the arguments/parameters, and each value has the following fields:

* aliases: the different names which can be used in the request to use the parameter.
* required: if set to true, users need to provide this parameter unless a default is set.
* options: the different acceptable values of the parameter (i.e. an enum). If set, the value provided must match one of the options.
* default: the default value of the parameter, if none is provided in the request.

.. code:: python

    extra_params = {
        "language": {
            "aliases": ["language", "lang", "l"],
            "required": True,
            "options": ["es", "en"],
            "default": "es"
        }
    }
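
The rules listed above (aliases, required, options, default) can be sketched as a small validation function. ``parse_params`` is a hypothetical helper written for illustration; senpy performs its own validation, and details such as error messages or alias precedence may differ.

.. code:: python

    EXTRA_PARAMS = {
        "language": {
            "aliases": ["language", "lang", "l"],
            "required": True,
            "options": ["es", "en"],
            "default": "es",
        }
    }


    def parse_params(request, spec):
        """Resolve aliases, apply defaults and check options (sketch)."""
        params, errors = {}, {}
        for name, field in spec.items():
            aliases = field.get("aliases", [name])
            # Take the value of the first alias present in the request.
            value = next((request[a] for a in aliases if a in request), None)
            if value is None and "default" in field:
                value = field["default"]
            if value is None:
                if field.get("required", False):
                    errors[name] = "missing required parameter"
                continue
            if "options" in field and value not in field["options"]:
                errors[name] = "value {!r} not in {}".format(value, field["options"])
                continue
            params[name] = value
        if errors:
            raise ValueError(errors)
        return params


    print(parse_params({"lang": "en"}, EXTRA_PARAMS))  # {'language': 'en'}
    print(parse_params({}, EXTRA_PARAMS))              # {'language': 'es'}

Inside ``analyse_entry``, the resolved value is then available under its canonical name, e.g. ``params.get("language")``.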
Loading data and files
======================

Most plugins will need access to files (dictionaries, lexicons, etc.).
These files are usually heavy or under a license that does not allow redistribution.
For this reason, senpy has a `data_folder` that is separated from the source files.
The location of this folder is controlled programmatically or by setting the `SENPY_DATA` environment variable.

Plugins have a convenience method, `self.open`, which will automatically prepend the data folder to relative paths:

.. code:: python

    import os

    from senpy import AnalysisPlugin


    class PluginWithResources(AnalysisPlugin):
        file_in_data = '<FILE PATH>'     # relative to the data folder
        file_in_sources = '<FILE PATH>'  # relative to the plugin source folder

        def activate(self):
            # Relative path: self.open prepends the data folder
            with self.open(self.file_in_data) as f:
                # train_from_file is a placeholder for your own loading code
                self._classifier = train_from_file(f)
            # Files shipped with the code are resolved against the plugin folder
            file_in_source = os.path.join(self.get_folder(), self.file_in_sources)
            with self.open(file_in_source) as f:
                pass  # use the file here

It is good practice to specify the paths of these files in the plugin configuration, so the same code can be reused with different resources.
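
The path handling described above, prepending the data folder to relative paths, can be approximated with the standard library. ``resolve_data_path`` and ``open_data`` are hypothetical stand-ins, not senpy's implementation of ``self.open``; they assume the data folder comes from an explicit argument or, failing that, the ``SENPY_DATA`` environment variable, with the current directory as a last-resort fallback.

.. code:: python

    import os


    def resolve_data_path(path, data_folder=None):
        """Prepend the data folder to relative paths; keep absolute paths as-is."""
        if os.path.isabs(path):
            return path
        # Assumed precedence: explicit folder, then SENPY_DATA, then the cwd.
        folder = data_folder or os.environ.get("SENPY_DATA", os.getcwd())
        return os.path.join(folder, path)


    def open_data(path, mode="r", data_folder=None):
        """Open a file relative to the data folder (sketch of `self.open`)."""
        return open(resolve_data_path(path, data_folder), mode)


    # On a POSIX system:
    print(resolve_data_path("lexicon.csv", data_folder="/srv/senpy-data"))
    # /srv/senpy-data/lexicon.csv
    print(resolve_data_path("/etc/hosts", data_folder="/srv/senpy-data"))
    # /etc/hosts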
Docker image
@@ -199,8 +162,17 @@ Add the following dockerfile to your project to generate a docker image with you

.. code:: dockerfile

    FROM gsiupm/senpy

Once you make sure your plugin works with a specific version of senpy, modify that file to pin that version, so that your build keeps working even if senpy gets updated.
For example:

.. code:: dockerfile

    FROM gsiupm/senpy:1.0.1

This will copy your source folder to the image, and install all dependencies.
Now, to build an image:

@@ -215,7 +187,7 @@ And you can run it with:

    docker run -p 5000:5000 gsiupm/exampleplugin

If the plugin uses non-source files (:ref:`loading data and files`), the recommended way is to use the `SENPY_DATA` folder.
Data can then be mounted in the container or added to the image.
The former is recommended for open source plugins with licensed resources, whereas the latter is the most convenient and can be used for private images.

@@ -229,7 +201,7 @@ Adding data to the image:

.. code:: dockerfile

    FROM gsiupm/senpy:1.0.1
    COPY data /
F.A.Q.
@@ -245,7 +217,7 @@ Why does the analyse function yield instead of return?
??????????????????????????????????????????????????????

This is so that plugins may add new entries to the response or filter some of them.
For instance, a chunker may split one entry into several.
On the other hand, a conversion plugin may leave out those entries that do not contain relevant information.
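
To see why a generator is convenient, here is a sketch of the two cases mentioned above, using plain dicts as entries: a hypothetical sentence chunker that yields several entries for one input, and a hypothetical filter that yields nothing for entries without text. Neither is a senpy plugin, the ``<id>_<n>`` identifiers are made up, and the naive split on ``". "`` is only for illustration.

.. code:: python

    def chunker(entry, params):
        """Split one entry into one entry per sentence (naive split on '. ')."""
        for i, sentence in enumerate(entry["nif:isString"].split(". ")):
            yield {"@id": "{}_{}".format(entry["@id"], i), "nif:isString": sentence}


    def drop_empty(entry, params):
        """Leave out entries that contain no text."""
        if entry["nif:isString"].strip():
            yield entry


    def process(entries, plugin, params=None):
        """Chain the output of a plugin over all entries, as a response would."""
        for entry in entries:
            yield from plugin(entry, params or {})


    entries = [{"@id": "e1", "nif:isString": "Good food. Bad service"},
               {"@id": "e2", "nif:isString": "   "}]
    chunks = list(process(entries, chunker))
    kept = list(process(entries, drop_empty))
    print([c["nif:isString"] for c in chunks])  # ['Good food', 'Bad service', '   ']
    print([e["@id"] for e in kept])             # ['e1']

With ``return``, each call could only hand back exactly one entry; a generator lets the same interface produce zero, one or many.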
@@ -275,11 +247,13 @@ Training a classifier can be time time consuming. To avoid running the training

        def deactivate(self):
            self.close()

By default the ShelfMixin creates a file based on the plugin name and stores it in that plugin's folder.
However, you can manually specify a 'shelf_file' in your .senpy file.

Shelves may get corrupted if the plugin exits unexpectedly.
A corrupt shelf prevents the plugin from loading.
If you do not care about the data in the shelf, you can force your plugin to remove the corrupted file and load anyway: set 'force_shelf' to True in your plugin and start it again.
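
The caching idea behind ``ShelfMixin`` can be illustrated with the standard ``shelve`` module. This is not senpy's ``ShelfMixin``: ``train`` is a hypothetical placeholder that returns a toy classifier, the shelf path is passed explicitly, and the ``TRAIN_CALLS`` list only exists to show that training runs once.

.. code:: python

    import os
    import shelve
    import tempfile

    TRAIN_CALLS = []


    def train():
        """Hypothetical, expensive training step; returns a toy 'classifier'."""
        TRAIN_CALLS.append(1)
        return {"good": "marl:Positive", "bad": "marl:Negative"}


    def load_classifier(shelf_file):
        """Train only if the shelf does not already hold a classifier."""
        with shelve.open(shelf_file) as sh:
            if "classifier" not in sh:
                sh["classifier"] = train()
            return sh["classifier"]


    shelf_file = os.path.join(tempfile.mkdtemp(), "myplugin")
    load_classifier(shelf_file)  # trains and stores the classifier
    load_classifier(shelf_file)  # reads it back from the shelf
    print(len(TRAIN_CALLS))      # 1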
How can I turn an external service into a plugin?
?????????????????????????????????????????????????

@@ -313,50 +287,11 @@ This example ilustrate how to implement a plugin that accesses the Sentiment140
                                  prefix=p,
                                  marl__hasPolarity=polarity,
                                  marl__polarityValue=polarity_value)
            sentiment.prov(self)
            entry.sentiments.append(sentiment)
            yield entry
Can I activate a DEBUG mode for my plugin?
???????????????????????????????????????????

@@ -371,7 +306,7 @@ Additionally, with the ``--pdb`` option you will be dropped into a pdb post mort

.. code:: bash

    python -m pdb yourplugin.py

Where can I find more code examples?
????????????????????????????????????
@@ -7,21 +7,29 @@ The senpy server is launched via the `senpy` command:

    usage: senpy [-h] [--level logging_level] [--debug] [--default-plugins]
                 [--host HOST] [--port PORT] [--plugins-folder PLUGINS_FOLDER]
                 [--only-install] [--only-list] [--data-folder DATA_FOLDER]
                 [--threaded] [--version]

    Run a Senpy server

    optional arguments:
      -h, --help            show this help message and exit
      --level logging_level, -l logging_level
                            Logging level
      --debug, -d           Run the application in debug mode
      --default-plugins     Load the default plugins
      --host HOST           Use 0.0.0.0 to accept requests from any host.
      --port PORT, -p PORT  Port to listen on.
      --plugins-folder PLUGINS_FOLDER, -f PLUGINS_FOLDER
                            Where to look for plugins.
      --only-install, -i    Do not run a server, only install plugin dependencies
      --only-list, --list   Do not run a server, only list plugins found
      --data-folder DATA_FOLDER, --data DATA_FOLDER
                            Where to look for data. It be set with the SENPY_DATA
                            environment variable as well.
      --threaded            Run a threaded server
      --version, -v         Output the senpy version and exit

When launched, the server will recursively look for plugins in the specified plugins folder (the current working directory by default).