Developing new plugins
----------------------
This document contains the minimum you need to start developing a new analysis plugin.
For an example of conversion plugins, see :doc:`conversion`.
For a description of definition files, see :doc:`plugins-definition`.

A more step-by-step tutorial with slides is available `here <https://lab.cluster.gsi.dit.upm.es/senpy/senpy-tutorial>`__.

.. contents:: :local:

What is a plugin?
=================

A plugin is a Python object that can process entries. Given an entry, it will modify it, add annotations to it, or generate new entries.


What is an entry?
=================

Entries are objects that can be annotated.
In general, they will be a piece of text.
By default, entries are `NIF contexts <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/nif-core.html>`_ represented in JSON-LD format.
An entry is a dictionary/JSON object that looks like this:

.. code:: python

    {
        "@id": "<unique identifier or blank node name>",
        "nif:isString": "input text",
        "sentiments": [ {
            ...
            }
        ],
        ...
    }

Annotations are added to the object like this:

.. code:: python

    from senpy.models import Entry

    entry = Entry()
    # Prefixed name as an attribute ('__' stands in for ':')
    entry.vocabulary__annotationName = 'myvalue'
    # Prefixed name as a key
    entry['vocabulary:annotationName'] = 'myvalue'
    # Full URI as a key
    entry['annotationNameURI'] = 'myvalue'

Where ``vocabulary`` is one of the prefixes defined in the default senpy context, and ``annotationNameURI`` is a full URI.
The value may be any valid JSON-LD dictionary.
For simplicity, senpy includes a series of models by default in the ``senpy.models`` module.


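To make the entry structure concrete, the following sketch builds an entry as a plain Python dictionary and attaches one sentiment annotation to it. It deliberately avoids the senpy classes: the dictionary is only a stand-in for an ``Entry``, and the identifier ``_:entry1`` and the polarity value are made up for illustration.

```python
import json

# Plain-dict stand-in for an entry (NOT the senpy.models.Entry class).
entry = {
    "@id": "_:entry1",              # hypothetical blank node name
    "nif:isString": "input text",
    "sentiments": [],
}

# Attach a sentiment annotation that uses the marl vocabulary prefix.
entry["sentiments"].append({
    "marl:hasPolarity": "marl:Positive",
    "marl:polarityValue": 0.9,      # made-up value
})

# Entries are JSON objects, so they serialize directly.
serialized = json.dumps(entry, indent=2, sort_keys=True)
print(serialized)
```

Because the annotation is itself a dictionary, it can carry any keys that are valid in the JSON-LD context.
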
What are annotations?
=====================

They are objects just like entries.
Senpy ships with several default annotations, such as ``Sentiment``, ``Emotion`` and ``EmotionSet``.


What's a plugin made of?
========================

When receiving a query, senpy selects what plugin or plugins should process each entry, and in what order.
It also makes sure that every entry and the parameters provided by the user meet the plugin requirements.

Hence, two parts are necessary: 1) the code that will process the entry, and 2) some attributes and metadata that will tell senpy how to interact with the plugin.

In practice, this is what a plugin looks like, tests included:


.. literalinclude:: ../senpy/plugins/example/rand_plugin.py
   :emphasize-lines: 5-11
   :language: python


The lines highlighted contain some information about the plugin.
In particular, the following information is mandatory:

* A unique name for the class. In our example, Rand.
* The subclass/type of plugin. This is typically either `SentimentPlugin` or `EmotionPlugin`. However, new types of plugin can be created for different annotations. The only requirement is that these new types inherit from `senpy.Analysis`.
* A description of the plugin. This can be done simply by adding a docstring to the class.
* A version, which should be updated as the plugin changes.
* An author name.


Plugins Code
============

The basic methods in a plugin are:

* analyse_entry: called on every user request. It takes two parameters: ``entry``, the entry object, and ``params``, the parameters supplied by the user. It should yield one or more ``Entry`` objects.
* activate: used to load memory-hungry resources. For instance, to train a classifier.
* deactivate: used to free up resources when the plugin is no longer needed.

Plugins are loaded asynchronously, so don't worry if the activate method takes a long time. The plugin will be marked as activated once the method has finished executing.


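The three methods above can be sketched without senpy installed. The class below is a stand-in that does not inherit from any senpy class, and the tiny lexicon and the ``score`` key are invented for the example; it only shows the order in which the methods are meant to be used.

```python
class SketchPlugin:
    """Stand-in for a plugin; it does not inherit from any senpy class."""

    def activate(self):
        # Load resources once, before any request is served.
        self._lexicon = {"good": 1, "bad": -1}   # made-up lexicon

    def deactivate(self):
        # Release the resources when the plugin is no longer needed.
        self._lexicon = None

    def analyse_entry(self, entry, params):
        # Annotate the entry and yield it back to the caller.
        words = entry["nif:isString"].lower().split()
        entry["score"] = sum(self._lexicon.get(word, 0) for word in words)
        yield entry


plugin = SketchPlugin()
plugin.activate()
results = list(plugin.analyse_entry({"nif:isString": "A good day"}, {}))
plugin.deactivate()
print(results)
```
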
How does senpy find modules?
============================

Senpy looks for files of two types:

* Python files of the form `senpy_<NAME>.py` or `<NAME>_plugin.py`. In these files, it will look for instances of `senpy.Plugin` (or of its subclasses), and for subclasses of `senpy.Plugin` that can be initialized without a configuration file, i.e. classes that contain all the required attributes for a plugin.
* Plugin definition files (see :doc:`advanced-plugins`)

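As a rough illustration of the two Python file name patterns, the sketch below filters a list of file names with the standard ``fnmatch`` module. This is not senpy's discovery code: the helper ``looks_like_plugin_file`` and the sample file names are hypothetical, and plugin definition files are not covered.

```python
from fnmatch import fnmatch

# The two Python file name patterns described above.
PATTERNS = ("senpy_*.py", "*_plugin.py")


def looks_like_plugin_file(filename):
    """Return True if the file name matches one of the plugin patterns."""
    return any(fnmatch(filename, pattern) for pattern in PATTERNS)


candidates = ["senpy_rand.py", "rand_plugin.py", "utils.py", "notes.txt"]
matches = [name for name in candidates if looks_like_plugin_file(name)]
print(matches)
```
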
Defining additional parameters
==============================

Your plugin may ask for additional parameters from the users of the service by using the attribute ``extra_params`` in your plugin definition.
It takes a dictionary, where the keys are the names of the arguments/parameters, and each value is a dictionary with the following fields:

* aliases: the different names which can be used in the request to use the parameter.
* required: if set to true, users need to provide this parameter unless a default is set.
* options: the different acceptable values of the parameter (i.e. an enum). If set, the value provided must match one of the options.
* default: the default value of the parameter, if none is provided in the request.

.. code:: python

    "extra_params": {
        "language": {
            "aliases": ["language", "lang", "l"],
            "required": True,
            "options": ["es", "en"],
            "default": "es"
        }
    }



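The following sketch shows one way the four fields could interact when a request arrives. It is an illustration of the rules listed above, not senpy's own validation code: the function ``resolve_params`` and its error messages are hypothetical.

```python
EXTRA_PARAMS = {
    "language": {
        "aliases": ["language", "lang", "l"],
        "required": True,
        "options": ["es", "en"],
        "default": "es",
    }
}


def resolve_params(request, spec):
    """Hypothetical helper: map request keys to parameter names."""
    resolved = {}
    for name, fields in spec.items():
        value = None
        # Accept the first alias that appears in the request.
        for alias in fields.get("aliases", [name]):
            if alias in request:
                value = request[alias]
                break
        # Fall back to the default when nothing was provided.
        if value is None:
            value = fields.get("default")
        if value is None and fields.get("required"):
            raise ValueError("Missing required parameter: " + name)
        # Enforce the enum when options are given.
        if value is not None and "options" in fields and value not in fields["options"]:
            raise ValueError("Invalid value for " + name + ": " + str(value))
        resolved[name] = value
    return resolved


print(resolve_params({"lang": "en"}, EXTRA_PARAMS))
print(resolve_params({}, EXTRA_PARAMS))
```

With this specification, a request that uses any of the three aliases resolves to the ``language`` parameter, and a request without it falls back to ``es``.
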
Loading data and files
======================

Most plugins will need access to files (dictionaries, lexicons, etc.).
These files are usually heavy or under a license that does not allow redistribution.
For this reason, senpy has a `data_folder` that is separated from the source files.
The location of this folder is controlled programmatically or by setting the `SENPY_DATA` environment variable.

Plugins have a convenience function `self.open` which will automatically prepend the data folder to relative paths:


.. code:: python

    import os

    from senpy.plugins import AnalysisPlugin


    class PluginWithResources(AnalysisPlugin):
        file_in_data = <FILE PATH>
        file_in_sources = <FILE PATH>

        def activate(self):
            # Relative paths are resolved against the data folder
            with self.open(self.file_in_data) as f:
                self._classifier = train_from_file(f)
            # Files shipped with the plugin are located through its own folder
            file_in_source = os.path.join(self.get_folder(), self.file_in_sources)
            with self.open(file_in_source) as f:
                pass


It is good practice to specify the paths of these files in the plugin configuration, so the same code can be reused with different resources.


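The path handling described above can be pictured with a small standalone function. This is only a sketch of the behaviour, not the implementation of ``self.open``: the helper ``resolve_data_path`` is hypothetical, and falling back to the current directory when ``SENPY_DATA`` is unset is an assumption made for the example.

```python
import os


def resolve_data_path(path, data_folder=None):
    """Hypothetical helper: prepend the data folder to relative paths."""
    if data_folder is None:
        # Assumption: use the current directory if SENPY_DATA is not set.
        data_folder = os.environ.get("SENPY_DATA", os.getcwd())
    if os.path.isabs(path):
        # Absolute paths are left untouched.
        return path
    return os.path.join(data_folder, path)


relative = resolve_data_path("lexicon.txt", data_folder="/data")
absolute = resolve_data_path(os.path.abspath("lexicon.txt"), data_folder="/data")
print(relative)
print(absolute)
```
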
Docker image
============

Add the following Dockerfile to your project to generate a Docker image with your plugin:

.. code:: dockerfile

    FROM gsiupm/senpy

Once you make sure your plugin works with a specific version of senpy, pin that version in the file so that your build keeps working even if senpy gets updated.
e.g.:


.. code:: dockerfile

    FROM gsiupm/senpy:1.0.1


This will copy your source folder to the image, and install all dependencies.
Now, to build an image:

.. code:: shell

    docker build . -t gsiupm/exampleplugin

And you can run it with:

.. code:: shell

    docker run -p 5000:5000 gsiupm/exampleplugin


If the plugin uses non-source files (:ref:`loading data and files`), the recommended approach is to use the `SENPY_DATA` folder.
Data can then be mounted in the container or added to the image.
The former is recommended for open source plugins with licensed resources, whereas the latter is the most convenient and can be used for private images.

Mounting data:

.. code:: bash

    docker run -v $PWD/data:/data gsiupm/exampleplugin

Adding data to the image:

.. code:: dockerfile

    FROM gsiupm/senpy:1.0.1
    COPY data /data

F.A.Q.
======

What annotations can I use?
???????????????????????????

You can add almost any annotation to an entry.
The most common use cases are covered in the :doc:`apischema`.


Why does the analyse function yield instead of return?
??????????????????????????????????????????????????????

This is so that plugins may add new entries to the response or filter some of them.
For instance, a chunker may split one entry into several.
On the other hand, a conversion plugin may leave out those entries that do not contain relevant information.


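Both cases can be sketched with ordinary Python generators. The functions below work on plain dictionaries instead of senpy entries, and their names, the naive sentence splitting and the ``min_words`` threshold are all invented for the example.

```python
def split_sentences(entry):
    """Chunker sketch: yield one new entry per sentence."""
    for index, sentence in enumerate(entry["nif:isString"].split(". ")):
        sentence = sentence.strip(" .")
        if sentence:
            yield {"@id": "{}#{}".format(entry["@id"], index),
                   "nif:isString": sentence}


def drop_short(entry, min_words=2):
    """Filter sketch: yield nothing for entries that are too short."""
    if len(entry["nif:isString"].split()) >= min_words:
        yield entry


entry = {"@id": "e1", "nif:isString": "Senpy is neat. Yes. Plugins yield entries."}
chunks = list(split_sentences(entry))
kept = [item for chunk in chunks for item in drop_short(chunk)]
print(len(chunks), len(kept))
```

A function that returned a single value could express neither case: the chunker produces several entries from one, and the filter produces none for the one-word sentence.
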
If I'm using a classifier, where should I train it?
???????????????????????????????????????????????????

Training a classifier can be time-consuming. To avoid running the training unnecessarily, you can use ShelfMixin to store the classifier. For instance:

.. code:: python

    from senpy.plugins import ShelfMixin, AnalysisPlugin

    class MyPlugin(ShelfMixin, AnalysisPlugin):
        def train(self):
            ''' Code to train the classifier
            '''
            # Here goes the code
            # ...
            return classifier

        def activate(self):
            # Train only if the shelf does not already hold a classifier
            if 'classifier' not in self.sh:
                classifier = self.train()
                self.sh['classifier'] = classifier
            self.classifier = self.sh['classifier']

        def deactivate(self):
            self.close()


By default the ShelfMixin creates a file based on the plugin name and stores it in that plugin's folder.
However, you can manually specify a 'shelf_file' in your .senpy file.

Shelves may get corrupted if the plugin exits unexpectedly.
A corrupt shelf prevents the plugin from loading.
If you do not care about the data in the shelf, you can force your plugin to remove the corrupted file and load anyway: set 'force_shelf' to True in your plugin and start it again.

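The train-once pattern can be tried out without senpy by using the standard library ``shelve`` module as a stand-in. This is an assumption made for illustration only: the internals of ShelfMixin are not shown in this document and may differ. The ``train`` function, the file name and the dictionary "classifier" are placeholders.

```python
import os
import shelve
import tempfile

train_calls = []


def train():
    """Placeholder for an expensive training step."""
    train_calls.append(1)
    return {"good": 1, "bad": -1}


def load_classifier(shelf_path):
    """Train only when the shelf does not hold a classifier yet."""
    with shelve.open(shelf_path) as shelf:
        if "classifier" not in shelf:
            shelf["classifier"] = train()
        return shelf["classifier"]


shelf_path = os.path.join(tempfile.mkdtemp(), "myplugin_shelf")
first = load_classifier(shelf_path)    # trains and stores the result
second = load_classifier(shelf_path)   # reads the stored result
print(len(train_calls))
```
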
How can I turn an external service into a plugin?
?????????????????????????????????????????????????

This example illustrates how to implement a plugin that accesses the Sentiment140 service.

.. code:: python

    import json

    import requests

    from senpy.models import Sentiment
    from senpy.plugins import SentimentPlugin


    class Sentiment140Plugin(SentimentPlugin):
        def analyse_entry(self, entry, params):
            text = entry.text
            lang = params.get("language", "auto")
            # Send the text to the external classification service
            res = requests.post("http://www.sentiment140.com/api/bulkClassifyJson",
                                json.dumps({"language": lang,
                                            "data": [{"text": text}]
                                            }
                                           )
                                )

            p = params.get("prefix", None)
            # Rescale the reported polarity to the plugin's range
            polarity_value = self.maxPolarityValue*int(res.json()["data"][0]
                                                       ["polarity"]) * 0.25
            polarity = "marl:Neutral"
            neutral_value = self.maxPolarityValue / 2.0
            if polarity_value > neutral_value:
                polarity = "marl:Positive"
            elif polarity_value < neutral_value:
                polarity = "marl:Negative"

            # Wrap the result in a Sentiment annotation and attach it
            sentiment = Sentiment(id="Sentiment0",
                                  prefix=p,
                                  marl__hasPolarity=polarity,
                                  marl__polarityValue=polarity_value)
            sentiment.prov(self)
            entry.sentiments.append(sentiment)
            yield entry


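The polarity arithmetic in the plugin can be checked in isolation. The function below repeats the same computation outside the plugin; the name ``to_marl`` and the default ``max_polarity_value`` of 1.0 are assumptions for the example, and the sample inputs assume the service reports polarity on a 0 to 4 scale, as the 0.25 factor suggests.

```python
def to_marl(raw_polarity, max_polarity_value=1.0):
    """Rescale a raw polarity and map it to a marl label."""
    polarity_value = max_polarity_value * int(raw_polarity) * 0.25
    neutral_value = max_polarity_value / 2.0
    if polarity_value > neutral_value:
        label = "marl:Positive"
    elif polarity_value < neutral_value:
        label = "marl:Negative"
    else:
        label = "marl:Neutral"
    return label, polarity_value


for raw in (0, 2, 4):
    print(raw, to_marl(raw))
```

Only a value exactly at the midpoint is labelled neutral; anything above or below it becomes positive or negative.
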
Can I activate a DEBUG mode for my plugin?
???????????????????????????????????????????

You can activate DEBUG mode from the command-line tool with the ``-d`` option.

.. code:: bash

    senpy -d


Additionally, with the ``--pdb`` option you will be dropped into a pdb post mortem shell if an exception is raised.

.. code:: bash

    python -m pdb yourplugin.py

Where can I find more code examples?
????????????????????????????????????

See: `<http://github.com/gsi-upm/senpy-plugins-community>`_.