
Auto labelling in doccano

I'm getting into manual annotation for NLP and found a useful annotation tool named doccano, which has an auto-labeling feature. Does anyone know how to set up auto-labeling using a custom REST API request?

Example for a Sequence Labeling project:

Let's say we have a custom API that expects the text to be classified in the request body, like this:

{
    "text": "example text"
}
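For concreteness, a service with this contract could look like the sketch below. It is not part of doccano: the toy rule in find_entities (tag every capitalized word), the handler name, and the port are all assumptions made for illustration. The response shape, a JSON list of objects with start_offset and end_offset, is chosen to match the mapping template used later.

```python
import json
import re
from http.server import BaseHTTPRequestHandler, HTTPServer


def find_entities(text):
    """Toy tagger (an assumption): treat every capitalized word as an entity."""
    return [
        {"start_offset": m.start(), "end_offset": m.end()}
        for m in re.finditer(r"\b[A-Z][a-z]+\b", text)
    ]


class LabelHandler(BaseHTTPRequestHandler):
    """Accepts POST {"text": "..."} and answers with a JSON list of spans."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(find_entities(payload.get("text", ""))).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port=8000):
    """Serve until interrupted; the port is an arbitrary choice."""
    HTTPServer(("0.0.0.0", port), LabelHandler).serve_forever()
```

Calling run() starts the server; the URL you would then give doccano depends on where the service is reachable from the doccano backend.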

When setting up Auto Labeling in doccano, on the second screen, Set parameters, enter your API URL and any optional parameters (e.g. authentication headers), then reference the text variable in the request body using the {{ text }} format, as shown in the screenshot. Every time the auto-labeling API is called, doccano sends the current document's text in place of this variable.

[Screenshot: using the text variable]
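To make the substitution concrete, here is a rough sketch of the request that ends up being sent once {{ text }} is filled in. This only imitates the behavior described above in plain Python; it is not doccano's actual code, and fill_body, build_request, and the example URL are hypothetical names.

```python
import json
import urllib.request

# Body as entered on the Set parameters screen: key "text", value "{{ text }}".
BODY_TEMPLATE = {"text": "{{ text }}"}


def fill_body(template, text):
    """Replace the {{ text }} placeholder with the document text."""
    return {k: text if v == "{{ text }}" else v for k, v in template.items()}


def build_request(url, text, headers=None):
    """Build (but do not send) the POST request for one document."""
    body = json.dumps(fill_body(BODY_TEMPLATE, text)).encode("utf-8")
    all_headers = {"Content-Type": "application/json", **(headers or {})}
    return urllib.request.Request(url, data=body, headers=all_headers, method="POST")


# Example: inspect the payload for one document without sending it.
req = build_request("http://localhost:8000/", "example text")
print(req.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would actually send it, which is a quick way to check your API outside doccano before wiring it in.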

In the next step, Set a template, we need to specify a template that maps the API response to the format doccano expects. In this case, we could use this:

[Screenshot: mapping template]

The template uses Jinja2 syntax; here is the mapping template from the screenshot:

[
    {% for entity in input %}
        {
            "start_offset": {{ entity.start_offset }},
            "end_offset": {{ entity.end_offset }},
            "label": "P-B"
        }{% if not loop.last %},{% endif %}
    {% endfor %}
]
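If it helps to see what this template yields, the sketch below is a plain-Python equivalent of its output, not the Jinja2 rendering itself. It assumes, as the template does, that the API response (the input variable) is a list of objects with start_offset and end_offset; map_response is a hypothetical name.

```python
def map_response(entities, label="P-B"):
    """Mirror the template: copy each span's offsets and attach a fixed label."""
    return [
        {
            "start_offset": entity["start_offset"],
            "end_offset": entity["end_offset"],
            "label": label,
        }
        for entity in entities
    ]


# Sample API response (made up for illustration).
api_response = [
    {"start_offset": 0, "end_offset": 5},
    {"start_offset": 10, "end_offset": 13},
]
print(map_response(api_response))
```

Note that the template hard-codes the label "P-B" for every span. If your API returns a label per entity, you would reference that field in the template instead.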

Finally, in the last step, we map the labels produced in the previous step to labels that already exist in the doccano project. This should be straightforward.

[Screenshot: label mapping]
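Conceptually, this last step is a lookup from the template's labels to the project's labels. The sketch below illustrates the idea only; the project label "PERSON" is a made-up example, and leaving unmapped labels unchanged is a choice of this sketch rather than a statement about how doccano handles them.

```python
def apply_label_mapping(annotations, mapping):
    """Rename each annotation's label via the mapping; unknown labels pass through."""
    return [
        {**ann, "label": mapping.get(ann["label"], ann["label"])}
        for ann in annotations
    ]


# "P-B" comes from the template; "PERSON" is a hypothetical project label.
label_mapping = {"P-B": "PERSON"}
annotations = [{"start_offset": 0, "end_offset": 5, "label": "P-B"}]
print(apply_label_mapping(annotations, label_mapping))
```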

Then just click Finish and we are good to go.

To enable auto-labeling, open any data point from the Dataset tab, click the Auto Labeling button, and toggle the switch in the window that appears. From then on, every time you open an unapproved data row, doccano calls the API and labels the text for you automatically.

