What do I need to do to translate this list?

Question

I have been using DeepPavlov's named entity recognition model. However, it returns data in this format: [[[tokens], [ner_tags]]]

Example:

Raw text: John Doe at Burger King on Thursday

Return:

[[['john', 'doe', 'at', 'burger', 'king', 'on', 'thursday'],
  ['B-PERSON', 'I-PERSON', 'O', 'B-ORG', 'I-ORG', 'O', 'B-DATE']]]

Desired:

[['john doe', 'PERSON'], ['burger king', 'ORG'], ['thursday', 'DATE']]

The 'B-' prefix marks the beginning of an entity, while 'I-' marks a token inside the same entity. How do I manipulate the lists to produce the desired output?

Answer 1

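You could use the built-in zip function to pair each token with its tag and drop the 'O' entries. Note that this keeps the 'B-'/'I-' prefixes and does not merge multi-token entities: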

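rs = [[['john', 'doe', 'at', 'burger', 'king', 'on', 'thursday'],
       ['B-PERSON', 'I-PERSON', 'O', 'B-ORG', 'I-ORG', 'O', 'B-DATE']]]
words, kinds = rs[0]
# Pair each token with its tag, skipping tokens outside any entity
classes = [[word, kind] for word, kind in zip(words, kinds) if kind != 'O']
# classes == [['john', 'B-PERSON'], ['doe', 'I-PERSON'], ['burger', 'B-ORG'],
#             ['king', 'I-ORG'], ['thursday', 'B-DATE']]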
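
The comprehension above still leaves each entity split across tokens. As a minimal sketch of one way to finish the job, assuming the tags follow the B-/I-/O scheme described in the question, the tokens can be merged in a plain loop. The function name merge_entities is hypothetical, not part of DeepPavlov:

```python
def merge_entities(words, kinds):
    """Merge B-/I- tagged tokens into [text, label] pairs."""
    merged = []
    previous_label = None  # label of the entity the previous token belonged to
    for word, kind in zip(words, kinds):
        if kind == 'O':
            previous_label = None  # an 'O' token ends any open entity
            continue
        prefix, _, label = kind.partition('-')
        if prefix == 'I' and label == previous_label:
            # Continuation of the entity opened by the preceding token
            merged[-1][0] += ' ' + word
        else:
            # A B- tag (or a stray I- tag) starts a new entity
            merged.append([word, label])
        previous_label = label
    return merged


rs = [[['john', 'doe', 'at', 'burger', 'king', 'on', 'thursday'],
       ['B-PERSON', 'I-PERSON', 'O', 'B-ORG', 'I-ORG', 'O', 'B-DATE']]]
words, kinds = rs[0]
print(merge_entities(words, kinds))
# [['john doe', 'PERSON'], ['burger king', 'ORG'], ['thursday', 'DATE']]
```

Treating a stray I- tag as the start of a new entity is a design choice for this sketch; stricter handling may be preferable depending on the data.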

Answer 2

Use itertools.groupby to group consecutive tokens by entity type (the tag with its 'B-'/'I-' prefix stripped):

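from itertools import groupby

# result is the model output, in the format shown in the question
result = [[['john', 'doe', 'at', 'burger', 'king', 'on', 'thursday'],
           ['B-PERSON', 'I-PERSON', 'O', 'B-ORG', 'I-ORG', 'O', 'B-DATE']]]

res = []
# zip(*result[0]) yields (token, tag) pairs; the key is the entity type
for k, g in groupby(zip(*result[0]), key=lambda x: x[1].split('-')[-1]):
    if k != 'O':
        res.append([' '.join(x[0] for x in g), k])
print(res)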

Output:

[['john doe', 'PERSON'], ['burger king', 'ORG'], ['thursday', 'DATE']]

You can write this as a one-liner:

[[' '.join(x[0] for x in g), k] for k, g in groupby(zip(*result[0]), key=lambda x: x[1].split('-')[-1]) if k != 'O']
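
One caveat with grouping on the entity type alone: two entities of the same type that sit directly next to each other (a B- tag immediately following an I- tag of the same type) end up in a single group. The sketch below shows one possible way around that, assuming well-formed B-/I-/O tags: it gives every token a key that changes at each B- tag. The name group_entities is hypothetical.

```python
from itertools import groupby


def group_entities(tokens, tags):
    """Group tokens into [text, label] pairs, starting a new entity at every B- tag."""
    keyed = []
    entity_id = 0
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition('-')
        if prefix == 'B':
            entity_id += 1  # every B- tag opens a new entity
        # Tokens outside any entity get the key None and are dropped below
        key = None if tag == 'O' else (entity_id, label)
        keyed.append((key, token))
    return [[' '.join(token for _, token in group), key[1]]
            for key, group in groupby(keyed, key=lambda pair: pair[0])
            if key is not None]


tokens = ['john', 'doe', 'jane', 'roe', 'at', 'burger', 'king']
tags = ['B-PERSON', 'I-PERSON', 'B-PERSON', 'I-PERSON', 'O', 'B-ORG', 'I-ORG']
print(group_entities(tokens, tags))
# [['john doe', 'PERSON'], ['jane roe', 'PERSON'], ['burger king', 'ORG']]
```

Grouping the same input on the type alone would produce 'john doe jane roe' as a single PERSON entity.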
