How can I perform an action on all occurrences of an element in a list?

Question

I want to make texts readable for BERT embeddings by inserting the [CLS] and [SEP] tokens. I have tokenized my text, so I have a list with every word and punctuation mark as an element. However, I don't know how to add an element after every occurrence of '.' in that list.

Does anyone know how to do this? Or is there an existing tool that prepares BERT-readable texts?

Answer 1

I think this answers your question:

https://github.com/google-research/bert#tokenization

As mentioned there, you can see how they have done it in run_classifier.py and extract_features.py.

However, you can also accomplish what you want with regular expressions (regex), applied to the text as a single string. In Python, this would look something like:

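import re

# Match a literal period; the character class avoids having to escape the dot.
regex = r"[.]"
test_str = "Hello . BERT . Goodbye ."
subst = ". [SEP]"

# Replace every period with a period followed by the [SEP] token.
result = re.sub(regex, subst, test_str)

print(result)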
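
Since the question is about a list of tokens rather than a string, here is a minimal sketch that works on the list directly. The function name add_bert_markers and its parameters are made up for illustration and are not part of the BERT code. It also assumes that a single [CLS] belongs at the very start and that every '.' token ends a sentence, which may not match how you plan to feed the text to BERT.

```python
def add_bert_markers(tokens, sentence_end=".", cls_token="[CLS]", sep_token="[SEP]"):
    """Return a new list with cls_token first and sep_token after each sentence_end.

    Hypothetical helper for illustration; the input list is not modified.
    """
    marked = [cls_token]
    for token in tokens:
        marked.append(token)
        # Insert the separator right after every sentence-ending token.
        if token == sentence_end:
            marked.append(sep_token)
    return marked


tokens = ["Hello", ".", "BERT", ".", "Goodbye", "."]
marked = add_bert_markers(tokens)
print(marked)
```

Building a new list avoids inserting into the list while iterating over it, which would shift the indices of the remaining elements.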

Answered 2019-07-08 16:46:54