How can I perform an action on all occurrences of an element in a list?

I want to make texts readable for BERT embeddings by inserting the [CLS] and [SEP] tokens. I have tokenized my text, so I have a list in which every word and punctuation mark is an element. However, I don't know how to add an element after every occurrence of '.' in that list.

Does anyone know what I can do? Alternatively, is there an existing tool that prepares text in a BERT-readable format?

I think this answers your question:

https://github.com/google-research/bert#tokenization

As mentioned there, you can see how they do it in run_classifier.py and extract_features.py.
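For reference, BERT's input convention wraps a sequence as [CLS] sentence A [SEP], optionally followed by sentence B [SEP], with segment IDs 0 for the first segment and 1 for the second. The sketch below builds that layout from already-tokenized word lists. The helper name `build_bert_input` is hypothetical, and it deliberately skips WordPiece sub-tokenization and truncation to a maximum sequence length, which the repository's scripts also take care of.

```python
from typing import List, Optional, Tuple


def build_bert_input(
    tokens_a: List[str], tokens_b: Optional[List[str]] = None
) -> Tuple[List[str], List[int]]:
    """Hypothetical helper: wrap token lists in [CLS]/[SEP] and build segment IDs.

    Assumes the input is already tokenized; no WordPiece splitting or
    length truncation is done here.
    """
    # First segment: [CLS] + sentence A + [SEP], all with segment ID 0
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    if tokens_b:
        # Optional second segment: sentence B + [SEP], all with segment ID 1
        tokens += tokens_b + ["[SEP]"]
        segment_ids += [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


tokens, segment_ids = build_bert_input(["hello", "bert", "."], ["goodbye", "."])
print(tokens)       # ['[CLS]', 'hello', 'bert', '.', '[SEP]', 'goodbye', '.', '[SEP]']
print(segment_ids)  # [0, 0, 0, 0, 0, 1, 1, 1]
```

This only shows the token layout; for real use, the tokenization code from the linked repository (or an equivalent tokenizer) should produce the sub-word tokens first.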

However, you can also do this with regular expressions (regex). In Python, it would look something like this:

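```python
import re

# Match a literal period; inside a character class "." is not a wildcard
regex = r"[.]"
test_str = "Hello . BERT . Goodbye ."
# Replace each period with the period followed by the [SEP] token
subst = ". [SEP]"

result = re.sub(regex, subst, test_str)
print(result)  # Hello . [SEP] BERT . [SEP] Goodbye . [SEP]
```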
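Since the question starts from a list of tokens rather than a string, the same idea can also be applied directly to the list, without joining and re-splitting it. A minimal sketch, assuming every '.' token ends a sentence and the whole text should begin with a single [CLS] token (the helper name `add_bert_markers` is hypothetical):

```python
from typing import List


def add_bert_markers(tokens: List[str]) -> List[str]:
    """Hypothetical helper: prepend [CLS] and insert [SEP] after every '.' token."""
    result = ["[CLS]"]
    for token in tokens:
        result.append(token)
        # Treat each period token as a sentence boundary
        if token == ".":
            result.append("[SEP]")
    return result


tokens = ["Hello", ".", "BERT", ".", "Goodbye", "."]
print(add_bert_markers(tokens))
# ['[CLS]', 'Hello', '.', '[SEP]', 'BERT', '.', '[SEP]', 'Goodbye', '.', '[SEP]']
```

Building a new list avoids calling `list.insert` on the list while iterating over it, which would shift the indices and revisit elements. If the text might not end with a period, a trailing [SEP] would have to be appended separately.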
