I need to read N lines like:
"word1" "word2"
"word1" "word2"
...
"word1" "word2"
and then read one long line T, where |T| <= 11000000, i.e. it may consist of about 11*10^6 characters. My task is to replace every word1 in T with the corresponding word2. The problem is that I only have 10 MB of memory, so I think one solution would be to read T word by word and print the result as I go, using O(1) extra memory. But I have no clue how to do that in Python. Thanks in advance)
If you know in advance which words you want to replace, you can do this:
newT = T.replace("word1", "word2")
Note that str.replace builds a new copy of the whole string, so with an 11-million-character T and a 10 MB limit this may exceed your memory budget.
Source of the code example: https://www.guru99.com/tokenize-words-sentences-nltk.html
from nltk.tokenize import word_tokenize
text = "God is Great! I won a lottery."
print(word_tokenize(text))
Output: ['God', 'is', 'Great', '!', 'I', 'won', 'a', 'lottery', '.']
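Once T is split into tokens like this, applying the replacements is a dictionary lookup per token. A small sketch (the dict `pairs` and the sample tokens are illustrative, not from the problem statement):

```python
# Map each word1 to its word2; tokens not in the dict pass through unchanged.
pairs = {"Great": "Good"}
tokens = ['God', 'is', 'Great', '!']
replaced = [pairs.get(t, t) for t in tokens]
# replaced == ['God', 'is', 'Good', '!']
```

Keep in mind that `word_tokenize` needs the whole text in memory at once, so on its own it does not solve the 10 MB constraint.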