How to read word by word in Python?

Question

I need to read N lines like:

"word1" "word2"
"word1" "word2"
      .
      .
      .
"word1" "word2"

And then read one big line T, where |T| <= 11000000, meaning it may consist of about 11*10^6 letters. My task is to replace every word1 in T with the corresponding word2. The problem is that I have only 10 MB of memory, so I think one solution may be to read the line T word by word and print as I go, i.e. using O(1) memory. But I have no clue how to do that in Python. Thanks in advance!

Answer 1

I am not sure about the memory issue, but I hope this helps.

Replacing words

If you know which words you want to replace, you can do this:

newT = T.replace("word1","word2")
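Note that str.replace matches substrings, so "red" would also be replaced inside "redcat", and it handles one pair per call. A minimal sketch of whole-word replacement for several pairs is below, assuming the pairs arrive as lines of the form "word1" "word2" and that words in T are separated by whitespace. The helper names parse_pairs and replace_whole_words are made up for illustration, and this version still holds all of T in memory.

```python
import shlex

def parse_pairs(lines):
    """Build a {word1: word2} mapping from lines like '"word1" "word2"'."""
    mapping = {}
    for line in lines:
        parts = shlex.split(line)  # shlex strips the surrounding double quotes
        if len(parts) == 2:        # skip blank or malformed lines
            mapping[parts[0]] = parts[1]
    return mapping

def replace_whole_words(text, mapping):
    """Replace whitespace-separated words that appear as keys in mapping."""
    return " ".join(mapping.get(word, word) for word in text.split())

pairs = parse_pairs(['"cat" "dog"', '"red" "blue"'])
result = replace_whole_words("a red cat and a redcat", pairs)
print(result)  # a blue dog and a redcat
```

Splitting on whitespace collapses runs of spaces into one, and a word with punctuation attached (such as "cat,") will not match its key.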

Dividing a sentence into words

Source of the code example: https://www.guru99.com/tokenize-words-sentences-nltk.html

from nltk.tokenize import word_tokenize
text = "God is Great! I won a lottery."
print(word_tokenize(text))

Output: ['God', 'is', 'Great', '!', 'I', 'won', 'a', 'lottery', '.']
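word_tokenize needs the whole text as a string, which may not fit in the 10 MB limit from the question. As a rough sketch of the word-by-word idea, the standard-library-only code below reads a stream in fixed-size chunks and yields one word at a time. The names iter_words and replace_stream and the chunk size are assumptions made for illustration, and words are assumed to be separated by whitespace.

```python
import io

def iter_words(stream, chunk_size=64 * 1024):
    """Yield whitespace-separated words from a text stream, chunk by chunk."""
    tail = ""  # possibly incomplete word carried over from the previous chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        chunk = tail + chunk
        words = chunk.split()
        # If the chunk does not end in whitespace, its last word may continue
        # in the next chunk, so hold it back instead of yielding it now.
        if words and not chunk[-1].isspace():
            tail = words.pop()
        else:
            tail = ""
        yield from words
    if tail:
        yield tail

def replace_stream(src, dst, mapping, chunk_size=64 * 1024):
    """Write src to dst word by word, substituting words found in mapping."""
    first = True
    for word in iter_words(src, chunk_size):
        if not first:
            dst.write(" ")
        dst.write(mapping.get(word, word))
        first = False

# Small in-memory demo; a tiny chunk size forces words to span chunks.
src = io.StringIO("a red cat and a redcat")
dst = io.StringIO()
replace_stream(src, dst, {"cat": "dog", "red": "blue"}, chunk_size=4)
print(dst.getvalue())  # a blue dog and a redcat
```

For the real input, src and dst could be sys.stdin and sys.stdout. Memory use is roughly the chunk size plus the longest word, so it stays bounded only if T actually contains whitespace; a single unbroken run of 11*10^6 letters would still be buffered in full.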

Answer 1 (0, 2021-05-08 13:26:36)
