I am trying to preprocess a large .txt file that is around 12 GB. The following code gives an "Invalid Argument" error. I think it happens because the file is too large. Is there any way to read a document this big? Do I need this much data to train word vectors? Or is there some other error?
with open('data/text8') as f:
text = f.read()
Depending on what sort of text processing you intend to do, reading one line at a time may suffice:
with open("data/text8", "r") as f:
    for line in f:
        # process the string 'line' as desired (it's a single line of the file)
        pass
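If the file has very long lines (or few newlines at all), iterating line by line may still pull huge strings into memory. A possible alternative, sketched here as a hypothetical helper (the name `read_in_chunks` and the 1 MB default are my own choices, not from the original post), is to read fixed-size chunks:

```python
def read_in_chunks(path, chunk_size=1024 * 1024):
    """Yield the text file at 'path' in fixed-size chunks.

    chunk_size is in characters; 1 MB here is an arbitrary default,
    so memory use stays bounded regardless of line length.
    """
    with open(path, "r") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty string means end of file
                return
            yield chunk
```

Usage would look like `for chunk in read_in_chunks("data/text8"): ...`, processing each chunk as it arrives instead of holding the whole 12 GB in memory at once.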