I'm trying to load a corpus from a directory of .txt files and then create a document list.
I thought it would be simple enough, but when I run it nothing happens. Am I missing something?
import os.path
import re
import glob
def load_data_from_dir(path):
file_list = glob.glob('/transcripts/*.txt')
# create document list:
documents_list = []
for filename in file_list:
with open(filename, 'r', encoding='utf8') as f:
text = f.read()
f.close()
documents_list.append(text)
print("Total Number of Documents:",len(documents_list))
return documents_list
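Make sure your Python code is indented properly. Also check the path you pass to glob: '/transcripts/*.txt' is an absolute path that starts at the filesystem root, so it will not match a transcripts folder sitting next to your script. Give either a correct absolute path or a relative path such as './transcripts/*.txt' (relative paths are resolved against the current working directory). An absolute path is the most reliable option.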
import os.path
import re
import glob

def load_data_from_dir():
    file_list = glob.glob('./transcripts/*.txt')
    # create document list:
    documents_list = []
    for filename in file_list:
        with open(filename, 'r', encoding='utf8') as f:
            text = f.read()
        documents_list.append(text)
    print("Total Number of Documents:", len(documents_list))
    return documents_list

load_data_from_dir()
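To illustrate the path point in this answer, here is a minimal sketch. The helper name find_txt_files and the temporary directory are assumptions made only so the example runs anywhere; they are not part of the original code. It shows that glob.glob does not raise an error when the pattern matches nothing, which is why a wrong path looks like "nothing happens".

```python
import glob
import os
import tempfile


def find_txt_files(base_dir):
    """Return a sorted list of .txt files inside base_dir/transcripts.

    find_txt_files is a hypothetical helper used only for this illustration.
    """
    pattern = os.path.join(base_dir, "transcripts", "*.txt")
    return sorted(glob.glob(pattern))


# Build a throwaway directory so the sketch does not depend on your file layout.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "transcripts"))
    for name in ("a.txt", "b.txt"):
        with open(os.path.join(tmp, "transcripts", name), "w", encoding="utf8") as f:
            f.write("hello")

    # Correct directory: both files are found.
    found = find_txt_files(tmp)
    print(len(found))

    # Wrong directory: no exception, just an empty list.
    missing = glob.glob(os.path.join(tmp, "no_such_dir", "*.txt"))
    print(missing)
```

If the list comes back empty in your own script, printing os.getcwd() and the pattern you passed to glob is usually the quickest way to see which directory is actually being searched.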
There are a few mistakes in your code.
1. The function (load_data_from_dir) body is not indented. Indent all the lines (up to your return statement) in the function body.
2. A with construct should not be closed explicitly. Remove f.close().
3. If this is a single module and you only define a function, then nothing will happen when you run it. You need to make sure you call that function. So add
if __name__ == '__main__':
    load_data_from_dir(...)
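Putting the points from this answer together, a corrected version might look like the sketch below. It assumes the function should actually use its path argument (the question's code ignores it), and the "transcripts" directory name in the main guard is a placeholder you would replace with your own location.

```python
import glob
import os


def load_data_from_dir(path):
    """Read every .txt file directly inside `path` and return the texts as a list."""
    # Build the pattern from the argument instead of hard-coding the directory.
    file_list = sorted(glob.glob(os.path.join(path, "*.txt")))

    documents_list = []
    for filename in file_list:
        # The with block closes the file on exit, so no f.close() is needed.
        with open(filename, "r", encoding="utf8") as f:
            documents_list.append(f.read())

    print("Total Number of Documents:", len(documents_list))
    return documents_list


if __name__ == "__main__":
    # "transcripts" is an assumed directory name; change it to your own path.
    load_data_from_dir("transcripts")
```

Sorting the file list is a design choice rather than a requirement: glob.glob makes no promise about ordering, so sorting keeps the document order stable between runs.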