
Reading multiple files in Python

I have a dataset of more than 300k files which I need to read and append to a dictionary.

corpus_path = "data"
article_paths = [os.path.join(corpus_path,p) for p in os.listdir(corpus_path)]

doc = []
for path in article_paths:
    dp = pd.read_table(path, header=None, encoding='utf-8', quoting=3, error_bad_lines=False)
    doc.append(dp)

Is there a faster way to do this? The current method takes more than an hour.

You can use the multiprocessing module to read the files in parallel.

from multiprocessing import Pool

import pandas as pd

def readFile(path):
    return pd.read_table(path, header=None, encoding='utf-8', quoting=3, error_bad_lines=False)


nprocs = 4  # nprocs = number of worker processes; set this to your CPU count
result = list(Pool(processes=nprocs).imap(readFile, article_paths))
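
A minimal, self-contained sketch of the same idea, assuming the `article_paths` list is built as in the question: the Pool is created under an `if __name__ == '__main__':` guard so it also works with the 'spawn' start method (Windows/macOS), and the per-file DataFrames are optionally concatenated at the end if a single DataFrame is wanted rather than a list.

import os
from multiprocessing import Pool

import pandas as pd

def readFile(path):
    return pd.read_table(path, header=None, encoding='utf-8', quoting=3, error_bad_lines=False)

if __name__ == '__main__':
    corpus_path = "data"
    article_paths = [os.path.join(corpus_path, p) for p in os.listdir(corpus_path)]

    # Create the pool inside the __main__ guard; a context manager cleans up
    # the worker processes automatically.
    with Pool(processes=os.cpu_count()) as pool:
        frames = pool.map(readFile, article_paths)

    # Optional: combine the per-file DataFrames into one.
    combined = pd.concat(frames, ignore_index=True)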
