I have a dataset of more than 300k files that I need to read and append to a list of DataFrames.
import os
import pandas as pd

corpus_path = "data"
article_paths = [os.path.join(corpus_path, p) for p in os.listdir(corpus_path)]
doc = []
for path in article_paths:
    # on_bad_lines='skip' replaces the deprecated error_bad_lines=False (removed in pandas 2.0)
    dp = pd.read_table(path, header=None, encoding='utf-8', quoting=3, on_bad_lines='skip')
    doc.append(dp)
Is there a faster way to do this, as the current method takes more than an hour.
You can use the multiprocessing module to read the files in parallel across several worker processes:

from multiprocessing import Pool

def readFile(path):
    # on_bad_lines='skip' replaces the deprecated error_bad_lines=False (removed in pandas 2.0)
    return pd.read_table(path, header=None, encoding='utf-8', quoting=3, on_bad_lines='skip')

if __name__ == '__main__':
    nprocs = os.cpu_count()  # number of worker processes
    with Pool(processes=nprocs) as pool:
        result = list(pool.imap(readFile, article_paths))
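Once the per-file frames are back, you will usually want a single DataFrame rather than a list of 300k small ones; `pd.concat` does that in one call. A minimal sketch (the tiny frames and the `text` column name here are hypothetical stand-ins for the per-file results):

```python
import pandas as pd

# Hypothetical small frames standing in for the per-file read results
frames = [pd.DataFrame({"text": ["a", "b"]}), pd.DataFrame({"text": ["c"]})]

# Stack all frames vertically; ignore_index renumbers rows 0..n-1
combined = pd.concat(frames, ignore_index=True)
```

Calling `pd.concat` once on the whole list is much faster than appending to a DataFrame inside the loop, which reallocates on every iteration.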