How to read and manipulate multiple CSV files using pandas and for-loop?
How to read faster multiple CSV files using Python pandas
My program has to read about 400,000 CSV files, and it takes a very long time. The code I use is:
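    for file in self.files:
        size = 2048
        # Keep only the 10 rows that start at the middle of the file.
        csvData = pd.read_csv(file, sep='\t', names=['acol', 'bcol'], header=None, skiprows=range(0, int(size/2)), skipfooter=(int(size/2)-10))
        s = 0.0  # reset the running sum for each file
        for index in range(0, 10):
            s = s + float(csvData['bcol'][index])
        s = s / 10
        averages.append(s)
        # The first number in the file name is used as the time stamp.
        time = file.rpartition('\\')[2]
        time = int(re.search(r'\d+', time).group())
        times.append(time)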
Is there any way to speed this up?
You can use threads. I took the following code from here and adapted it to your use case:
    import re
    from threading import Thread

    import pandas as pd

    # Shared result lists, filled in by the worker threads.
    averages = []
    times = []

    def my_func(file):
        size = 2048
        # skipfooter is only supported by the Python parser engine.
        csvData = pd.read_csv(file, sep='\t', names=['acol', 'bcol'], header=None, skiprows=range(0, int(size/2)), skipfooter=(int(size/2)-10), engine='python')
        s = 0.0
        for index in range(0, 10):
            s = s + float(csvData['bcol'][index])
        s = s / 10
        averages.append(s)
        time = file.rpartition('\\')[2]
        time = int(re.search(r'\d+', time).group())
        times.append(time)

    threads = []
    # In this case 'self.files' is a list of files to be read.
    for file in self.files:
        # We start one thread per file present.
        process = Thread(target=my_func, args=[file])
        process.start()
        threads.append(process)
    # We now pause execution on the main thread by 'joining' all of our started threads.
    # This ensures that each has finished processing its file.
    for process in threads:
        process.join()
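With roughly 400,000 files, starting one thread per file may be more than the machine handles comfortably, and two threads can interleave their appends so that `averages` and `times` no longer line up. Below is a minimal sketch of one possible variation that uses a bounded pool from the standard library and returns each result as a pair. The names `summarize`, `summarize_all`, `SIZE` and the `max_workers=8` default are hypothetical, not from the original post. It also assumes every file has exactly 2048 rows, so that `nrows=10` selects the same rows as the `skipfooter` call, and it swaps `rpartition('\\')` for `os.path.basename` for portability. Whether threads help at all depends on how much of the time is spent waiting on disk.

```python
import os
import re
from concurrent.futures import ThreadPoolExecutor

import pandas as pd

SIZE = 2048  # assumed number of rows per file, as in the code above


def summarize(path):
    """Return (time, average) for one file."""
    # Skip the first half of the file, then parse only the next 10 rows.
    # This avoids skipfooter, which requires the slower Python engine.
    data = pd.read_csv(path, sep='\t', names=['acol', 'bcol'], header=None,
                       skiprows=SIZE // 2, nrows=10)
    # The first number in the file name is used as the time stamp.
    name = os.path.basename(path)
    time = int(re.search(r'\d+', name).group())
    return time, float(data['bcol'].mean())


def summarize_all(files, max_workers=8):
    """Process all files with a bounded number of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as 'files'.
        results = list(pool.map(summarize, files))
    times = [t for t, _ in results]
    averages = [a for _, a in results]
    return times, averages
```

In the question's setting this would be called as `times, averages = summarize_all(self.files)`. Because each worker returns its pair instead of appending to shared lists, `times[i]` and `averages[i]` always refer to the same file.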
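Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.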