Sort a list of CSV files faster in a folder that contains 10k+ files

Hi, I'm a newbie in Python and in coding in general. This is my very first post.

I am trying to open the last 20 files and concatenate them into a dataframe.

I succeed in doing so when I work with a test folder that contains only 100 files, but as soon as I try my code in the real folder, which contains 10k files, it is very slow and takes about 5 minutes to finish.

Here is my try:

import pandas as pd
import glob
from datetime import datetime
import numpy as np
import os

path = r'K:/industriel/abc/03_LOG/PRODUCTION/CSV/'

path2 = r'K:/industriel/abc/03_LOG/PRODUCTION/IMG/'

os.chdir(path)
files = glob.glob(path + "/*.csv")
#files = filter(os.path.isfile, os.listdir(path))
files = [os.path.join(path, f) for f in files]
files.sort(key=lambda x: os.path.getctime(x), reverse=False)
dfs = pd.DataFrame()
for i in range(20):
    dfs = dfs.append(pd.read_csv(files[i].split('\\')[-1],delimiter=';', usecols=[0,1,3,4,9,10,20]))

dfs = dfs.reset_index(drop=True)

print(dfs.head(10))

Try reading all the individual files into a list and then concat them to form your dataframe at the end:

files = [os.path.join(path, f) for f in os.listdir(path) if f.endswith(".csv")]
files.sort(key=lambda x: os.path.getctime(x), reverse=False)
# Collect the individual frames in a list, then concatenate once at the end
dfs = []
for file in files[:20]:
    dfs.append(pd.read_csv(file, delimiter=';', usecols=[0,1,3,4,9,10,20]))
dfs = pd.concat(dfs)

You can use pd.concat() with a list of read files. You can replace your code after files.sort(...) with the following:

# Read the first 20 sorted files (same selection as the original range(20) loop)
dfs = pd.concat([
    pd.read_csv(file, delimiter=';', usecols=[0,1,3,4,9,10,20])
    for file in files[:20]
])
dfs = dfs.reset_index(drop=True)
print(dfs.head(10))
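
Both answers above speed up the concatenation step, but the code still calls os.path.getctime() once for each of the 10k files and sorts the full list. A minimal sketch of one possible alternative is shown below, assuming that "last 20" means the 20 most recently created files. The helper name newest_files is hypothetical and not part of the original post. It uses os.scandir(), which on Windows can usually return the stat information without an extra system call per file, and heapq.nlargest(), which avoids sorting every entry. Whether this actually helps on a network drive has not been measured here.

```python
import heapq
import os


def newest_files(folder, n=20, suffix=".csv"):
    """Return the paths of the n most recently created files in folder, newest first.

    Hypothetical helper for illustration. Note that st_ctime is the creation
    time on Windows but the last metadata change time on Unix.
    """
    with os.scandir(folder) as entries:
        # Lazily build (ctime, path) pairs for the matching files only
        candidates = (
            (entry.stat().st_ctime, entry.path)
            for entry in entries
            if entry.is_file() and entry.name.lower().endswith(suffix)
        )
        # nlargest keeps only n pairs in memory instead of sorting all files
        newest = heapq.nlargest(n, candidates)
    return [path for _, path in newest]
```

The returned paths could then be passed to the same list comprehension as above, for example pd.concat([pd.read_csv(p, delimiter=';', usecols=[0,1,3,4,9,10,20]) for p in newest_files(path)], ignore_index=True), where ignore_index=True replaces the separate reset_index(drop=True) call.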
