
Read multiple .csv files from a folder with a variable part of the file name

I have a folder that contains a variable number of files, and each file has a variable string in its name. For example:

my_file V1.csv
my_file V2.csv
my_file something_else.csv

I need to:

  1. Load all the files whose names start with "my_file"
  2. Concatenate all of them into a single dataframe

Right now I call pd.read_csv individually for each file and then concatenate the results.

This is not optimal: every time the files in the source folder change, I need to modify the script.

Is it possible to automate this process so that it works even if the source files change?

You can combine glob, pandas.concat and pandas.read_csv fairly easily. Assuming the CSV files are in the current working directory (for example, the same folder as your script when you run it from there):

import glob

import pandas as pd

df = pd.concat([pd.read_csv(f) for f in glob.glob('my_file*.csv')])
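
Alternatively, you can list the directory with os.listdir and filter the names yourself:

import os

directory = "."  # assumption: the folder that holds the CSV files

for filename in os.listdir(directory):
    # keep only the files named my_file*.csv
    if filename.startswith("my_file") and filename.endswith(".csv"):
        # do some stuff here, e.g. read os.path.join(directory, filename)
        continue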
