How to access files in a folder with the same string in the file name in Python?
I am trying to use Python to look through a directory folder and match up files that have the same string in their file names. Each of the files of interest in this folder is a .csv file containing a single values column: Value_Blue for the Blue files and Value_Red for the Red files.
The files in this folder go: Blue_111.csv, Blue_124.csv, Blue_145.csv, Blue_165.csv, Blue_176.csv... and then: Red_111.csv, Red_124.csv, Red_145.csv, Red_165.csv, Red_176.csv... and so on. As shown, the numbers associated with these files are not evenly spaced, but that is not relevant here.
For most Blue files, there is a matching Red file with the same number attached to the file name. However, some Blue files do not have a corresponding Red file.
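One way to find which Blue files have a Red partner is to pull the number out of each file name and compare the two sets of numbers. This is a minimal sketch. It assumes the names always follow the Color_<digits>.csv pattern shown above, and `pair_by_number` is a hypothetical helper, not part of any library:

```python
import re

# Assumed naming pattern: <Color>_<digits>.csv, e.g. Blue_111.csv
PATTERN = re.compile(r"^(Blue|Red)_(\d+)\.csv$")


def pair_by_number(filenames):
    """Split Blue numbers into those with a Red partner and those without."""
    numbers = {"Blue": set(), "Red": set()}
    for name in filenames:
        match = PATTERN.match(name)
        if match:  # ignore files that don't fit the assumed pattern
            color, number = match.groups()
            numbers[color].add(number)  # kept as a string to preserve leading zeros
    matched = sorted(numbers["Blue"] & numbers["Red"])
    blue_only = sorted(numbers["Blue"] - numbers["Red"])
    return matched, blue_only


files = ["Blue_111.csv", "Blue_124.csv", "Blue_145.csv", "Red_111.csv", "Red_145.csv"]
matched, blue_only = pair_by_number(files)
print(matched)    # ['111', '145']
print(blue_only)  # ['124']
```

In a real folder, the list of names could come from os.listdir(folder).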
What I am trying to do is loop through all Blue files in the directory folder, open each one as a dataframe, find the matching Red file and open it as a dataframe, multiply the Value columns of the two dataframes together, and then write the new dataframe to a new .csv whose file name contains the same number.
For example, if the loop starts with Blue_111.csv, I then want it to find Red_111.csv. I want both of these .csv files to be opened as dataframes and their Value columns multiplied. I then want to write this newly calculated dataframe to a new .csv called Green_111.csv, and then continue the loop with Blue_124.csv, etc.
Here is pseudocode illustrating my goal:
folder = Path/to/Directory/Folder
for f in folder that is a .csv with "Blue" in filename:
    blue_df = pd.read_csv(f)
    red = matching Red file
    red_df = pd.read_csv(red)
    green_df = blue_df.join(red_df)
    green_df = green_df['Value_Blue'] * green_df['Value_Red']
    green_df.to_csv(Path/to/Directory/Folder/Green_*matching_number*.csv)
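The join-then-multiply step works because DataFrame.join performs a left join on the row index by default. Row i of the Blue file is therefore paired with row i of the Red file. The small in-memory sketch below uses made-up values in place of the two CSV files. It shows the element-wise product and what happens if the Red file is one row shorter:

```python
import pandas as pd

# Made-up stand-ins for Blue_111.csv and Red_111.csv
blue_df = pd.DataFrame({"Value_Blue": [1.0, 2.0, 3.0]})
red_df = pd.DataFrame({"Value_Red": [10.0, 20.0]})  # one row shorter

# join aligns on the default 0..n-1 index (a left join by default)
joined = blue_df.join(red_df)
green = joined["Value_Blue"] * joined["Value_Red"]
print(green.tolist())  # [10.0, 40.0, nan]
```

The unmatched row becomes NaN instead of raising an error, so it may be worth checking that both files have the same number of rows.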
How can I match the files and then create the calculated output file with the same matched number in its file name?
Use glob.glob() to find all file names that match a wildcard pattern. Then you can use .replace() to substitute Red and Green for Blue and build the other file names.
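Note that str.replace acts on the whole path string, so a directory whose name also contains "Blue_" would be rewritten too. The sketch below changes only the file name, using pathlib.Path.with_name. `partner_path` is a hypothetical helper introduced for illustration:

```python
from pathlib import Path


def partner_path(blue_path, new_prefix):
    """Swap the leading 'Blue_' in the file name only, leaving the folder intact."""
    path = Path(blue_path)
    if not path.name.startswith("Blue_"):
        raise ValueError(f"not a Blue file: {path.name}")
    return path.with_name(new_prefix + path.name[len("Blue_"):])


p = Path("data/Blue_runs/Blue_111.csv")
print(p.as_posix().replace("Blue_", "Red_"))  # data/Red_runs/Red_111.csv
print(partner_path(p, "Red_").as_posix())     # data/Blue_runs/Red_111.csv
```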
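import glob
import os

import pandas as pd

folder = 'Path/to/Directory/Folder'
for blue in glob.glob(os.path.join(folder, "Blue_*.csv")):
    # Derive the matching Red input and Green output paths from the Blue path
    red = blue.replace("Blue_", "Red_")
    green = blue.replace("Blue_", "Green_")
    # Some Blue files have no Red counterpart, so skip those
    if not os.path.exists(red):
        continue
    blue_df = pd.read_csv(blue)
    red_df = pd.read_csv(red)
    # join() aligns the two single-column frames on their row index
    green_df = blue_df.join(red_df)
    green_df = green_df['Value_Blue'] * green_df['Value_Red']
    green_df.to_csv(green)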
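For comparison, here is how a self-contained pathlib version of the same loop might look. This is a sketch under stated assumptions, not the answerer's code. The function name make_green_files and the output column name Value_Green are hypothetical choices. The function skips Blue files without a Red partner and writes the CSV without the row index:

```python
from pathlib import Path

import pandas as pd


def make_green_files(folder):
    """Multiply Value_Blue by Value_Red for each matched pair; return written paths."""
    folder = Path(folder)
    written = []
    # sorted() materializes the list before any Green files are written
    for blue in sorted(folder.glob("Blue_*.csv")):
        number_part = blue.name[len("Blue_"):]  # e.g. "111.csv"
        red = folder / ("Red_" + number_part)
        if not red.exists():
            continue  # some Blue files have no matching Red file
        blue_df = pd.read_csv(blue)
        red_df = pd.read_csv(red)
        # Series multiplication aligns on the row index
        green_df = pd.DataFrame(
            {"Value_Green": blue_df["Value_Blue"] * red_df["Value_Red"]}
        )
        green = folder / ("Green_" + number_part)
        green_df.to_csv(green, index=False)
        written.append(green)
    return written
```

Calling make_green_files('Path/to/Directory/Folder') would return the list of Green_*.csv paths it created.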