
Undo files.split after matching Filename (python 3.x)

Filenames:

File1: new_data_20100101.csv File2: samples_20100101.csv

The timestamp always has the format %Y%m%d and sits in the filename after a _ and before .csv.

I want to find the dates for which both a data file and a samples file exist, and then do something with those files. My code so far:

import os

dataList = []
samplesList = []
for all_files in os.listdir():
    if "data_" in all_files:
        dataList.append(all_files.split('_')[2])
    if "samples_" in all_files:
        samplesList.append(all_files.split('_')[1])

That gives me the filenames cut down to the timestamp and the .csv extension.

Now I would like to try something like this:

for day in dataList:
    if day in samplesList:
        ...  # open the file for this day as CSV

I get a list of days for which both files have timestamps. How can I undo that split now so I can go on working with the files? As it stands I would get an error telling me that, for instance, 20100101.csv does not exist, because the file is actually called new_data_20100101.csv. I'm also unsure how to use os.path.basename, so I would appreciate some advice on handling the file names. Thanks.

You could instead use the glob module to get your list. This allows you to filter just your CSV files.

The following script creates two dictionaries. In each one, the key is the date portion of the filename (as produced by the split, so it still includes the .csv extension) and the value is the whole filename. A list comprehension then builds a list of tuples holding each matching pair:
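As a small illustration of that filtering, the sketch below builds a throwaway directory with a few files and shows that a `*.csv` pattern returns only the CSV ones. The helper name `list_csv_files` and the extra file `notes.txt` are assumptions made up for this demo, not part of the original question.

```python
import glob
import os
import tempfile

def list_csv_files(directory):
    """Return the sorted base names of all .csv files in `directory`."""
    # glob.escape guards against special characters in the directory name.
    pattern = os.path.join(glob.escape(directory), '*.csv')
    return sorted(os.path.basename(path) for path in glob.glob(pattern))

# Demo in a temporary directory (file names are illustrative only).
with tempfile.TemporaryDirectory() as tmp:
    for name in ('new_data_20100101.csv', 'samples_20100101.csv', 'notes.txt'):
        open(os.path.join(tmp, name), 'w').close()
    csv_names = list_csv_files(tmp)
    print(csv_names)
```

The text file is skipped because it does not match the pattern, so no extra extension check is needed afterwards.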

import glob
import os

csv_files = glob.glob('*.csv')

data_files = {file.split('_')[2] : file for file in csv_files if 'data_' in file}
sample_files = {file.split('_')[1] : file for file in csv_files if 'samples_' in file}
matching_pairs = [(sample_files[date], file) for date, file in data_files.items() if date in sample_files]

for sample_file, data_file in sorted(matching_pairs):
    print('{} <-> {}'.format(sample_file, data_file))

For your two-file example, this would display the following:

samples_20100101.csv <-> new_data_20100101.csv
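To go on and actually open each matched pair, one option is to never "undo" the split at all: keep the full filename as the dictionary value and look it up by date. The following is a minimal sketch under a few assumptions. The helper names `date_key`, `find_pairs` and `read_rows` are hypothetical, the date is taken as the text after the last underscore with the extension removed (so the keys here do not include `.csv`), and the files are read with the standard `csv` module. The demo files are created in a temporary directory purely for illustration.

```python
import csv
import glob
import os
import tempfile

def date_key(path):
    """Return the date part of a name such as new_data_20100101.csv."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return stem.rsplit('_', 1)[-1]

def find_pairs(directory='.'):
    """Map each date to (data_file, samples_file) when both files exist."""
    csv_files = glob.glob(os.path.join(glob.escape(directory), '*.csv'))
    data = {date_key(f): f for f in csv_files if 'data_' in os.path.basename(f)}
    samples = {date_key(f): f for f in csv_files if 'samples_' in os.path.basename(f)}
    # Dictionary key views support set intersection.
    return {day: (data[day], samples[day]) for day in data.keys() & samples.keys()}

def read_rows(path):
    """Read a whole CSV file into a list of rows."""
    with open(path, newline='') as handle:
        return list(csv.reader(handle))

# Demo: two files share a date, one data file has no samples partner.
with tempfile.TemporaryDirectory() as tmp:
    for name in ('new_data_20100101.csv', 'samples_20100101.csv',
                 'new_data_20100102.csv'):
        with open(os.path.join(tmp, name), 'w', newline='') as handle:
            csv.writer(handle).writerow(['value', '1'])
    pairs = find_pairs(tmp)
    for day, (data_file, samples_file) in sorted(pairs.items()):
        print(day, read_rows(data_file), read_rows(samples_file))
    pair_names = {day: (os.path.basename(d), os.path.basename(s))
                  for day, (d, s) in pairs.items()}
```

Because the values are the paths returned by glob, they can be passed straight to `open` without rebuilding the `new_data_` or `samples_` prefix by hand.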
