Undo files.split after matching filename (Python 3.x)
Filenames:
File1: new_data_20100101.csv
File2: samples_20100101.csv
The timestamp always has the format %Y%m%d and sits in the filename after an underscore (_) and before .csv.
I want to find the dates for which both a data file and a samples file exist, and then do something with those files. My code so far:
    import os

    dataList = []
    samplesList = []
    for all_files in os.listdir():
        if "data_" in all_files:
            # 'new_data_20100101.csv'.split('_')[2] -> '20100101.csv'
            dataList.append(all_files.split('_')[2])
        if "samples_" in all_files:
            samplesList.append(all_files.split('_')[1])
That gives me the filenames cut down to the timestamp and the extension .csv.
Now I would like to try something like this:
    for day in dataList:
        if day in samplesList:
            ...  # open day as csv
I get a list of days for which both files have timestamps. How can I undo that split now so I can go on working with the files? At the moment I would get an error telling me that, for instance, _2010010.csv does not exist, because the file is actually called new_data_2010010.csv. I'm unsure how to use os.path.basename here, so I would appreciate some advice on handling the filenames. Thanks.
You could instead use the glob module to get your list. This allows you to filter just your CSV files.
The following script creates two dictionaries. In each one, the key is the date portion of the filename (everything after the last underscore, so it still includes the .csv extension) and the value is the whole filename. A list comprehension then creates a list of tuples holding each matching pair:
    import glob
    import os

    csv_files = glob.glob('*.csv')
    # Key: trailing part of the name, e.g. '20100101.csv'; value: the full filename.
    data_files = {file.split('_')[2]: file for file in csv_files if 'data_' in file}
    sample_files = {file.split('_')[1]: file for file in csv_files if 'samples_' in file}
    # Keep only the dates that appear in both dictionaries.
    matching_pairs = [(sample_files[date], file) for date, file in data_files.items() if date in sample_files]

    for sample_file, data_file in sorted(matching_pairs):
        print('{} <-> {}'.format(sample_file, data_file))
For your two-file example, this would display the following:
samples_20100101.csv <-> new_data_20100101.csv
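Since the original goal was to open each matching pair as CSV, here is a minimal sketch of how that could look once the full filenames are kept alongside the dates. The helper names (`index_by_date`, `matching_pairs`, `read_rows`) and the regular expression are assumptions made for illustration and are not part of the answer above. The pattern assumes every filename ends in an underscore, eight digits and `.csv`.

```python
import csv
import re
from pathlib import Path

# Assumed naming scheme: "<anything>_<YYYYMMDD>.csv"
DATE_RE = re.compile(r'_(\d{8})\.csv$')


def index_by_date(directory, marker):
    """Map each date stamp to the Path of the CSV file whose name contains marker."""
    index = {}
    for path in Path(directory).glob('*.csv'):
        match = DATE_RE.search(path.name)
        if match and marker in path.name:
            index[match.group(1)] = path
    return index


def matching_pairs(directory='.'):
    """Return sorted (date, data_path, samples_path) tuples for dates found in both sets."""
    data_files = index_by_date(directory, 'data_')
    sample_files = index_by_date(directory, 'samples_')
    # Set intersection of the keys gives the dates that have both files.
    common_dates = sorted(data_files.keys() & sample_files.keys())
    return [(date, data_files[date], sample_files[date]) for date in common_dates]


def read_rows(path):
    """Read a whole CSV file into a list of rows (each row is a list of strings)."""
    with open(path, newline='') as handle:
        return list(csv.reader(handle))
```

Because the dictionaries store the complete path as the value, there is no need to "undo" the split: looping over `matching_pairs()` yields paths that can be passed straight to `read_rows` or `open`. With the lists from the question, the other option would be to rebuild the name by hand, for example `'new_data_' + day`, but that only works while the prefix never changes.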