
How to load only the most recent file from a directory where the filenames start with the date?

I have files in one directory/folder named:

  1. 2022-07-31_DATA_GVAX_ARPA_COMBINED.csv
  2. 2022-08-31_DATA_GVAX_ARPA_COMBINED.csv
  3. 2022-09-30_DATA_GVAX_ARPA_COMBINED.csv

The folder will be updated with each month's file in the same format as above, e.g.:

  • 2022-10-31_DATA_GVAX_ARPA_COMBINED.csv
  • 2022-11-30_DATA_GVAX_ARPA_COMBINED.csv

I want to load only the most recent month's .csv into a pandas DataFrame, not all the files. How can I do this (maybe using glob)?

I have seen this done for filenames with a fixed prefix, using:

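from pathlib import Path

dir_files = r'/path/to/folder'  # the directory itself, without a trailing '/*'

dico = {}

# Map the last underscore-separated part of each stem to its file.
for file in Path(dir_files).glob('DATA_GVAX_COMBINED_*.csv'):
    dico[file.stem.split('_')[-1]] = file

max_date = max(dico)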

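Glob the directory with the pattern for the files of interest, then sort on the basename. Because every name starts with a zero-padded YYYY-MM-DD date, a plain string sort is also a chronological sort, so the last item is the most recent file.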

from glob import glob
from os.path import basename, join

def get_latest(directory):
    # Return the path of the newest matching file, or None if nothing matches.
    all_files = glob(join(directory, '*_DATA_GVAX_ARPA_COMBINED.csv'))
    if all_files:
        # Compare on the basename only, so the date prefix decides the order.
        return max(all_files, key=basename)
    return None

print(get_latest('/Users/Cobra'))
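
To illustrate why sorting on the name is enough here, the following is a minimal sketch using only the filenames from the question. The second list, with non-padded months, is a made-up example (not from the question) showing where this shortcut would stop working.

```python
# Filenames from the question, deliberately out of order.
names = [
    '2022-09-30_DATA_GVAX_ARPA_COMBINED.csv',
    '2022-10-31_DATA_GVAX_ARPA_COMBINED.csv',
    '2022-07-31_DATA_GVAX_ARPA_COMBINED.csv',
]

# Zero-padded YYYY-MM-DD prefixes compare correctly as plain strings.
latest = max(names)

# Hypothetical names without zero padding: '9' sorts after '1',
# so the string comparison picks September over October.
unpadded = ['2022-9-30_x.csv', '2022-10-31_x.csv']
wrong = max(unpadded)

print(latest)
print(wrong)
```

If the naming scheme might ever drop the zero padding, parsing the prefix into a real date before comparing is the safer route.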

You could try something like this:


import pandas as pd
from pathlib import Path


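dir_files = r'/path/to/folder'  # the directory itself; Path.glob() applies the pattern

dico = {}

for file in Path(dir_files).glob('*DATA_GVAX_ARPA_COMBINED*.csv'):
    # The date is the part of the filename before the first underscore;
    # a prefix that is not a date becomes NaT and the file is skipped.
    date_value = pd.to_datetime(file.name.split('_')[0], errors="coerce")
    if pd.notna(date_value):
        dico[date_value] = file

max_date = max(dico.keys())  # raises ValueError if no file matched
filepath = dico[max_date]
print(f'{max_date} -> {filepath.name}')
# Prints:
#
# 2022-10-31 00:00:00 -> 2022-10-31_DATA_GVAX_ARPA_COMBINED.csv

df = pd.read_csv(filepath)  # load only the most recent file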
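
The same idea can be sketched with the standard library only, which also makes the empty-folder case explicit. The helper names `parse_prefix` and `latest_by_date` are made up for this illustration, and the default pattern assumes the naming scheme from the question.

```python
from datetime import date
from pathlib import Path


def parse_prefix(path):
    """Return the date at the start of the filename, or None if it is not an ISO date."""
    try:
        return date.fromisoformat(path.name.split('_')[0])
    except ValueError:
        return None


def latest_by_date(folder, pattern='*_DATA_GVAX_ARPA_COMBINED.csv'):
    """Return (date, path) for the newest dated file, or None if nothing matches."""
    dated = [(parse_prefix(p), p) for p in Path(folder).glob(pattern)]
    # Drop files whose prefix could not be parsed as a date.
    dated = [(d, p) for d, p in dated if d is not None]
    # Compare on the parsed date only; default=None covers the empty case.
    return max(dated, key=lambda pair: pair[0], default=None)
```

When the result is not None, its second element is a `Path` that can be handed to `pd.read_csv`.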
