How to load only the most recent file from a directory where the filenames start with the date?

I have files in a directory/folder named:

  1. 2022-07-31_DATA_GVAX_ARPA_COMBINED.csv
  2. 2022-08-31_DATA_GVAX_ARPA_COMBINED.csv
  3. 2022-09-30_DATA_GVAX_ARPA_COMBINED.csv

The folder will be updated every month with a new file in the same format as above, e.g.:

  • 2022-10-31_DATA_GVAX_ARPA_COMBINED.csv
  • 2022-11-30_DATA_GVAX_ARPA_COMBINED.csv

I only want to load the most recent month's .csv into a pandas DataFrame, not all of the files. How can I do this (perhaps using glob)?

I have seen this done for filenames with a fixed prefix, using:

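from pathlib import Path

dir_files = r'/path/to/folder'  # the directory itself; glob() below supplies the wildcard

dico = {}

# Map the trailing date in each filename to its path.
for file in Path(dir_files).glob('DATA_GVAX_COMBINED_*.csv'):
    dico[file.stem.split('_')[-1]] = file

max_date = max(dico)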

Glob the directory with a pattern that matches the known files of interest, then sort on the basename (natural string order).

from glob import glob as GLOB
from os.path import join as JOIN, basename as BASENAME

def get_latest(directory):
    if all_files := list(GLOB(JOIN(directory, '*_DATA_GVAX_ARPA_COMBINED.csv'))):
        return sorted(all_files, key=BASENAME)[-1]

print(get_latest('/Users/Cobra'))
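
Sorting on the basename works here because each filename begins with a YYYY-MM-DD date, and strings in that format sort in the same order as the dates they represent. A minimal sketch to check that assumption against the filenames from the question; the helper name `leading_date` is hypothetical:

```python
from datetime import date

# Filenames from the question, deliberately out of order.
names = [
    "2022-09-30_DATA_GVAX_ARPA_COMBINED.csv",
    "2022-07-31_DATA_GVAX_ARPA_COMBINED.csv",
    "2022-08-31_DATA_GVAX_ARPA_COMBINED.csv",
]

def leading_date(name):
    # Parse the YYYY-MM-DD prefix that precedes the first underscore.
    return date.fromisoformat(name.split("_")[0])

by_name = sorted(names)                    # plain string order
by_date = sorted(names, key=leading_date)  # order by the parsed date

# For this naming scheme the two orders agree.
assert by_name == by_date
latest = by_name[-1]
```

If the dates were written in another format (for example DD-MM-YYYY), string order would no longer match date order and the names would have to be parsed before sorting.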

You can try something like this:


import pandas as pd
from pathlib import Path


dir_files = r'/path/to/folder'  # the directory itself; glob() below supplies the wildcard

dico = {}

for file in Path(dir_files).glob('*DATA_GVAX_ARPA_COMBINED*.csv'):
    date_value = pd.to_datetime(file.name.split('_')[0], errors="coerce")
    if pd.notna(date_value):
        dico[date_value] = file

max_date = max(dico.keys())
filepath = dico[max_date]
print(f'{max_date} -> {filepath}')
# Prints:
#
# 2022-10-31 00:00:00 -> 2022-10-31_DATA_GVAX_ARPA_COMBINED.csv
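
The snippets above stop once the newest path has been found, while the question also asks to load it into a DataFrame. A minimal sketch of that last step, assuming every matching filename starts with a YYYY-MM-DD date so that the largest name is the newest file; `latest_csv` and `load_latest` are hypothetical helper names, not part of pandas:

```python
from pathlib import Path

def latest_csv(folder, pattern="*_DATA_GVAX_ARPA_COMBINED.csv"):
    """Return the matching Path whose name sorts last, or None if nothing matches."""
    return max(Path(folder).glob(pattern), key=lambda p: p.name, default=None)

def load_latest(folder):
    """Read the newest matching CSV in `folder` into a pandas DataFrame."""
    import pandas as pd  # imported here so latest_csv() works without pandas

    path = latest_csv(folder)
    if path is None:
        raise FileNotFoundError(f"no matching CSV found in {folder}")
    return pd.read_csv(path)
```

Usage would then be `df = load_latest('/path/to/folder')`, which reads only the newest file and leaves the others untouched.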
