How to load only the most recent file from a directory where the filenames start with the date?
I have the following files in a directory/folder:
2022-07-31_DATA_GVAX_ARPA_COMBINED.csv
2022-08-31_DATA_GVAX_ARPA_COMBINED.csv
2022-09-30_DATA_GVAX_ARPA_COMBINED.csv
The folder will be updated every month with a new file in the same format as above, for example:
2022-10-31_DATA_GVAX_ARPA_COMBINED.csv
2022-11-30_DATA_GVAX_ARPA_COMBINED.csv
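I only want to load the most recent month's .csv into a pandas DataFrame, not all of the files. How can I do this (perhaps using glob)?
I have seen this done for filenames where the date comes last (as a suffix), using: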
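from pathlib import Path

dir_files = r'/path/to/folder'  # the directory itself, without a trailing '/*'
dico = {}
for file in Path(dir_files).glob('DATA_GVAX_COMBINED_*.csv'):
    # key each file by the date at the end of its stem
    dico[file.stem.split('_')[-1]] = file
max_date = max(dico)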
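Match the directory contents against the known filename pattern of the files of interest, then sort by basename. Because the names begin with a YYYY-MM-DD date, the natural string order is also the chronological order.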
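from glob import glob as GLOB
from os.path import join as JOIN, basename as BASENAME

def get_latest(directory):
    # glob() returns a list; an empty list means no file matched
    if all_files := GLOB(JOIN(directory, '*_DATA_GVAX_ARPA_COMBINED.csv')):
        # sort on the filename only, so the date prefix decides the order
        return sorted(all_files, key=BASENAME)[-1]
    return None

print(get_latest('/Users/Cobra'))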
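The same idea can be sketched with pathlib. Since the filenames begin with a zero-padded YYYY-MM-DD date, the name that sorts last as a plain string should also be the most recent file. The function name `latest_csv` and its `pattern` default are illustrative assumptions, not part of the answer above.

```python
from pathlib import Path
from typing import Optional


def latest_csv(directory: str,
               pattern: str = "*_DATA_GVAX_ARPA_COMBINED.csv") -> Optional[Path]:
    """Return the matching file whose name sorts last, or None if nothing matches."""
    candidates = list(Path(directory).glob(pattern))
    # Zero-padded ISO dates compare chronologically as plain strings,
    # so no date parsing is needed here.
    return max(candidates, key=lambda p: p.name, default=None)
```

If the result is not None, it can be passed directly to `pd.read_csv(...)` to load only that month's data.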
You could try something like this:
import pandas as pd
from pathlib import Path

dir_files = r'/path/to/folder'  # the directory itself; glob() supplies the pattern
dico = {}
for file in Path(dir_files).glob('*DATA_GVAX_ARPA_COMBINED*.csv'):
    # parse the leading date; unparseable prefixes become NaT and are skipped
    date_value = pd.to_datetime(file.name.split('_')[0], errors="coerce")
    if pd.notna(date_value):
        dico[date_value] = file

max_date = max(dico.keys())
filepath = dico[max_date]
print(f'{max_date} -> {filepath.name}')
# Prints (with 2022-10-31 as the newest file):
#
# 2022-10-31 00:00:00 -> 2022-10-31_DATA_GVAX_ARPA_COMBINED.csv
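If only the date prefix needs checking, the standard library can do the parsing instead of pandas. This is a minimal sketch, assuming the date is always the first `_`-separated field in YYYY-MM-DD form; `newest_by_date` is a hypothetical helper name, not something from the answers above.

```python
from datetime import date, datetime
from pathlib import Path
from typing import Optional, Tuple


def newest_by_date(directory: str) -> Optional[Tuple[date, Path]]:
    """Return (date, path) for the newest dated CSV, or None if none parse."""
    dated = []
    for path in Path(directory).glob("*_DATA_GVAX_ARPA_COMBINED.csv"):
        prefix = path.name.split("_", 1)[0]
        try:
            parsed = datetime.strptime(prefix, "%Y-%m-%d").date()
        except ValueError:
            continue  # skip files whose prefix is not a valid date
        dated.append((parsed, path))
    # Compare on the parsed date only; an empty folder yields None.
    return max(dated, key=lambda item: item[0], default=None)
```

The returned path can then be handed to `pd.read_csv(...)`. Unlike the plain string sort, this skips files whose prefix is not a real date instead of letting them affect the ordering.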