How to read specific files of known filenames with pandas after using os.walk?
I am using os.walk() to get all the files in a certain directory; however, there are a ton of files in that directory that I don't need. I know the specific names of the files that I want to read, because they update every day and only the date in the filename changes.
import pandas as pd
import os
from os import listdir, walk
from os.path import isfile, join
td_str = pd.to_datetime('today').strftime('%Y%m%d') # Returns '20200923'
path = 'C:\\Users\\myuser\\subdirectory\\' + td_str
for root, directories, files in os.walk(path, topdown=False):
    for name in files:
        print(os.path.join(root, name))
### The Output
# C:\Users\myuser\subdirectory\20200923\20200923_file_a.csv
# C:\Users\myuser\subdirectory\20200923\20200923_file_b.csv
# C:\Users\myuser\subdirectory\20200923\20200923_file_c.csv
# C:\Users\myuser\subdirectory\20200923\20200923_file_d.csv
I know I want to read file_b and file_c and put them into respective dataframes.
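# Raw strings keep the backslashes literal; without the r prefix, '\U' is a syntax error in Python 3
df_file_b = pd.read_csv(r'C:\Users\myuser\subdirectory\20200923\20200923_file_b.csv')
df_file_c = pd.read_csv(r'C:\Users\myuser\subdirectory\20200923\20200923_file_c.csv')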
How can I read only those specific files with pandas? I currently assume I would have to keep the desired filenames in a list and use an if check inside the loop to see whether os.walk() finds them, but is there a more efficient way to do this?
Thanks much.
You can use the glob module, which supports Unix style pathname patterns (see the "Unix style pathname pattern expansion" page for glob in the Python documentation).
A simple solution would be to create a filter with file_ and the character range b to c:
import glob
print(glob.glob('*file_[b-c]*'))
This will print
['20200923_file_b.csv', '20200923_file_c.csv']
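To go from the matched names to dataframes, the glob result can be passed straight to pd.read_csv. The following is only a minimal sketch: the helper name read_matching_csvs and the exact pattern are assumptions for illustration, not part of the original answer, and it assumes the files are plain CSVs that read_csv can parse with its defaults.

```python
import glob
import os

import pandas as pd


def read_matching_csvs(directory, pattern):
    """Read every file in `directory` whose name matches `pattern`.

    Hypothetical helper: returns a dict mapping each matched file name
    to the DataFrame that pd.read_csv produced for it.
    """
    frames = {}
    # glob.escape protects special characters in the directory part,
    # so only `pattern` is interpreted as a wildcard expression.
    full_pattern = os.path.join(glob.escape(directory), pattern)
    # glob.glob returns matches in arbitrary order; sort for stable results.
    for path in sorted(glob.glob(full_pattern)):
        frames[os.path.basename(path)] = pd.read_csv(path)
    return frames


# Example usage with the layout from the question (paths are the asker's):
# td_str = pd.to_datetime('today').strftime('%Y%m%d')
# path = 'C:\\Users\\myuser\\subdirectory\\' + td_str
# frames = read_matching_csvs(path, td_str + '_file_[b-c].csv')
# df_file_b = frames[td_str + '_file_b.csv']
# df_file_c = frames[td_str + '_file_c.csv']
```

Because the filenames are fully known apart from the date, building each path directly with os.path.join(path, td_str + '_file_b.csv') and calling pd.read_csv on it would also work without any directory scan; the glob version is mainly convenient when the set of wanted files is easier to describe as a pattern.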