
How do I dynamically select a csv from the string of its name?

I want to load a CSV file from my Downloads folder into a pandas DataFrame. Because a file with the same name already exists in the folder, each new download gets a number appended to its name. For example, if 'transactions (44).csv' is in the folder, the next download is named 'transactions (45).csv'.

I've looked into using the glob library or the os library to open the most recent file in my Downloads folder, but I was unable to produce a solution. I think I need some way to connect to the Downloads path, find all CSV files whose names contain the string 'transactions', and grab the one with the highest number in its filename.

list(csv.reader(open(path + '/transactions (45).csv')))

I'm hoping for something like path + '/%transactions%' + 'max()' + '.csv'. I know the final answer will be completely different, but I hope this makes sense.

One option is to use a regular expression to extract the numerically largest file ID and then construct the file name from it:

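import glob
import re

# Match only numbered copies such as "transactions (45).csv" and capture the number.
pattern = re.compile(r" \((\d+)\)\.csv$")
# glob searches the current working directory here; prefix the pattern with the folder path if needed.
matches = (pattern.search(x) for x in glob.glob("transactions (*).csv"))
last_id = max(int(m.group(1)) for m in matches if m)
name = f"transactions ({last_id}).csv"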

Alternatively, find the newest file directly by its modification time.
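
A minimal sketch of the modification-time approach, assuming the folder path is passed in by the caller. The helper name `newest_matching_file` and its default pattern are illustrative and do not come from the original answer. Keep in mind that the most recently modified file is usually, but not necessarily, the one with the highest number.

```python
import glob
import os


def newest_matching_file(folder, pattern="transactions*.csv"):
    """Return the path of the most recently modified file matching pattern.

    The function name and default pattern are hypothetical.
    Raises FileNotFoundError if nothing in the folder matches.
    """
    # glob.escape protects any special characters in the folder name itself.
    candidates = glob.glob(os.path.join(glob.escape(folder), pattern))
    if not candidates:
        raise FileNotFoundError(f"no files matching {pattern!r} in {folder!r}")
    # os.path.getmtime returns the last-modification time in seconds since the epoch.
    return max(candidates, key=os.path.getmtime)
```

The returned path can then be handed to whatever reads the file.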

Note that you should not use a CSV reader to read CSV files into pandas. Use pd.read_csv() instead.

Assuming the format "transactions (number).csv", try the following:

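import os
import numpy as np
import pandas as pd

files = os.listdir('Downloads/')
# Keep only numbered copies such as "transactions (45).csv".
tranfiles = [f for f in files
             if f.startswith('transactions (') and f.endswith(').csv')]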

Now, your target file is as below:

target_file=tranfiles[np.argmax([int(t.split('(')[1].split(')')[0]) for t in tranfiles])]

Read the desired file as below:

df=pd.read_csv('Downloads/'+target_file)
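
The steps from both answers can be combined into one helper. This is a sketch only: the function name `highest_numbered_csv`, its parameters, and the fallback to the unnumbered file are assumptions added for illustration. It compares the numbers as integers, so 'transactions (10).csv' is ranked above 'transactions (9).csv'.

```python
import re
from pathlib import Path

# Matches names such as "transactions (45).csv" and captures stem and number.
_NUMBERED = re.compile(r"^(?P<stem>.+) \((?P<num>\d+)\)\.csv$")


def highest_numbered_csv(folder, stem="transactions"):
    """Return the Path of '<stem> (N).csv' with the largest N in folder.

    Hypothetical helper: falls back to '<stem>.csv' (the first, unnumbered
    download) if no numbered copy exists, and raises FileNotFoundError if
    neither is present.
    """
    folder = Path(folder)
    best_num, best_path = -1, None
    for path in folder.iterdir():
        match = _NUMBERED.match(path.name)
        if match and match.group("stem") == stem:
            num = int(match.group("num"))  # compare numerically, not as text
            if num > best_num:
                best_num, best_path = num, path
    if best_path is not None:
        return best_path
    plain = folder / f"{stem}.csv"
    if plain.is_file():
        return plain
    raise FileNotFoundError(f"no '{stem}' CSV files found in {folder}")
```

The result can be passed straight to pd.read_csv, for example pd.read_csv(highest_numbered_csv(Path.home() / "Downloads")), assuming the downloads folder is at that location on your system.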

