Extract file name from read_csv - Python
I have a script that currently reads raw data from a .csv file and performs some pandas analysis on the data. Currently the .csv file is hardcoded and is read in like this:
data = pd.read_csv('test.csv',sep="|", names=col)
I want to change 2 things:
1. I want to turn this into a loop so that it goes through a directory of .csv files and runs the pandas analysis further down in the script on each one.
2. I want to take each .csv file name, strip the '.csv', and store the result in another list variable; let's call it 'new_table_list'.
I think I need something like the code below, at least for the 1st point (though I know this isn't completely correct). I am not sure how to address the 2nd point.
Any help is appreciated.
import os
path = '\test\test\csvfiles'
table_list = []
for filename in os.listdir(path):
    if filename.endswith('.csv'):
        table_list.append(file)
data = pd.read_csv(table_list,sep="|", names=col)
There are many ways to do it:
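new_table_list = []
for filename in os.listdir(path):
    if filename.endswith('.csv'):
        # join the directory so the file is found regardless of the working directory
        table_list.append(pd.read_csv(os.path.join(path, filename), sep="|"))
        new_table_list.append(filename.split(".")[0])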
One more:
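new_table_list = []
for filename in os.listdir(path):
    if filename.endswith('.csv'):
        # join the directory so the file is found regardless of the working directory
        table_list.append(pd.read_csv(os.path.join(path, filename), sep="|"))
        new_table_list.append(filename[:-4])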
And many more.
As @barmar pointed out, it is better to include the path as well when reading each file, to avoid any issues related to the location of the files and the script.
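Putting both points together, here is a minimal sketch of the loop. It uses os.path.splitext rather than split(".") so that a name with extra dots (such as "sales.2020.csv") keeps everything before the final extension. The helper name load_csv_dir, the sample file names, and the column names are made up for illustration and are not from the question.

```python
import os
import tempfile

import pandas as pd


def load_csv_dir(path, sep="|", names=None):
    """Read every .csv file in `path`; return (frames, table_names)."""
    frames, table_names = [], []
    for filename in sorted(os.listdir(path)):
        stem, ext = os.path.splitext(filename)
        if ext.lower() != ".csv":
            continue  # skip anything that is not a CSV file
        # Join directory and file name so the script works from any location.
        frames.append(pd.read_csv(os.path.join(path, filename), sep=sep, names=names))
        table_names.append(stem)
    return frames, table_names


# Demonstration on a throwaway directory with hypothetical files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("sales.2020.csv", "users.csv", "notes.txt"):
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write("1|2\n3|4\n")
    table_list, new_table_list = load_csv_dir(tmp, names=["a", "b"])

print(new_table_list)
```

The two returned lists stay aligned, so table_list[i] is the DataFrame read from the file whose stripped name is new_table_list[i].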
You can try something like this:
import glob
data = {}
for filename in glob.glob('/path/to/csvfiles/*.csv'):
    data[filename[:-4]] = pd.read_csv(filename, sep="|", names=col)
Then data.keys() gives the file paths without the ".csv" part (each key still includes the directory passed to glob), and data.values() holds one pandas DataFrame for each file.
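If you want the keys to be just the bare file names, without the directory, one possible variation is to take os.path.basename before stripping the extension. This is only a sketch: the function name load_csvs_by_name and the sample files and columns are assumptions for the demonstration.

```python
import glob
import os
import tempfile

import pandas as pd


def load_csvs_by_name(directory, sep="|", names=None):
    """Map each CSV's bare name (no directory, no extension) to its DataFrame."""
    data = {}
    for filepath in glob.glob(os.path.join(directory, "*.csv")):
        # basename drops the directory, splitext drops the final ".csv"
        key = os.path.splitext(os.path.basename(filepath))[0]
        data[key] = pd.read_csv(filepath, sep=sep, names=names)
    return data


# Demonstration on a throwaway directory with hypothetical files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("alpha.csv", "beta.csv"):
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write("1|2\n")
    data = load_csvs_by_name(tmp, names=["x", "y"])

new_table_list = sorted(data)  # sorted, because glob gives no ordering guarantee
print(new_table_list)
```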
I'd start with using pathlib.
from pathlib import Path
And then leverage the stem attribute and the glob method.
Let's make an import function.
def read_csv(f):
    # read the file passed in as f
    return pd.read_csv(f, sep="|")
The most generic approach would be to store the dataframes in a dictionary.
p = Path(r'\test\test\csvfiles')  # raw string, so '\t' is not read as a tab
dod = {f.stem: read_csv(f) for f in p.glob('*.csv')}
And you can also use pd.concat to turn that into a single dataframe.
df = pd.concat(dod)
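When pd.concat is given a dictionary, the keys become the outer level of the resulting index, so the file stem stays attached to every row. A small sketch with two hand-made frames standing in for the CSV contents (the table names, the columns, and the level names "table" and "row" are made up for illustration):

```python
import pandas as pd

# Stand-ins for the frames that would come from the CSV files.
dod = {
    "orders": pd.DataFrame({"id": [1, 2], "qty": [5, 7]}),
    "returns": pd.DataFrame({"id": [3], "qty": [1]}),
}

# The dictionary keys become the outer index level, named "table" here.
df = pd.concat(dod, names=["table", "row"])

# Select the rows that came from one file.
orders_only = df.loc["orders"]

# Or move the key into an ordinary column.
flat = df.reset_index(level="table")

print(df)
```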
To get the list of CSV files in the directory, use glob; it is easier than os.
from glob import glob
# csvs will contain the names of all files ending with .csv, as a list
csvs = glob('you\\dir\\to\\csvs_folder\\*.csv')
# remove the trailing .csv (4 characters) from the CSV file names
new_table_list = [csv[:-4] for csv in csvs]
# read the CSVs as dataframes
dfs = [pd.read_csv(csv, sep="|", names=col) for csv in csvs]
# concatenate all dataframes into a single dataframe
df = pd.concat(dfs, ignore_index=True)
You can try this:
import os
path = 'your path'
all_csv_files = [f for f in os.listdir(path) if f.endswith('.csv')]
for f in all_csv_files:
    data = pd.read_csv(os.path.join(path, f), sep="|", names=col)
# list without .csv
files = [f[:-4] for f in all_csv_files]
You can (at the moment of opening) add the filename to a DataFrame attribute as follows:
ds.attrs['filename']='filename.csv'
You can subsequently query the dataframe for the name:
ds.attrs['filename']
'filename.csv'
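As a sketch of how this could be wired into the reading step, the helper below (read_csv_with_name is a made-up name, not a pandas function) stores the base name of the file in attrs right after reading. Keep in mind that the pandas documentation marks DataFrame.attrs as experimental, so it may not survive every operation that produces a new dataframe.

```python
import os
import tempfile

import pandas as pd


def read_csv_with_name(filepath, **kwargs):
    """Read a CSV and remember which file it came from in df.attrs."""
    df = pd.read_csv(filepath, **kwargs)
    df.attrs["filename"] = os.path.basename(filepath)
    return df


# Demonstration with a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    filepath = os.path.join(tmp, "filename.csv")
    with open(filepath, "w") as fh:
        fh.write("1|2\n")
    ds = read_csv_with_name(filepath, sep="|", names=["a", "b"])

print(ds.attrs["filename"])
```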