
Extract file name from read_csv - Python

I have a script that currently reads raw data from a .csv file and runs some pandas analysis on it. The .csv file name is hardcoded and is read in like this:

data = pd.read_csv('test.csv',sep="|", names=col)

I want to change 2 things:

  1. I want to turn this into a loop so it goes through a directory of .csv files and runs the pandas analysis further down in the script on each one.

  2. I want to take each .csv file name, strip the '.csv', and store the result in another list variable, let's call it 'new_table_list'.

I think I need something like the code below, at least for the 1st point (though I know it isn't completely correct). I am not sure how to address the 2nd point.

Any help is appreciated.

import os 

path = '\test\test\csvfiles'
table_list = []

for filename in os.listdir(path):
    if filename.endswith('.csv'):
        table_list.append(file)
data = pd.read_csv(table_list,sep="|", names=col)

There are many ways to do it:

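new_table_list = []
for filename in os.listdir(path):
    if filename.endswith('.csv'):
        # join the directory so the file is found from any working directory
        table_list.append(pd.read_csv(os.path.join(path, filename), sep="|"))
        new_table_list.append(filename.split(".")[0])  # text before the first dot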

One more:

for filename in os.listdir(path):
    if filename.endswith('.csv'):
        table_list.append(pd.read_csv(os.path.join(path, filename), sep="|"))
        new_table_list.append(filename[:-4])  # drop the 4-character ".csv" suffix

and many more.

As @barmar pointed out, it is better to join the directory path to each file name before reading it. That avoids problems when the script runs from a different directory than the one holding the CSV files.
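A minimal sketch of that advice, covering both goals from the question. The temporary directory, the file names `orders.csv` and `users.v2.csv`, and the column list `col` are made up for illustration. `os.path.splitext` is swapped in for the slicing shown above because it removes only the final extension, so a name with an extra dot survives intact.

```python
import os
import tempfile

import pandas as pd

col = ["id", "name"]  # hypothetical column names

# Build a throwaway directory with two pipe-separated files (illustration only).
path = tempfile.mkdtemp()
for fname, body in [("orders.csv", "1|a\n2|b\n"), ("users.v2.csv", "3|c\n")]:
    with open(os.path.join(path, fname), "w") as fh:
        fh.write(body)

table_list = []      # one DataFrame per file
new_table_list = []  # file names without the ".csv" extension

for filename in sorted(os.listdir(path)):
    if filename.endswith(".csv"):
        full_path = os.path.join(path, filename)  # independent of the working directory
        table_list.append(pd.read_csv(full_path, sep="|", names=col))
        new_table_list.append(os.path.splitext(filename)[0])

print(new_table_list)
```

Any per-file analysis can go inside the loop body, right after the `read_csv` call.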

You can try something like this:

import glob

data = {}
for filename in glob.glob('/path/to/csvfiles/*.csv'):
    data[filename[:-4]] = pd.read_csv(filename, sep="|", names=col)

Then data.keys() holds the file paths without the ".csv" part, and data.values() holds one pandas DataFrame per file.
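Because `glob.glob` returns each match with its directory attached, keys built by slicing the match still contain that directory. If you want bare names as keys, one option is to strip the directory first. This is only a sketch: the temporary folder, the file `orders.csv`, and the column list `col` are assumptions for illustration.

```python
import glob
import os
import tempfile

import pandas as pd

col = ["id", "name"]  # hypothetical column names

# Throwaway directory with one sample file (illustration only).
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "orders.csv"), "w") as fh:
    fh.write("1|a\n2|b\n")

data = {}
for filename in glob.glob(os.path.join(folder, "*.csv")):
    # drop the directory, then the extension, to get the bare name
    name = os.path.splitext(os.path.basename(filename))[0]
    data[name] = pd.read_csv(filename, sep="|", names=col)

print(list(data))
```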

I'd start by using pathlib.

from pathlib import Path

And then leverage the stem attribute and the glob method.

Let's make an import function.

def read_csv(f):
    # read the file that was passed in
    return pd.read_csv(f, sep="|")

The most generic approach would be to store the results in a dictionary.

# raw string, so that \t is not interpreted as a tab character
p = Path(r'\test\test\csvfiles')
dod = {f.stem: read_csv(f) for f in p.glob('*.csv')}

And you can also use pd.concat to turn that into a single dataframe.

df = pd.concat(dod)
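When `pd.concat` is given a dictionary, the keys become the outer level of the resulting index, so each row stays traceable to its file. A small sketch, assuming two made-up files `a.csv` and `b.csv` with a header row, written to a temporary folder:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Throwaway folder with two sample files (illustration only).
folder = Path(tempfile.mkdtemp())
(folder / "a.csv").write_text("x|y\n1|2\n3|4\n")
(folder / "b.csv").write_text("x|y\n5|6\n")

dod = {f.stem: pd.read_csv(f, sep="|") for f in sorted(folder.glob("*.csv"))}
df = pd.concat(dod)

# Outer index level: file stem. Inner level: row number within that file.
print(df.index.tolist())

# Select the rows that came from b.csv.
print(df.loc["b"])
```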

To get the list of CSV files in the directory, use glob; it is easier than os.

from glob import glob 

# csvs will hold the paths of all files ending in .csv
csvs = glob('you\\dir\\to\\csvs_folder\\*.csv')

# remove the trailing .csv (4 characters, including the dot) from the CSV file names
new_table_list = [csv[:-4] for csv in csvs]

# read csvs as dataframes
dfs = [pd.read_csv(csv, sep="|", names=col) for csv in csvs]

#concatenate all dataframes into a single dataframe
df = pd.concat(dfs, ignore_index=True)
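With `ignore_index=True` the combined dataframe no longer records which file a row came from. If that matters, one possible approach is to add a column holding the file name before concatenating. In this sketch the column name `source`, the column list `col`, and the sample files are all hypothetical.

```python
import os
import tempfile
from glob import glob

import pandas as pd

col = ["id", "name"]  # hypothetical column names

# Throwaway folder with two sample files (illustration only).
folder = tempfile.mkdtemp()
for fname, body in [("a.csv", "1|x\n"), ("b.csv", "2|y\n3|z\n")]:
    with open(os.path.join(folder, fname), "w") as fh:
        fh.write(body)

csvs = sorted(glob(os.path.join(folder, "*.csv")))

# Tag every row with the bare name of the file it was read from.
dfs = [
    pd.read_csv(csv_path, sep="|", names=col).assign(
        source=os.path.splitext(os.path.basename(csv_path))[0]
    )
    for csv_path in csvs
]
df = pd.concat(dfs, ignore_index=True)

print(df["source"].tolist())
```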

You can try this:

import os
path = 'your path'
all_csv_files = [f for f in os.listdir(path) if f.endswith('.csv')]
for f in all_csv_files:
    data = pd.read_csv(os.path.join(path, f), sep="|", names=col)

# list without .csv
files = [f[:-4] for f in all_csv_files]

You can (at the moment of opening) store the file name in the DataFrame's attrs dictionary, as follows:

 ds.attrs['filename']='filename.csv'

You can subsequently query the dataframe for the name:

 ds.attrs['filename']
'filename.csv'
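A sketch of how this could be wrapped so the name is recorded at read time. The helper `read_with_name` and the sample file are hypothetical. Note that the pandas documentation marks `attrs` as experimental, and the dictionary is not guaranteed to survive every operation, so treat it as a convenience rather than a reliable store.

```python
import os
import tempfile

import pandas as pd

# Throwaway sample file (illustration only).
folder = tempfile.mkdtemp()
csv_path = os.path.join(folder, "filename.csv")
with open(csv_path, "w") as fh:
    fh.write("1|a\n2|b\n")


def read_with_name(path, **kwargs):
    """Hypothetical helper: read a CSV and remember which file it came from."""
    ds = pd.read_csv(path, **kwargs)
    ds.attrs["filename"] = os.path.basename(path)
    return ds


ds = read_with_name(csv_path, sep="|", names=["id", "name"])
print(ds.attrs["filename"])
```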

