
How to read in multiple files into pandas?

I have a folder that has hundreds of files which contain comma-separated data; however, the files themselves have no file extensions (i.e., EPI or DXPX, NOT EPI.csv or DXPX.csv).

I am trying to create a loop that reads in only the files I need (between 15 and 20 files). I do not want to concat or append the dfs. I merely want to read each df into memory and be able to call it by name.

Even though there is no extension, I can read a file in as CSV:

import pandas as pd
YRD = pd.read_csv('YRD', low_memory=False)
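This works because read_csv decides how to parse a file from its arguments, not from the file name, so a missing ".csv" suffix makes no difference. A minimal sketch that demonstrates this by writing a small extension-less file named YRD into a temporary folder; the column names and values are made up for illustration:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Create a throwaway folder with a comma-separated file that has no extension.
# The file name mirrors the question; the contents are hypothetical.
tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / "YRD").write_text("id,value\n1,10\n2,20\n")

# read_csv accepts any path; the missing ".csv" suffix is irrelevant.
YRD = pd.read_csv(tmp_dir / "YRD", low_memory=False)
print(YRD.shape)  # (2, 2)
```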

My expected result from the loop below is two dfs: one named YRD and another named HOUSE. However, I only get one df, named raw_df, and it holds only the final file in the list. Sorry if this is a silly question, but I cannot figure out what I am missing.

df_list = ['YRD','HOUSE']

for raw_df in df_list:
    raw_df = pd.read_csv(raw_df, low_memory=False)
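To see what the loop above actually does: `for raw_df in df_list` binds the name raw_df to each string in turn, and the assignment inside the loop then rebinds that same name to a DataFrame. Nothing ever creates names called YRD or HOUSE. A small sketch that reproduces this, using in-memory io.StringIO buffers as hypothetical stand-ins for the two files:

```python
import io

import pandas as pd

# Hypothetical stand-ins for the extension-less files YRD and HOUSE.
sources = {
    "YRD": io.StringIO("a,b\n1,2\n"),
    "HOUSE": io.StringIO("x,y,z\n3,4,5\n"),
}

df_list = ["YRD", "HOUSE"]
for raw_df in df_list:
    # raw_df starts each iteration as a string, then is overwritten with a
    # DataFrame; the DataFrame from the previous iteration is discarded.
    raw_df = pd.read_csv(sources[raw_df], low_memory=False)

print(list(raw_df.columns))  # ['x', 'y', 'z']: only the last file survives
print("YRD" in globals())    # False: no variable named YRD was created
```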

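This is because you reassign raw_df on every iteration, so each file you read overwrites the DataFrame from the previous one. The loop variable is a single name; assigning to it never creates variables called YRD or HOUSE. You need to store each result somewhere new rather than reusing the same name, for example in a list: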

mydfs = []
for raw_df in df_list:
    mydfs.append(pd.read_csv(raw_df, low_memory=False))
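With the list version, the DataFrames come back in the same order as df_list, so you can unpack them into the names you wanted. A sketch, assuming files named YRD and HOUSE; here they are created in a temporary folder with made-up contents so the example runs on its own:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical folder and file contents, standing in for the real data.
folder = Path(tempfile.mkdtemp())
(folder / "YRD").write_text("a,b\n1,2\n")
(folder / "HOUSE").write_text("x,y,z\n3,4,5\n")

df_list = ["YRD", "HOUSE"]
mydfs = []
for raw_df in df_list:
    mydfs.append(pd.read_csv(folder / raw_df, low_memory=False))

# The list keeps the order of df_list, so tuple unpacking recovers the names.
YRD, HOUSE = mydfs
print(YRD.shape, HOUSE.shape)  # (1, 2) (1, 3)
```

Unpacking only works when the number of names on the left matches the list length, so with 15 to 20 files the dictionary version is usually easier to maintain.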

Or you can put them into a dictionary keyed by file name, so that each df can be called by name, e.g. mydfs['YRD']:

mydfs = {}
for raw_df in df_list:
    mydfs[raw_df] = pd.read_csv(raw_df, low_memory=False)
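Since the folder holds hundreds of files but only 15 to 20 are needed, the same dictionary can also be built from the folder listing, filtered by a set of wanted names. A sketch using pathlib; the folder location and file contents are hypothetical, and EPI plays the role of a file that should be skipped:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical folder: two wanted files plus one that should be ignored.
folder = Path(tempfile.mkdtemp())
(folder / "YRD").write_text("a,b\n1,2\n")
(folder / "HOUSE").write_text("x,y,z\n3,4,5\n")
(folder / "EPI").write_text("p,q\n6,7\n")

wanted = {"YRD", "HOUSE"}

# Key each DataFrame by its file name; only regular files whose
# names appear in `wanted` are read, everything else is skipped.
mydfs = {
    path.name: pd.read_csv(path, low_memory=False)
    for path in folder.iterdir()
    if path.is_file() and path.name in wanted
}

print(sorted(mydfs))                     # ['HOUSE', 'YRD']
print(mydfs["HOUSE"].columns.tolist())   # ['x', 'y', 'z']
```

Note the trade-off: filtering the folder listing silently skips a wanted name that is missing from the folder, whereas looping over the wanted names and reading `folder / name` directly raises FileNotFoundError, which may be preferable if every file is required.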

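Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.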
