How do I read multiple files into pandas?

Question

I have a folder containing hundreds of files of comma-separated data. The files themselves have no file extensions (i.e., EPI or DXPX, not EPI.csv or DXPX.csv).

I am trying to write a loop that reads in only the files I need (15 to 20 of them). I do not want to concat or append the DataFrames. I merely want to read each one into memory and be able to refer to it by name.

Even though there is no extension, I can read a file in as CSV:

YRD = pd.read_csv('YRD', low_memory=False)

My expected result from the loop below is two DataFrames: one named YRD and another named HOUSE. However, I only get one DataFrame, named raw_df, and it holds only the final file in the list. Sorry if this is a silly question, but I cannot figure out what I am missing.

df_list = ['YRD','HOUSE']

for raw_df in df_list:
    raw_df = pd.read_csv(raw_df, low_memory=False)

Answer 1

This is because the loop reassigns raw_df every time it reaches a new file, so only the last DataFrame survives. You should store each result somewhere new instead of reusing the same variable, for example in a list:

mydfs = []
for raw_df in df_list:
    mydfs.append(pd.read_csv(raw_df, low_memory=False))

Or you can put them into a dictionary keyed by file name, which lets you look up each DataFrame by name:

mydfs = {}
for raw_df in df_list:
    mydfs[raw_df] = pd.read_csv(raw_df, low_memory=False)
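To show the dictionary approach end to end, here is a minimal, self-contained sketch. The temporary folder, the sample file contents, and the column names (id, value, rooms) are made up for illustration only; in practice you would point folder at the real directory and keep your own df_list.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical setup: write two small extension-less CSV files into a
# temporary folder so the example runs anywhere. Replace `folder` with
# the real data directory in practice.
folder = Path(tempfile.mkdtemp())
(folder / "YRD").write_text("id,value\n1,10\n2,20\n")
(folder / "HOUSE").write_text("id,rooms\n1,3\n")

df_list = ["YRD", "HOUSE"]

# Build {file name: DataFrame}. read_csv parses the content as CSV
# regardless of the missing file extension.
mydfs = {name: pd.read_csv(folder / name, low_memory=False) for name in df_list}

# Look up each DataFrame by its file name.
yrd = mydfs["YRD"]
house = mydfs["HOUSE"]
print(yrd.shape)            # (2, 2)
print(list(house.columns))  # ['id', 'rooms']
```

A dictionary is usually easier to work with than one top-level variable per file: you can still write mydfs["YRD"], and you can also loop over mydfs.items() when the same step applies to every file.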

Answer 1: accepted, 2018-03-23 00:23:03
