Pandas: import multiple csv files into dataframe using a loop and hierarchical indexing
I would like to read multiple CSV files (with a different number of columns) from a target directory into a single Python pandas DataFrame, so that I can search and extract data efficiently.
Example file:
Events
1,0.32,0.20,0.67
2,0.94,0.19,0.14,0.21,0.94
3,0.32,0.20,0.64,0.32
4,0.87,0.13,0.61,0.54,0.25,0.43
5,0.62,0.21,0.77,0.44,0.16
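Because the rows in the example file have different lengths, read_csv needs enough column names declared up front; shorter rows are then padded with NaN. A minimal sketch, assuming the file layout shown above (one title line, no header, event number first) and an arbitrary upper bound of 100 fields per row (MAX_FIELDS is a hypothetical name). The file is held in memory with io.StringIO so the snippet runs without anything on disk:

```python
import io

import pandas as pd

# The example file from above, held in memory for illustration.
RAW = """Events
1,0.32,0.20,0.67
2,0.94,0.19,0.14,0.21,0.94
3,0.32,0.20,0.64,0.32
4,0.87,0.13,0.61,0.54,0.25,0.43
5,0.62,0.21,0.77,0.44,0.16
"""

# Assumed upper bound on the number of fields in any row.
MAX_FIELDS = 100

frame = pd.read_csv(
    io.StringIO(RAW),
    skiprows=1,                     # skip the "Events" title line
    header=None,                    # the data rows carry no header
    names=list(range(MAX_FIELDS)),  # declare enough columns for the longest row
    index_col=0,                    # first field is the event number
)
# Drop the padding columns that are empty in every row.
frame = frame.dropna(axis=1, how="all")
print(frame)
```

This is the same idea as the names=range(1,100) argument in the question's code; the dropna call just trims the unused padding afterwards.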
Here is what I have so far:
import glob
import os
import pandas as pd
# get a list of all csv files in target directory
my_dir = "C:\\Data\\"
filelist = []
os.chdir(my_dir)
for files in glob.glob("*.csv"):
    filelist.append(files)
# read each csv file into single dataframe and add a filename reference column
# (i.e. file1, file2, file3) for each file read
df = pd.DataFrame()
columns = range(1, 100)
for c, f in enumerate(filelist, start=1):
    key = "file%i" % c
    frame = pd.read_csv((my_dir + f), skiprows=1, index_col=0, names=columns)
    frame['key'] = key
    # note: DataFrame.append was removed in pandas 2.0; pd.concat replaces it
    df = df.append(frame, ignore_index=True)
(The indexing isn't working properly.)
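A common way to write a loop like the one above is to collect the frames in a list and call pd.concat once with keys=, which builds the outer index level instead of discarding the index with ignore_index=True. A minimal sketch, assuming each file has a one-line title, no header, and at most 100 fields per row; load_events is a hypothetical helper name, the keys here are file names rather than file1, file2, and the demo writes two tiny files to a temporary directory so it runs anywhere:

```python
import glob
import os
import tempfile

import pandas as pd


def load_events(directory, max_fields=100):
    """Hypothetical helper: read every *.csv in `directory` into one
    DataFrame whose outer index level is the file name."""
    paths = sorted(glob.glob(os.path.join(directory, "*.csv")))
    frames = []
    for path in paths:
        frame = pd.read_csv(path, skiprows=1, header=None,
                            names=list(range(max_fields)), index_col=0)
        frames.append(frame)
    keys = [os.path.basename(p) for p in paths]
    # One concat call at the end; keys= becomes the outer index level.
    return pd.concat(frames, keys=keys, names=["fileno", "event"])


# Demo: two small ragged files in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name, body in [("a.csv", "Events\n1,0.1,0.2\n2,0.3\n"),
                       ("b.csv", "Events\n1,0.5,0.6,0.7\n")]:
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write(body)
    df = load_events(tmp)

print(df.dropna(axis=1, how="all"))
```

Appending to a list and concatenating once also avoids re-copying the growing DataFrame on every iteration.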
Essentially, the script below does exactly what I want (tried and tested), but it needs to loop over 10 or more CSV files:
import pandas as pd
columns = range(1, 100)
df1 = pd.read_csv("C:\\Data\\Currambene_001y09h00m_events.csv",
                  skiprows=1, index_col=0, names=columns)
df2 = pd.read_csv("C:\\Data\\Currambene_001y12h00m_events.csv",
                  skiprows=1, index_col=0, names=columns)
keys = ['file1', 'file2']
df = pd.concat([df1, df2], keys=keys, names=['fileno'])
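Once the files are combined with keys=, the hierarchical index is what makes searching and extracting convenient. A small sketch of selecting by file, assuming two tiny stand-in frames in place of the read_csv results (the values are made up for illustration):

```python
import pandas as pd

# Two small stand-in frames; in the question these come from read_csv.
df1 = pd.DataFrame({1: [0.32, 0.94], 2: [0.20, 0.19]}, index=[1, 2])
df2 = pd.DataFrame({1: [0.87, 0.62], 2: [0.13, 0.21]}, index=[1, 2])

df = pd.concat([df1, df2], keys=["file1", "file2"], names=["fileno"])

# All rows from one file (the outer level is dropped from the result).
first = df.loc["file1"]
# The same kind of selection by level name; xs also works on inner levels.
second = df.xs("file2", level="fileno")
# A single cell: the (file, event) row label, then the column label.
value = df.loc[("file2", 1), 2]
print(first, second, value, sep="\n\n")
```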
I have found many related links, but I still can't get this to work:
You need to decide along which axis you want to append your files. Pandas will always try to do the right thing by:
The trick to appending efficiently is to tip the files sideways, so that the behaviour you get matches what pandas.concat will be doing. This is my recipe:
import glob
from pandas import concat, read_csv
files = glob.glob("*.csv")  # in IPython, `files = !ls *.csv` also works
d = concat([read_csv(f, index_col=0, header=None).T for f in files], keys=files)
Notice that each frame returned by read_csv is transposed, so the files will be concatenated along the column axis, preserving their names. If you need to, you can transpose the resulting DataFrame back with d.T.
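To see the "tip sideways" idea on a tiny input, here is a minimal sketch assuming two small headerless files held in memory (the names a.csv and b.csv and their contents are illustrative). It stacks the transposed frames, transposes the result back, and compares it with a plain column-wise concat (axis=1), which should give the same layout:

```python
import io

import pandas as pd

# Two small headerless files held in memory (stand-ins for files on disk).
sources = {
    "a.csv": "1,0.32,0.20\n2,0.94,0.19\n",
    "b.csv": "1,0.87,0.13\n2,0.62,0.21\n",
}

# Read each file and transpose it with .T so its data columns become rows.
tipped = [pd.read_csv(io.StringIO(text), index_col=0, header=None).T
          for text in sources.values()]

# Stacking the transposed frames along axis 0 aligns them on the event IDs.
d = pd.concat(tipped, keys=list(sources))
# Transposing back gives event IDs as rows and (file, column) as columns.
wide = d.T

# For comparison: concatenating the untransposed frames along axis=1.
side_by_side = pd.concat(
    [pd.read_csv(io.StringIO(text), index_col=0, header=None)
     for text in sources.values()],
    axis=1,
    keys=list(sources),
)
print(wide)
```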
EDIT:
To handle a different number of columns in each source file, you'll need to supply a header. I understand you don't have a header in your source files, so let's create one with a simple function:
def reader(f):
    d = read_csv(f, index_col=0, header=None).T
    d.columns = range(d.shape[1])
    return d
df = concat([reader(f) for f in files], keys=files)
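A self-contained sketch of the reader approach, assuming two in-memory files with a different number of columns; it also assumes every row within a file has the same number of fields (rows of different lengths inside one file would still need names= as in the question). The file names and values are illustrative only:

```python
import io

import pandas as pd


def reader(source):
    """Read a headerless CSV, tip it sideways, and renumber the columns
    0..n-1 so that every file uses the same column labels."""
    d = pd.read_csv(source, index_col=0, header=None).T
    d.columns = range(d.shape[1])
    return d


# Two in-memory files with a different number of columns (stand-ins).
files = {
    "a.csv": io.StringIO("1,0.32,0.20\n2,0.94,0.19\n"),
    "b.csv": io.StringIO("1,0.87,0.13,0.61\n2,0.62,0.21,0.77\n"),
}
df = pd.concat([reader(f) for f in files.values()], keys=list(files))
print(df)
```

Each file contributes as many rows as it has data columns, so files of different widths simply stack under their own key.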