pandas - merging multiple DataFrames
This is a multi-part question. I just can't seem to combine everything together. The goal is to create one DataFrame (I'm guessing using a MultiIndex) that I can access as follows:
ticker = 'GOLD'
date = pd.to_datetime('1978/03/31')
current_bar = df.loc[ticker].loc[date]  # .ix has been removed from pandas; .loc is the label-based replacement
Can I then just say current_bar.Last?
Anyway, here are the files, and how I load them.
In [108]: df = pd.read_csv('GOLD.csv', parse_dates=['Date'], index_col='Date')
In [109]: df
Out[109]:
               Exp   Last  Volume
Date
1978-03-30  198002  995.6      54
1978-03-31  198002  999.5      78
In [110]: df2 = pd.read_csv('SPX.csv', parse_dates=['Date'], index_col='Date')
In [111]: df2
Out[111]:
               Exp   Last  Volume
Date
1978-03-30  198003  215.5      25
1978-03-31  198003  214.1      99
Ideally, I want it to look like this (I think):
ticker        GOLD                    SPX
values         Exp   Last  Volume     Exp   Last  Volume
Date
1978-03-30  198002  995.6      54  198003  215.5      25
1978-03-31  198002  999.5      78  198003  214.1      99
Thanks so much.
You can use pd.concat to concatenate DataFrames. (Concatenating smushes DataFrames together, while merging joins DataFrames based on common indices or columns.) When you supply the keys parameter, you get a hierarchical index:
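import pandas as pd
# sep=r'\s+' treats any run of whitespace as the delimiter
df = pd.read_csv('GOLD.csv', parse_dates=['Date'], index_col='Date', sep=r'\s+')
df2 = pd.read_csv('SPX.csv', parse_dates=['Date'], index_col='Date', sep=r'\s+')
# keys= adds an outer 'ticker' level to the row index; unstack moves it into the columns
result = pd.concat([df, df2], keys=['GOLD', 'SPX'], names=['ticker']).unstack('ticker')
# put ticker above the value names, then sort so each ticker's columns sit together
# (sort_index replaces the sortlevel method, which has been removed from pandas)
result = result.reorder_levels([1, 0], axis=1).sort_index(level=0, axis=1)
print(result)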
yields
ticker        GOLD                    SPX
               Exp   Last  Volume     Exp   Last  Volume
Date
1978-03-30  198002  995.6      54  198003  215.5      25
1978-03-31  198002  999.5      78  198003  214.1      99
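To follow the intermediate shapes without the CSV files, here is a minimal sketch that rebuilds the two sample frames in memory. The names gold, spx, stacked and wide are illustrative and not part of the original answer.

```python
import pandas as pd

# Rebuild the two sample frames from the question in memory.
dates = pd.DatetimeIndex(['1978-03-30', '1978-03-31'], name='Date')
gold = pd.DataFrame(
    {'Exp': [198002, 198002], 'Last': [995.6, 999.5], 'Volume': [54, 78]},
    index=dates)
spx = pd.DataFrame(
    {'Exp': [198003, 198003], 'Last': [215.5, 214.1], 'Volume': [25, 99]},
    index=dates)

# Step 1: keys= adds an outer row-index level, named 'ticker' here.
stacked = pd.concat([gold, spx], keys=['GOLD', 'SPX'], names=['ticker'])
print(stacked.index.names)

# Step 2: unstack moves 'ticker' from the rows into the columns,
# underneath the value names (Exp, Last, Volume).
wide = stacked.unstack('ticker')

# Step 3: swap the two column levels and sort so each ticker is contiguous.
wide = wide.reorder_levels([1, 0], axis=1).sort_index(level=0, axis=1)
print(wide)
```

After step 1 the ticker is part of the row index; only after step 2 does the frame become one row per date.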
If you skip the reorder_levels step, so that the value names stay as the outer column level, result['Last'] yields the DataFrame:
In [147]: result['Last']
Out[147]:
ticker       GOLD    SPX
Date
1978-03-30  995.6  215.5
1978-03-31  999.5  214.1
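Once ticker is the outer column level, plain result['Last'] no longer finds a match, because bracket lookup searches the outer level first. The sketch below shows one way to get the same per-ticker view with xs, and the ticker-then-date access the question asked for. It rebuilds the sample data in memory; the helper make_frame is hypothetical.

```python
import pandas as pd

dates = pd.DatetimeIndex(['1978-03-30', '1978-03-31'], name='Date')

def make_frame(exp, last, volume):
    """Hypothetical helper: build one ticker's sample frame."""
    return pd.DataFrame({'Exp': exp, 'Last': last, 'Volume': volume}, index=dates)

gold = make_frame([198002, 198002], [995.6, 999.5], [54, 78])
spx = make_frame([198003, 198003], [215.5, 214.1], [25, 99])

result = pd.concat([gold, spx], keys=['GOLD', 'SPX'], names=['ticker']).unstack('ticker')
result = result.reorder_levels([1, 0], axis=1).sort_index(level=0, axis=1)

# Cross-section on the inner column level: one 'Last' column per ticker.
last = result.xs('Last', axis=1, level=1)

# The access pattern from the question: pick a ticker, then a date.
ticker = 'GOLD'
date = pd.Timestamp('1978-03-31')
current_bar = result[ticker].loc[date]   # Series indexed by Exp, Last, Volume
print(current_bar['Last'])
```

Here result[ticker] selects on the outer column level and returns an ordinary DataFrame, so .loc[date] then picks a single row.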
I'd recommend avoiding the syntax result.Last because it is too close to result.last, which returns a DataFrame method.
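The same hazard applies to any column whose name exactly matches a DataFrame attribute. As a small illustration (the column name count is a hypothetical choice, picked because DataFrame.count is a method), attribute access returns the method while bracket access returns the column:

```python
import pandas as pd

# Hypothetical frame: one column is named like an existing DataFrame method.
df = pd.DataFrame({'Last': [995.6, 999.5], 'count': [54, 78]})

by_attribute = df.count      # the bound DataFrame.count method, not the column
by_bracket = df['count']     # the column, returned as a Series

print(callable(by_attribute))
print(type(by_bracket).__name__)
```

Bracket access always refers to the column, which is why it is the safer habit.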
To handle more files, you might use code like this:
import pandas as pd
filenames = ['GOLD.csv', 'SPX.csv']  # example list; extend with further ticker files
dfs = list()
for filename in filenames:
    df = pd.read_csv(filename, parse_dates=['Date'], index_col='Date')
    # compute moving_mean
    dfs.append(df)
# strip the '.csv' extension to get one ticker key per file
keys = [filename[:-4] for filename in filenames]
result = pd.concat(dfs, keys=keys, names=['ticker']).unstack('ticker')
Note that this requires enough memory to hold the list of all the DataFrames, plus enough memory to hold result.
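The multi-file loop can be tried without files on disk. The sketch below is one possible variant, not the answer's exact code. It assumes comma-separated data and uses in-memory strings (fake_files) in place of real files. It fills in the moving-mean placeholder with a 2-row rolling mean, where the window size is arbitrary. It also collects the frames in a dict, so that pd.concat takes its keys from the dict.

```python
import io
from pathlib import Path

import pandas as pd

# Stand-ins for files on disk (assumed comma-separated). With real files,
# pass the path to read_csv instead of a StringIO object.
fake_files = {
    'GOLD.csv': "Date,Exp,Last,Volume\n"
                "1978-03-30,198002,995.6,54\n"
                "1978-03-31,198002,999.5,78\n",
    'SPX.csv': "Date,Exp,Last,Volume\n"
               "1978-03-30,198003,215.5,25\n"
               "1978-03-31,198003,214.1,99\n",
}

dfs = {}
for filename, text in fake_files.items():
    df = pd.read_csv(io.StringIO(text), parse_dates=['Date'], index_col='Date')
    # Example per-file computation: 2-row moving mean of Last (arbitrary window).
    df['moving_mean'] = df['Last'].rolling(window=2).mean()
    # Path.stem drops the extension: 'GOLD.csv' -> 'GOLD'
    dfs[Path(filename).stem] = df

# When a dict is passed, concat uses its keys as the outer index level.
result = pd.concat(dfs, names=['ticker']).unstack('ticker')
print(result['moving_mean'])
```

Path(filename).stem is a little more robust than filename[:-4], since it does not assume a three-letter extension.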