[英]Summing data-frame columns from different data-frames
I have a number of timeseries .csv files that I am reading into a dataframe (df).我有许多时间序列 .csv 文件,我正在将它们读入数据框 (df)。 I would like to create another data-frame that has the sum of all of these dataframes added together.
我想创建另一个数据帧,它将所有这些数据帧的总和加在一起。
Examples of the dataframes are as follows.数据帧的示例如下。 Example df 1:
示例 df 1:
date BBG.XASX.ABP.S_price BBG.XASX.ABP.S_pos BBG.XASX.ABP.S_trade \
0 2017-09-11 2.8303586 0.0 0.0
1 2017-09-12 2.8135189 0.0 0.0
2 2017-09-13 2.7829274 86614.0 86614.0
3 2017-09-14 2.7928042 86614.0 0.0
4 2017-09-15 2.8120383 86614.0 0.0
BBG.XASX.ABP.S_cost BBG.XASX.ABP.S_pnl_pre_cost
0 -0.0 0.0
1 -0.0 0.0
2 -32.540463966186 0.0
3 -0.0 855.4691551999713
4 -0.0 1665.942337400047
example df2:示例 df2:
date BBG.XASX.AHG.S_price BBG.XASX.AHG.S_pos BBG.XASX.AHG.S_trade \
0 2017-09-11 2.6068676 0.0 0.0
1 2017-09-12 2.6044785 76439.0 76439.0
2 2017-09-13 2.6024171000000003 76439.0 0.0
3 2017-09-14 2.6139929 76439.0 0.0
4 2017-09-15 2.6602836 76439.0 0.0
BBG.XASX.AHG.S_cost BBG.XASX.AHG.S_pnl_pre_cost
0 -0.0 0.0
1 -26.876303828302497 0.0
2 -0.0 -157.5713545999606
3 -0.0 884.8425761999679
4 -0.0 3538.414817300014
example df 3:示例 df 3:
date BBG.XASX.AGL.S_price BBG.XASX.AGL.S_pos BBG.XASX.AGL.S_trade \
0 2017-09-18 18.8195983 0.0 0.0
1 2017-09-19 18.5104704 40613.0 40613.0
2 2017-09-20 18.2010515 40613.0 0.0
3 2017-09-21 18.2217768 40613.0 0.0
4 2017-09-22 17.840112 40613.0 0.0
BBG.XASX.AGL.S_cost BBG.XASX.AGL.S_pnl_pre_cost
0 -0.0 0.0
1 -101.488374137952 0.0
2 -0.0 -12566.42978570005
3 -0.0 841.7166089001112
4 -0.0 -15500.552522399928
Adding together the example dataframes the code would return the following output:将示例数据帧加在一起,代码将返回以下输出:
output:输出:
date 1 2 3 4 5 6
11/09/2017 5.4372262 0 0 0 0 0
12/09/2017 5.4179974 76439 76439 2 -26.87630383 0
13/09/2017 5.3853445 163053 86614 4 -32.54046397 -157.5713546
14/09/2017 5.4067971 163053 0 6 0 1740.311731
15/09/2017 5.4723219 163053 0 8 0 5204.357155
18/09/2017 18.8195983 0 0 0 0 0
19/09/2017 18.5104704 40613 40613 -101.4883741 0 0
20/09/2017 18.2010515 40613 0 0 -12566.42979 0
21/09/2017 18.2217768 40613 0 0 841.7166089 0
22/09/2017 17.840112 40613 0 0 -15500.55252 0
All of the data-frames have the same number of columns in the same order.所有数据帧都具有相同顺序的相同数量的列。 Please note in the output the dates in the individual df's can be different and I would like to see the totals for the individual days.
请注意在输出中各个 df 中的日期可能不同,我想查看各个日期的总数。
The code that I am generating all the individual df
data-frames is:我生成所有单独的
df
数据帧的代码是:
for subdirname in glob.iglob('C:/Users/stacey/WorkDocs/tradeopt/'+filename+'//BBG*/tradeopt.is-pnl*.lzma', recursive=True):
df = pd.DataFrame(numpy.zeros((0,27)))
out = []
with lzma.open(subdirname, mode='rt') as file:
print(subdirname)
for line in file:
items = line.split(",")
out.append(items)
if len(out) > 0:
a = pd.DataFrame(out[1:], columns=out[0])
How can I add together the individal df's into a sumdf?如何将单个 df 加到 sumdf 中?
Idea is convert column date
to DatetimeIndex
and split
columns names by .
想法是将列
date
到DatetimeIndex
和split
按列名.
to MultiIndex
:到多
MultiIndex
:
dfs = []
for subdirname in glob.iglob('C:/Users/stacey/WorkDocs/tradeopt/'+filename+'//BBG*/tradeopt.is-pnl*.lzma', recursive=True):
out = []
with lzma.open(subdirname, mode='rt') as file:
print(subdirname)
for line in file:
items = line.strip().split(",")
out.append(items)
if len(out) > 0:
a = pd.DataFrame(out[1:], columns=out[0]).set_index('date')
a.index = pd.to_datetime(a.index)
dfs.append(a)
And then use concat
and sum
by columns names:然后按列名使用
concat
和sum
:
df = pd.concat(dfs, axis=1).sum(level=0, axis=1)
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.