Pandas: Set element of pivot table to variable
I have a pivot table named table1 that looks like:
Volume
Site TripDate
003l 1990-06-10 2354.043820
1991-07-26 2745.673779
1993-10-08 22789.790846
1994-11-20 23072.306665
1995-04-24 25203.740194
1996-02-13 16505.985301
1996-04-15 8257.426317
1996-09-12 9148.369265
1997-02-13 10014.168593
1997-04-20 11154.686365
1997-08-23 13064.444117
1997-11-06 13704.596573
1998-04-15 14358.140459
1999-05-04 18100.457859
2000-03-17 22910.600843
2000-06-01 617.621794
2001-10-05 882.738323
016l 1990-06-10 962.070643
1991-07-26 761.409178
1993-10-08 475.038362
1994-11-20 312.339596
1995-04-24 11569.523232
1996-02-13 15272.175019
1996-04-15 13542.057394
1996-09-12 14556.930737
1997-02-13 18905.265710
1997-04-20 19832.509861
I am interested in calculating percent volume for each site, using the earliest volume calculation as the "theoretical value" to normalize the data. For each site, is there a way to define a variable for the earliest volume calculation (i.e. 1990-06-10) directly from the pivot table?
An example formula for %Volume would be:
%Volume=(V_survey-V_1990)/(V_1990)
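As a quick sanity check of the formula, here is a minimal sketch that plugs in two Volume values for site 003l from the table above. The function name pct_volume is hypothetical, chosen only for illustration.

```python
def pct_volume(v_survey, v_base):
    """Change of a survey volume relative to the baseline volume, as a fraction."""
    return (v_survey - v_base) / v_base


v_1990 = 2354.043820  # site 003l, 1990-06-10 (baseline)
v_1991 = 2745.673779  # site 003l, 1991-07-26

print(pct_volume(v_1991, v_1990))
```

The result is a fraction; multiply by 100 if a percentage is wanted. The baseline survey itself comes out as 0.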
I have tried to subset based on the level-one index using:
test = table1[table1[['TripDate']]==1990-06-10]
but it throws the following error:
KeyError: "['TripDate'] not in index"
If I check the names of the index levels using list(table1.index.names), it returns:
['Site', 'TripDate']
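That output suggests why the KeyError appears: TripDate is a level of the row index, not a column, so table1[['TripDate']] looks for a column label that does not exist. Note also that an unquoted 1990-06-10 is not a date; Python reads it as arithmetic, and Python 3 rejects the leading zero in 06 outright. Below is a minimal sketch of selecting by index level instead. The small table1 built here is a stand-in with a few rows copied from the question, and it assumes TripDate is stored as strings; if the level holds datetimes, compare against pd.Timestamp("1990-06-10") instead.

```python
import pandas as pd

# Stand-in for table1: a few rows copied from the question (assumed string dates).
table1 = pd.DataFrame(
    {
        "Site": ["003l", "003l", "016l", "016l"],
        "TripDate": ["1990-06-10", "1991-07-26", "1990-06-10", "1991-07-26"],
        "Volume": [2354.043820, 2745.673779, 962.070643, 761.409178],
    }
).set_index(["Site", "TripDate"])

# Option 1: cross-section on the TripDate level; the result is indexed by Site only.
early = table1.xs("1990-06-10", level="TripDate")

# Option 2: boolean mask built from the level values; both index levels are kept.
mask = table1.index.get_level_values("TripDate") == "1990-06-10"
early_rows = table1[mask]

# The earliest volume for one site as a plain variable.
v_1990 = early.loc["003l", "Volume"]
print(v_1990)
```

The cross-section is convenient when one value per site is needed, while the mask is closer in spirit to the original subsetting attempt.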
I have found an answer to my problem, although I am sure there is a much more elegant solution.
In developing my solution, I first filtered the data down to the earliest date using:
query2 = query1[query1.TripDate=='1990-06-10']
where query1 is a subset of my original data file.
I then created a pivot table, in a way similar to table1 and table2, using:
table3 = pd.pivot_table(query2,values=['Volume'], index=['Site','TripDate'], aggfunc=np.sum)
table3 = table3.rename(columns = {'Volume':'Early_Vol'})
I can then merge table1 and table3 using:
merge = pd.merge(table1.reset_index(),table3.reset_index(),on=['Site'],how='left')
And after a little formatting I am left with my desired output:
Site TripDate Volume Early_Vol
0 003l 1990-06-10 2354.043820 2354.043820
1 003l 1991-07-26 2745.673779 2354.043820
2 003l 1993-10-08 22789.790846 2354.043820
3 003l 1994-11-20 23072.306665 2354.043820
4 003l 1995-04-24 25203.740194 2354.043820
5 003l 1996-02-13 16505.985301 2354.043820
6 003l 1996-04-15 8257.426317 2354.043820
7 003l 1996-09-12 9148.369265 2354.043820
8 003l 1997-02-13 10014.168593 2354.043820
9 003l 1997-04-20 11154.686365 2354.043820
10 003l 1997-08-23 13064.444117 2354.043820
11 003l 1997-11-06 13704.596573 2354.043820
12 003l 1998-04-15 14358.140459 2354.043820
13 003l 1999-05-04 18100.457859 2354.043820
14 003l 2000-03-17 22910.600843 2354.043820
15 003l 2000-06-01 617.621794 2354.043820
16 003l 2001-10-05 882.738323 2354.043820
17 016l 1990-06-10 962.070643 962.070643
18 016l 1991-07-26 761.409178 962.070643
19 016l 1993-10-08 475.038362 962.070643
20 016l 1994-11-20 312.339596 962.070643
21 016l 1995-04-24 11569.523232 962.070643
22 016l 1996-02-13 15272.175019 962.070643
23 016l 1996-04-15 13542.057394 962.070643
24 016l 1996-09-12 14556.930737 962.070643
25 016l 1997-02-13 18905.265710 962.070643
26 016l 1997-04-20 19832.509861 962.070643
27 016l 1997-08-23 20914.494534 962.070643
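For comparison, the same Early_Vol column can probably be produced without the second pivot table and the merge. The sketch below is one possible alternative, not the method used above: it sorts the index and broadcasts each site's first Volume with groupby and transform. It assumes the earliest TripDate of each site is the intended baseline (true for 1990-06-10 in the data shown) and that dates are ISO-formatted strings or datetimes, so that sorting is chronological. The frame df and the column name Pct_Volume are hypothetical stand-ins.

```python
import pandas as pd

# Stand-in for table1 with a few rows from the question, deliberately unsorted.
df = pd.DataFrame(
    {
        "Site": ["016l", "003l", "003l", "016l"],
        "TripDate": ["1991-07-26", "1991-07-26", "1990-06-10", "1990-06-10"],
        "Volume": [761.409178, 2745.673779, 2354.043820, 962.070643],
    }
).set_index(["Site", "TripDate"])

# Sort so that the earliest TripDate comes first within each Site.
df = df.sort_index()

# Broadcast each site's first (earliest) Volume to every row of that site.
df["Early_Vol"] = df.groupby(level="Site")["Volume"].transform("first")

# Apply the formula from the question: (V_survey - V_1990) / V_1990.
df["Pct_Volume"] = (df["Volume"] - df["Early_Vol"]) / df["Early_Vol"]

print(df.reset_index())
```

If some sites were first surveyed after 1990-06-10, this picks their own first survey as the baseline, which may or may not be what is wanted.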