Pandas - Setting an element of a pivot table as a variable

Question

I have a pivot table named table1 that looks like:

                       Volume
Site TripDate                
003l 1990-06-10   2354.043820
     1991-07-26   2745.673779
     1993-10-08  22789.790846
     1994-11-20  23072.306665
     1995-04-24  25203.740194
     1996-02-13  16505.985301
     1996-04-15   8257.426317
     1996-09-12   9148.369265
     1997-02-13  10014.168593
     1997-04-20  11154.686365
     1997-08-23  13064.444117
     1997-11-06  13704.596573
     1998-04-15  14358.140459
     1999-05-04  18100.457859
     2000-03-17  22910.600843
     2000-06-01    617.621794
     2001-10-05    882.738323
016l 1990-06-10    962.070643
     1991-07-26    761.409178
     1993-10-08    475.038362
     1994-11-20    312.339596
     1995-04-24  11569.523232
     1996-02-13  15272.175019
     1996-04-15  13542.057394
     1996-09-12  14556.930737
     1997-02-13  18905.265710
     1997-04-20  19832.509861

I want to calculate the percent volume for each site, using the earliest volume calculation as the "theoretical value" to normalize the data. For each site, is there a way to define a variable for the earliest volume calculation (i.e. 1990-06-10) directly from the pivot table?

An example formula for %Volume would be:

%Volume=(V_survey-V_1990)/(V_1990)

I have tried to subset based on the level-one index using:

test = table1[table1[['TripDate']]==1990-06-10]

but it throws the following error:

KeyError: "['TripDate'] not in index"

If I check the names of the index levels using list(table1.index.names), it returns:

['Site', 'TripDate']
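
The KeyError is consistent with TripDate being an index level rather than a column, so table1[['TripDate']] has nothing to look up. Note also that the unquoted 1990-06-10 is evaluated as integer arithmetic, not as a date. A minimal sketch of selecting on the index level instead, assuming the TripDate level holds datetime values and using a small made-up frame (a few rows copied from the table above) as a stand-in for table1:

```python
import pandas as pd

# Hypothetical stand-in for table1, built from four rows of the table above.
df = pd.DataFrame({
    "Site": ["003l", "003l", "016l", "016l"],
    "TripDate": pd.to_datetime(
        ["1990-06-10", "1991-07-26", "1990-06-10", "1991-07-26"]
    ),
    "Volume": [2354.043820, 2745.673779, 962.070643, 761.409178],
})
table1 = df.pivot_table(values=["Volume"], index=["Site", "TripDate"], aggfunc="sum")

# Option 1: boolean mask built from the values of the TripDate index level.
mask = table1.index.get_level_values("TripDate") == pd.Timestamp("1990-06-10")
test = table1[mask]

# Option 2: cross-section on the TripDate level; the result is indexed by Site.
baseline = table1.xs(pd.Timestamp("1990-06-10"), level="TripDate")

print(test)
print(baseline)
```

With the cross-section, baseline.loc['003l', 'Volume'] gives the 1990-06-10 volume for that site as a plain value that can be stored in a variable.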

Answer 1

I have found an answer to my problem, although I am sure there is a much more elegant solution.

In developing my solution, I first subset the data to the earliest date using:

query2 = query1[query1.TripDate=='1990-06-10']

where query1 is a subset of my original data file.

I then created a pivot table in a way similar to table1 and table2 using:

table3 = pd.pivot_table(query2,values=['Volume'], index=['Site','TripDate'], aggfunc=np.sum)
table3 = table3.rename(columns = {'Volume':'Early_Vol'})

I can then merge table1 and table3 on Site, so that every row of a site carries that site's earliest volume, using:

merge = pd.merge(table1.reset_index(),table3.reset_index(),on=['Site'],how='left')

And after a little formatting I am left with my desired output:

    Site   TripDate        Volume    Early_Vol
0   003l 1990-06-10   2354.043820  2354.043820
1   003l 1991-07-26   2745.673779  2354.043820
2   003l 1993-10-08  22789.790846  2354.043820
3   003l 1994-11-20  23072.306665  2354.043820
4   003l 1995-04-24  25203.740194  2354.043820
5   003l 1996-02-13  16505.985301  2354.043820
6   003l 1996-04-15   8257.426317  2354.043820
7   003l 1996-09-12   9148.369265  2354.043820
8   003l 1997-02-13  10014.168593  2354.043820
9   003l 1997-04-20  11154.686365  2354.043820
10  003l 1997-08-23  13064.444117  2354.043820
11  003l 1997-11-06  13704.596573  2354.043820
12  003l 1998-04-15  14358.140459  2354.043820
13  003l 1999-05-04  18100.457859  2354.043820
14  003l 2000-03-17  22910.600843  2354.043820
15  003l 2000-06-01    617.621794  2354.043820
16  003l 2001-10-05    882.738323  2354.043820
17  016l 1990-06-10    962.070643   962.070643
18  016l 1991-07-26    761.409178   962.070643
19  016l 1993-10-08    475.038362   962.070643
20  016l 1994-11-20    312.339596   962.070643
21  016l 1995-04-24  11569.523232   962.070643
22  016l 1996-02-13  15272.175019   962.070643
23  016l 1996-04-15  13542.057394   962.070643
24  016l 1996-09-12  14556.930737   962.070643
25  016l 1997-02-13  18905.265710   962.070643
26  016l 1997-04-20  19832.509861   962.070643
27  016l 1997-08-23  20914.494534   962.070643
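
As for the more elegant route the answer hints at, one possibility is to broadcast each site's first volume with groupby(...).transform('first'), which avoids the second pivot table and the merge. This is only a sketch under assumptions: it treats the earliest TripDate per site as the baseline (rather than hard-coding 1990-06-10), it uses a small made-up frame as a stand-in for table1, and the column name Pct_Volume is introduced here for illustration.

```python
import pandas as pd

# Hypothetical stand-in for table1, built from four rows of the table above.
df = pd.DataFrame({
    "Site": ["003l", "003l", "016l", "016l"],
    "TripDate": pd.to_datetime(
        ["1990-06-10", "1991-07-26", "1990-06-10", "1991-07-26"]
    ),
    "Volume": [2354.043820, 2745.673779, 962.070643, 761.409178],
})
table1 = df.pivot_table(values=["Volume"], index=["Site", "TripDate"], aggfunc="sum")

# Sort so that the first row within each site is the earliest survey.
table1 = table1.sort_index()

# Broadcast each site's earliest volume to all of that site's rows.
early = table1.groupby(level="Site")["Volume"].transform("first")
table1["Early_Vol"] = early

# %Volume = (V_survey - V_1990) / V_1990, expressed as a fraction.
table1["Pct_Volume"] = (table1["Volume"] - early) / early

print(table1.reset_index())
```

The baseline row of each site comes out as 0 in Pct_Volume, and multiplying the column by 100 gives a percentage.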

Answer 1, 2016-04-13 22:32:05