How to apply a function to a dataframe with repeated indices

Question

Hello, I have the following dataframe with repeated indices:

I want to apply a function that sums the values in the 'Vol' column for each group of two consecutive index values. For example:

1. Take the first two indices using:

one = df.loc[1:2]
one

It outputs the rows for the first two index values, 1,1 and 2,2.

Then sum the numbers in the column using:

one.sum()

output: 90

2. Repeat the process, this time with indices 2,2 and 3,3:

two=df.loc[2:3]
b=two.sum()
b

output: 85

3. Repeat the process again, this time with indices 3,3 and 4,4:

three=df.loc[3:4]
c=three.sum()
c

output: 70

I then put the output values in a new dataframe.

As you can see, this method is very tedious, especially with huge dataframes. Is there a way to apply a function that runs through the dataframe as described above?
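
The table itself is not reproduced here, so the following is only a minimal sketch of the manual procedure described above. The Vol values are hypothetical, chosen so that the three slices give the quoted totals (90, 85 and 70); the real data may well differ.

```python
import pandas as pd

# Hypothetical values: the original table is not shown, so these numbers
# were picked only to reproduce the totals quoted in the question.
df = pd.DataFrame(
    {"Vol": [20, 30, 15, 25, 40, 5, 10, 15]},
    index=[1, 1, 2, 2, 3, 3, 4, 4],
)

# Label-based slicing includes both endpoints, so 1:2 selects four rows.
one = df.loc[1:2]
two = df.loc[2:3]
three = df.loc[3:4]

manual_totals = [int(part["Vol"].sum()) for part in (one, two, three)]
print(manual_totals)
```

Each slice overlaps the next one by one index value, which is why the answers below reach for a rolling window.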

Answer 1

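First sum the rows that share an index value, then take a rolling sum over two rows (x is the dataframe):

# sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
x.groupby(level=0).sum().rolling(2).sum().dropna()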
Out[79]: 
    Vol
2  90.0
3  85.0
4  70.0
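
To see why this works, it can help to split the chain into its three steps. This is a sketch on hypothetical Vol values (the original table is not shown; the numbers below were chosen to match the totals in the question), and the variable names are illustrative.

```python
import pandas as pd

# Hypothetical data standing in for the table from the question.
df = pd.DataFrame(
    {"Vol": [20, 30, 15, 25, 40, 5, 10, 15]},
    index=[1, 1, 2, 2, 3, 3, 4, 4],
)

# Step 1: collapse duplicate labels, leaving one total per index value.
per_index = df.groupby(level=0).sum()

# Step 2: add each row to the row before it. The first row has no
# predecessor, so its window is incomplete and the result is NaN.
rolled = per_index.rolling(2).sum()

# Step 3: drop that incomplete first window.
result = rolled.dropna()
print(result)
```

Note that rolling(2) pairs neighbouring rows of the grouped frame, not labels x and x + 1. With contiguous labels such as 1, 2, 3, 4 the two are the same thing; if a label were missing, the window would span the gap.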

Answer 2

You can use this:

df.reset_index().set_axis(['index','Vol'], axis=1).groupby(['index'])['Vol'].sum().rolling(2).sum().dropna()

index
2    90.0
3    85.0
4    70.0
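
As far as I can tell, the reset_index / set_axis round trip is only there to give the index a name to group on. A sketch on hypothetical data (values chosen to match the totals in the question) suggests that grouping on the index level directly gives the same numbers:

```python
import pandas as pd

# Hypothetical data standing in for the table from the question.
df = pd.DataFrame(
    {"Vol": [20, 30, 15, 25, 40, 5, 10, 15]},
    index=[1, 1, 2, 2, 3, 3, 4, 4],
)

# The expression from this answer: move the index into a column, then group on it.
long_form = (
    df.reset_index()
    .set_axis(["index", "Vol"], axis=1)
    .groupby(["index"])["Vol"]
    .sum()
    .rolling(2)
    .sum()
    .dropna()
)

# Grouping on index level 0 skips the round trip.
short_form = df.groupby(level=0)["Vol"].sum().rolling(2).sum().dropna()

print(short_form)
```

Both return a Series rather than a dataframe because a single column is selected before summing.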

Answer 3

I don't know of a pure pandas way of doing this, but a little Python can do the trick:

import pandas as pd

values = []
# Walk the index labels; each label-based slice x : x + 1 includes both ends
for x in range(df.index[0], df.index[-1]):
    value = df.loc[x : x + 1].sum()
    values.append(value)

totals = pd.DataFrame(values)
print(totals)

If you are a fan of list comprehensions:

other = pd.DataFrame([df.loc[x : x + 1].sum() for x in range(df.index[0], df.index[-1])])
print(other)

Both yield a dataframe holding the totals 90, 85 and 70.
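
If you want to reuse the loop, it could be wrapped in a small helper. This is only a sketch: pairwise_totals is a hypothetical name, the Vol values are made up to match the totals in the question, and it assumes a sorted integer index.

```python
import pandas as pd


def pairwise_totals(frame, column="Vol"):
    """Sum `column` over the rows labelled x and x + 1, for each x in the index range.

    Assumes a sorted integer index; each total is keyed by the upper label.
    """
    first, last = frame.index[0], frame.index[-1]
    return pd.Series(
        {x + 1: frame.loc[x : x + 1, column].sum() for x in range(first, last)},
        name=column,
    )


# Hypothetical data standing in for the table from the question.
df = pd.DataFrame(
    {"Vol": [20, 30, 15, 25, 40, 5, 10, 15]},
    index=[1, 1, 2, 2, 3, 3, 4, 4],
)

totals = pairwise_totals(df)
print(totals)
```

The loop runs one slice per index value in Python, so on very large dataframes it will probably be slower than a grouped rolling sum.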
