
How can I apply a function to a dataframe with a repetitive index?

Hello, I have the following dataframe with repeated index values:

    Vol
1   25
1   15
2   20
2   30
3   25
3   10
4   15
4   20

I want to apply a function that sums the values in the 'Vol' column for each pair of consecutive index values. For example:

1. Take the rows for the first two index values using:

one = df.loc[1:2]
one

This outputs the rows for the first two index values, 1 and 2:

    Vol
1   25
1   15
2   20
2   30

Then sum the numbers in the column using:

one.sum()

output: 90

2. Repeat the process, this time with index values 2 and 3:

two=df.loc[2:3]
b=two.sum()
b

output: 85

3. Repeat the process once more, this time with index values 3 and 4:

three=df.loc[3:4]
c=three.sum()
c

output: 70

I then put the output values in a new dataframe:

      vol2
0      90
1      85
2      70

As you can see, this method is very tedious, especially with large dataframes. Is there a way to apply a function that runs through the dataframe as described above?

We need to sum the duplicate rows first, then take the rolling sum:

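# x is the question's dataframe: total each duplicated index value, then sum adjacent pairs
x.groupby(level=0).sum().rolling(2).sum().dropna()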
Out[79]: 
    Vol
2  90.0
3  85.0
4  70.0
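
For anyone who wants to run this end to end, here is a minimal sketch of the same two steps. It rebuilds the dataframe from the question and uses groupby(level=0) to total the duplicated index values. The names per_index, pair_sums, and vol2 are only illustrative, and the last step (casting to integers and resetting the index) is an assumption about the layout the question asks for.

```python
import pandas as pd

# Rebuild the question's dataframe: index values 1..4, each appearing twice.
df = pd.DataFrame(
    {"Vol": [25, 15, 20, 30, 25, 10, 15, 20]},
    index=[1, 1, 2, 2, 3, 3, 4, 4],
)

# Step 1: collapse duplicate index values into one total per index value.
per_index = df.groupby(level=0).sum()

# Step 2: add each total to the one before it (a window of two index values).
# The first row has no predecessor, so it becomes NaN and is dropped.
pair_sums = per_index.rolling(2).sum().dropna()

# Optional: reshape to a 0-based index and an integer column named vol2.
vol2 = pair_sums["Vol"].astype(int).reset_index(drop=True).to_frame("vol2")
print(vol2)
```

Changing the argument of rolling() should widen the window, for example rolling(3) to cover three consecutive index values at a time.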

You can use this:

df.reset_index().set_axis(['index','Vol'], axis=1).groupby(['index'])['Vol'].sum().rolling(2).sum().dropna()
index
2    90.0
3    85.0
4    70.0

I don't know of a pure pandas way of doing this, but a little Python can do the trick:

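import pandas as pd

values = []
# Walk the index values 1, 2, 3; assumes a sorted integer index with no gaps.
for x in range(df.index[0], df.index[-1]):
    # .loc slicing is label-based and inclusive, so this selects rows x and x + 1.
    value = df.loc[x : x + 1].sum()
    values.append(value)

totals = pd.DataFrame(values)
print(totals)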

If you are a fan of list comprehensions:

other = pd.DataFrame([df.loc[x : x + 1].sum() for x in range(df.index[0], df.index[-1])])
print(other)

Both yield:

   Vol
0   90
1   85
2   70
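
The loop above relies on range(df.index[0], df.index[-1]), so it assumes the index holds consecutive integers. If the index has gaps or non-integer labels, a variation along the following lines might work instead. This is only a sketch: pairwise_index_sums is a hypothetical helper name, and it assumes the distinct index values already appear in the order you want to pair them.

```python
import pandas as pd


def pairwise_index_sums(df, column="Vol"):
    """Sum `column` over each pair of adjacent distinct index values."""
    # Distinct index labels in order of first appearance, e.g. 1, 2, 3, 4.
    labels = df.index.unique()
    totals = [
        # Boolean mask picks every row whose label is a or b.
        df.loc[df.index.isin([a, b]), column].sum()
        for a, b in zip(labels[:-1], labels[1:])
    ]
    return pd.DataFrame({"vol2": totals})


# The dataframe from the question.
df = pd.DataFrame(
    {"Vol": [25, 15, 20, 30, 25, 10, 15, 20]},
    index=[1, 1, 2, 2, 3, 3, 4, 4],
)
print(pairwise_index_sums(df))
```

Because it scans the whole index once per pair, this is likely to be slower than a groupby-based approach on large dataframes.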


 