
Python pandas groupby difference with other row filtered by column

I am struggling with groupby in pandas. How can I accomplish the following? For every fruit, I would like to compute the difference between each row's value and that fruit's 'step 0' value.

import pandas as pd
df = pd.DataFrame({'Fruit' : ['Apple', 'Apple', 'Apple', 'Banana', 'Banana', 'Banana'], 'Step' : [0, 1, 2, 0, 1, 2], 'Value' : [100, 102, 105, 200, 210, 195] })

    Fruit  Step  Value     to-be
0   Apple     0    100  -->  0
1   Apple     1    102  -->  2
2   Apple     2    105  -->  5
3  Banana     0    200  -->  0
4  Banana     1    210  --> 10
5  Banana     2    195  --> -5

Thank you!

This should do it:

df.groupby('Fruit').apply(lambda g: g.Value - g[g.Step == 0].Value.values[0])

First, we group by the column you care about (Fruit). Then we apply a function to each group, using a lambda so the function can be written inline. For each group, we find the row(s) where g.Step == 0, take the Value entry from those rows, and use values[0] to pick the first one (in case several rows have g.Step == 0). Finally, we subtract that single value from every row's Value in the group and return the result. Note that values[0] raises an IndexError if a group has no row with Step == 0.
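To make the steps above concrete, here is a minimal sketch of what the lambda does for a single group. It picks out the 'Apple' rows by hand instead of going through groupby; the names `g`, `base`, and `diff` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Fruit': ['Apple', 'Apple', 'Apple', 'Banana', 'Banana', 'Banana'],
                   'Step': [0, 1, 2, 0, 1, 2],
                   'Value': [100, 102, 105, 200, 210, 195]})

# One group, as groupby('Fruit') would hand it to the lambda.
g = df[df.Fruit == 'Apple']

# Rows of this group where Step == 0, then the first matching Value.
base = g[g.Step == 0].Value.values[0]

# Subtract that baseline from every Value in the group.
diff = g.Value - base
print(diff)
```

groupby('Fruit').apply(...) repeats this for each fruit and stitches the per-group results together.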

If you want to add it as a column to the dataframe, you can drop the index:

res = df.groupby('Fruit').apply(lambda g: g.Value - g[g.Step == 0].Value.values[0])
df['Result'] = res.reset_index(drop=True)
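Because reset_index(drop=True) assigns the result back by position, whether it lines up with the original rows can depend on row order and on the pandas version. As an alternative sketch (not the answer's method), the step-0 value can be looked up per fruit and mapped back onto every row, which aligns by the Fruit label rather than by position. The name `baseline` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Fruit': ['Apple', 'Apple', 'Apple', 'Banana', 'Banana', 'Banana'],
                   'Step': [0, 1, 2, 0, 1, 2],
                   'Value': [100, 102, 105, 200, 210, 195]})

# One baseline per fruit: the Value on the first row where Step == 0.
baseline = (df.loc[df['Step'] == 0]
              .drop_duplicates('Fruit')
              .set_index('Fruit')['Value'])

# Map each row's Fruit to its baseline and subtract.
df['Result'] = df['Value'] - df['Fruit'].map(baseline)
print(df)
```

A fruit with no Step == 0 row would get NaN here instead of raising an error.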

I think this does the trick. It simply loops over the rows and records a new 'first' value each time Step equals 0, then computes the difference from that value. It assumes each fruit's Step 0 row comes before that fruit's other rows.

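rows = range(df.shape[0])
df['count'] = 0  # new column (position 3) that will hold the differences
for r in rows:
    step = df.iloc[r, 1]   # 'Step' column
    value = df.iloc[r, 2]  # 'Value' column
    if step == 0:
        first = value  # start of a new fruit: remember its baseline
    df.iloc[r, 3] = value - first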
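The same carry-forward idea can be sketched without an explicit loop: keep Value only on the Step == 0 rows, forward-fill it down, and subtract. Like the loop, this assumes each Step 0 row precedes the other rows of its fruit. The name `first` mirrors the loop variable.

```python
import pandas as pd

df = pd.DataFrame({'Fruit': ['Apple', 'Apple', 'Apple', 'Banana', 'Banana', 'Banana'],
                   'Step': [0, 1, 2, 0, 1, 2],
                   'Value': [100, 102, 105, 200, 210, 195]})

# Value where Step == 0, NaN elsewhere; ffill carries each baseline downward.
first = df['Value'].where(df['Step'] == 0).ffill()

# Difference from the most recent Step == 0 value (float, because of the NaN step).
df['count'] = df['Value'] - first
print(df)
```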

I am new to pandas, but at least the following code works. The final result:

    Fruit  Step  Value  to-be
0   Apple     0    100      0
1   Apple     1    102      2
2   Apple     2    105      5
3  Banana     0    200      0
4  Banana     1    210     10
5  Banana     2    195     -5

[6 rows x 4 columns]

The source code is as follows.

import pandas as pd

df = pd.DataFrame({'Fruit' : ['Apple', 'Apple', 'Apple', 'Banana', 'Banana', 'Banana'], 
                    'Step' : [0, 1, 2, 0, 1, 2], 
                    'Value' : [100, 102, 105, 200, 210, 195] })

list_groups = list()

# loop over dataframe groupby `Fruit`
for name, group in df.groupby('Fruit'):
    group = group.sort_values('Step', ascending=True) # sort by `Step`; sort_values returns a new frame

    row_iterator = group.iterrows()

    # get the base value
    idx, first_row = next(row_iterator)
    base_value = first_row['Value']

    to_be = [0] # store the values of the column `to-be`
    for idx, row in row_iterator:
        to_be.append(row['Value'] - base_value)

    # add a column to group
    group['to-be'] = pd.Series(to_be, index=group.index)

    list_groups.append(group)


# Concatenate dataframes
result = pd.concat(list_groups)

print(result)
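For comparison, here is a more compact sketch of the same idea (sort by `Step`, take the first row of each fruit as the base value). It swaps the explicit loops for groupby with transform('first'), so it is an alternative rather than a rewrite of the code above. The name `ordered` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Fruit': ['Apple', 'Apple', 'Apple', 'Banana', 'Banana', 'Banana'],
                   'Step': [0, 1, 2, 0, 1, 2],
                   'Value': [100, 102, 105, 200, 210, 195]})

# Sort so that the lowest Step comes first within each fruit.
ordered = df.sort_values(['Fruit', 'Step'])

# transform('first') repeats each group's first Value on every row of that group.
base_value = ordered.groupby('Fruit')['Value'].transform('first')
ordered['to-be'] = ordered['Value'] - base_value

print(ordered)
```

Because transform returns a result indexed like its input, the new column lines up with the rows without concatenating per-group frames.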

@ASGM, I ran your code,

res = df.groupby('Fruit').apply(lambda g: g.Value - g[g.Step == 0].Value.values[0])
df['Result'] = res.reset_index(drop=True)

but encountered this error:

Traceback (most recent call last):
  File "***.py", line 9, in <module>
    df['Result'] = res.reset_index(drop=True)
  File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1887, in __setitem__
    self._set_item(key, value)
  File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1968, in _set_item
    NDFrame._set_item(self, key, value)
  File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 1068, in _set_item
    self._data.set(key, value)
  File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 3024, in set
    self.insert(len(self.items), item, value)
  File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 3039, in insert
    self._add_new_block(item, value, loc=loc)
  File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 3162, in _add_new_block
    self.items, fastpath=True)
  File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 1993, in make_block
    placement=placement)
  File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 64, in __init__
    '%d' % (len(items), len(values)))
ValueError: Wrong number of items passed 1, indices imply 3
[Finished in 0.4s with exit code 1]
