
Pandas Dataframe Groupby Apply Lambda Function With Multiple Column Returns

I couldn't find anything on SO about this. I want to generate 4 new columns on my existing dataframe by applying a separate function that takes 4 specific columns as inputs and returns 4 output columns (new columns, not replacements for the 4 input columns). However, the function requires the dataframe to be sliced by conditions before it is applied. I have been using for loops and appending, but it is extremely slow. I was hoping there was a MapReduce-style way to do this: take my DataFrame, do a groupby, and apply a function I wrote separately.

The function has multiple outputs, so just imagine a function like this:

    def func(a,b,c,d):
        return f(a),g(b),h(c),i(d)

where f, g, h, i are different functions applied to the inputs. I am trying to do something like:

    import pandas as pd

    df = pd.DataFrame({'a': range(10),
                       'b': range(10),
                       'c': range(10),
                       'd': range(10),
                       'e': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]})

    # What I'd like to do (not valid Python: a lambda cannot contain an assignment):
    # df.groupby('e').apply(lambda df['x1'], df['x2'], df['x3'], df['x4'] =
    #                       func(df['a'], df['b'], df['c'], df['d']))
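A minimal sketch of one way to get that shape, assuming `func` returns four Series aligned with its inputs. The body of `func` and the helper `add_outputs` are hypothetical: the four expressions are stand-ins for f, g, h, i, chosen so each output depends on the whole group. Each group is passed to a function that calls `func` once and wraps the four results as new columns; those columns are then joined back onto `df` by index.

```python
import pandas as pd

df = pd.DataFrame({'a': range(10),
                   'b': range(10),
                   'c': range(10),
                   'd': range(10),
                   'e': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]})


def func(a, b, c, d):
    # Hypothetical stand-ins for f, g, h, i; each depends on the whole group
    return a - a.mean(), b.cumsum(), c.max() - c, d.rank()


def add_outputs(group):
    # Hypothetical helper: call func once per group, return its 4 outputs as columns
    x1, x2, x3, x4 = func(group['a'], group['b'], group['c'], group['d'])
    return pd.DataFrame({'x1': x1, 'x2': x2, 'x3': x3, 'x4': x4},
                        index=group.index)


# Selecting only the input columns keeps 'e' out of the groups passed to apply;
# group_keys=False keeps the original row index so the result lines up with df.
new_cols = (df.groupby('e', group_keys=False)[['a', 'b', 'c', 'd']]
              .apply(add_outputs))
df = df.join(new_cols)
print(df)
```

`df.join` aligns on the index, so this relies on `group_keys=False` preserving the original row labels in `new_cols`.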

Wondering if this is possible. If there are other functions in the library or more efficient ways to go about this, please do advise. Thanks.

EDIT: Here's a sample output:

    a  b  c  d  e  f   g   h   i
    -----------------------------
    0  0  0  0  0  f1  g1  h1  i1
    1  1  1  1  1  f2  g2  h2  i2
    ...  and so on

I want to structure the operation this way because the function depends on structure within the data (hence the groupby) before it is applied. Previously, I got the unique values of the grouping column and iterated over them, slicing the dataframe for each one and appending the result to a new dataframe. This runs in quadratic time.
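If you keep an explicit loop, a common fix for this kind of slowdown is to let `groupby` produce the slices and to concatenate once at the end, rather than masking the frame once per unique value and appending to a growing DataFrame (each append copies everything accumulated so far). A sketch using the question's sample frame; the body of `func` is a hypothetical stand-in for f, g, h, i:

```python
import pandas as pd

df = pd.DataFrame({'a': range(10),
                   'b': range(10),
                   'c': range(10),
                   'd': range(10),
                   'e': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]})


def func(a, b, c, d):
    # Hypothetical stand-ins for f, g, h, i
    return a - a.mean(), b.cumsum(), c.max() - c, d.rank()


pieces = []
for _, group in df.groupby('e'):
    # groupby hands over each slice; no df[df['e'] == value] mask per key
    out = group.copy()
    out['x1'], out['x2'], out['x3'], out['x4'] = func(
        out['a'], out['b'], out['c'], out['d'])
    pieces.append(out)

# One concat at the end copies the data once, instead of once per append
result = pd.concat(pieces)
print(result)
```

Collecting pieces in a Python list is cheap; the expensive copy happens only in the single `pd.concat` call.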

You could do something like this:

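    def f(data):
        data = data.copy()            # work on an explicit copy of the group
        data['a2'] = data['a'] * 2    # or whatever function/calculation you want
        data['b2'] = data['b'] * 3    # etc.
        # e.g. data['g'] = g(data['b'])
        return data

    # apply returns a new DataFrame (df itself is not modified);
    # group_keys=False keeps the original row index instead of prepending 'e'
    result = df.groupby('e', group_keys=False).apply(f)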
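Because each output in the question depends on a single input column (f(a), g(b), h(c), i(d)), another option is to skip `apply` and call `SeriesGroupBy.transform` once per column; it runs the function on each group and returns a Series aligned with the original rows. A sketch, assuming that one-column-in, one-column-out shape; the bodies of `f`, `g`, `h`, `i` here are hypothetical stand-ins for the question's functions (not the answer's `f`):

```python
import pandas as pd

df = pd.DataFrame({'a': range(10),
                   'b': range(10),
                   'c': range(10),
                   'd': range(10),
                   'e': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]})


# Hypothetical stand-ins for f, g, h, i: one column of one group in,
# a Series of the same length out
def f(s):
    return s - s.mean()


def g(s):
    return s.cumsum()


def h(s):
    return s.max() - s


def i(s):
    return s.rank()


grouped = df.groupby('e')
# transform runs each function per group and aligns the result to df's index
df['x1'] = grouped['a'].transform(f)
df['x2'] = grouped['b'].transform(g)
df['x3'] = grouped['c'].transform(h)
df['x4'] = grouped['d'].transform(i)
print(df)
```

If the four outputs need to share intermediate results within a group, per-column `transform` would recompute them, and an `apply` that handles all columns of a group at once fits better.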

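Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.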

 