Pandas apply to create multiple columns, using multiple columns as input

I am trying to use a function to create multiple outputs, using multiple columns as inputs. Here's my attempt:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,10,size=(6, 4)), columns=list('ABCD'))
df.head()

    A   B   C   D
0   8   2   5   0
1   9   9   8   6
2   4   0   1   7
3   8   4   0   3
4   5   6   9   9

def some_func(a, b, c):
    return a+b, a+b+c

df['dd'], df['ee'] = df.apply(lambda x: some_func(a = x['A'], b = x['B'], c = x['C']), axis=1, result_type='expand')

df.head()

    A   B   C   D  dd  ee
0   8   2   5   0   0   1
1   9   9   8   6   0   1
2   4   0   1   7   0   1
3   8   4   0   3   0   1
4   5   6   9   9   0   1

The outputs are all 0 for the first new column, and all 1 for the next new column. I am interested in the correct solution, but I am also curious about why my code produced this result.

You can assign to a list of columns, df[['dd','ee']]:

def some_func(a, b, c):
    return a+b, a+b+c

df[['dd','ee']] = df.apply(lambda x: some_func(a = x['A'], 
                                               b = x['B'], 
                                               c = x['C']), axis=1, result_type='expand')
print(df)
   A  B  C  D  dd  ee
0  4  7  7  3  11  18
1  2  1  3  4   3   6
2  4  7  6  0  11  17
3  0  9  1  1   9  10
4  5  6  5  9  11  16
5  3  2  4  9   5   9

If possible, a vectorized solution is better and faster:

df = df.assign(dd = df.A + df.B, ee = df.A + df.B + df.C)
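As a quick sanity check, the row-wise apply and the vectorized assign can be compared on a small fixed frame. This is only a minimal sketch: the sample values and the names `by_apply` and `by_assign` are illustrative assumptions, not part of the original post.

```python
import pandas as pd

def some_func(a, b, c):
    return a + b, a + b + c

# Small fixed frame (illustrative values)
df = pd.DataFrame({'A': [4, 2, 4], 'B': [7, 1, 7], 'C': [7, 3, 6]})

# Row-wise: result_type='expand' turns each returned tuple into columns 0 and 1
by_apply = df.apply(lambda x: some_func(x['A'], x['B'], x['C']),
                    axis=1, result_type='expand')
by_apply.columns = ['dd', 'ee']

# Vectorized: whole-column arithmetic, no Python-level loop over rows
by_assign = df.assign(dd=df.A + df.B, ee=df.A + df.B + df.C)[['dd', 'ee']]

# Element-wise comparison of the two results
print((by_apply == by_assign).all().all())
```

Both routes should give the same values; the vectorized one avoids calling a Python function once per row, which is where the speed difference on larger frames would come from.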

To explain the 0, 1 part: 0 and 1 are actually the column names of the DataFrame returned by

df.apply(lambda x: some_func(a = x['A'], b = x['B'], c = x['C']), axis=1, result_type='expand')

That is, unpacking a DataFrame iterates over its column labels, not its data:

x = df.apply(lambda x: some_func(a = x['A'], b = x['B'], c = x['C']), axis=1, result_type='expand')
a, b = x
print(a)    # first column name
print(b)    # second column name

output:
0
1

So your original assignment is equivalent to

df['dd'], df['ee'] = 0, 1

which broadcasts each scalar down its column and results in

    A   B   C   D  dd  ee
0   8   2   5   0   0   1
1   9   9   8   6   0   1
2   4   0   1   7   0   1
3   8   4   0   3   0   1
4   5   6   9   9   0   1
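The behaviour described above can be reproduced without apply at all. The sketch below uses a hand-built frame named `expanded` as a stand-in for the apply result (that name, `broadcast`, and the sample values are assumptions made for illustration).

```python
import pandas as pd

df = pd.DataFrame({'A': [8, 9], 'B': [2, 9]})

# Stand-in for the expanded apply result; default column labels are 0 and 1
expanded = pd.DataFrame([[10, 15], [18, 26]])

# Iterating a DataFrame yields its column labels, not its column data
labels = list(expanded)
print(labels)  # [0, 1]

# Tuple unpacking therefore binds the labels, and each scalar is broadcast
broadcast = df.copy()
broadcast['dd'], broadcast['ee'] = expanded
print(broadcast)

# Transposing and taking .values unpacks the actual column data instead
df['dd'], df['ee'] = expanded.T.values
print(df)
```

The last assignment is one possible way to keep the tuple-unpacking style; assigning to df[['dd','ee']] is usually the more direct option.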

Alternative way:

df['dd'], df['ee'] = zip(*df.apply(lambda x: some_func(x['A'], x['B'], x['C']), axis=1))
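To see why the zip approach works, it may help to split it into steps. This is a sketch on illustrative data; the intermediate names `pairs`, `dd` and `ee` are assumptions introduced here.

```python
import pandas as pd

def some_func(a, b, c):
    return a + b, a + b + c

df = pd.DataFrame({'A': [8, 9, 4], 'B': [2, 9, 0], 'C': [5, 8, 1]})

# Without result_type='expand', apply returns a Series holding one tuple per row
pairs = df.apply(lambda x: some_func(x['A'], x['B'], x['C']), axis=1)

# zip(*pairs) regroups the per-row tuples into one tuple per output position
dd, ee = zip(*pairs)

# Each tuple has one value per row, so it can be assigned as a column
df['dd'], df['ee'] = dd, ee
print(df)
```

Here the unpacking operates on two tuples of values rather than on a DataFrame, so no column labels are involved.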

                      
