How to generate multiple new columns by applying a function row-wise in a pandas.DataFrame?

Question

I am trying to run apply on a pandas.DataFrame so that a function runs over the whole table, takes a few column fields as input, and generates multiple new fields at the same time. Once the scan is done, the new fields should form multiple new columns.

Conceptually, this is what I need: given a function f that works on the fields of one row, apply it across the DataFrame so that it generates multiple new columns at the same time:

f :: field1, field2, field3, ... -> newfield1, newfield2,...

When I apply this function to the DataFrame, it gives me

f' :: column1, column2, column3, ... -> newcolumn1, newcolumn2, ...

Here is an example:

>>> df
   denominator  numerator
0            3         10
1            5         12
2            7         14

I would like to create two more columns, quotient and remainder.

In this particular example I could run // and % separately because the operation is trivial, but that is not my preferred approach, since I can get both the quotient and the remainder at the same time. In some real-world cases, computing them together is more efficient.

The following is what I came up with, but I don't know whether it is the most Pythonic way to do it. It is also not clear to me how df.apply turns a sequence of row-based pd.Series into columns.

>>> def rundivmod(n, d):
...   q, r = divmod(n, d)
...   return {'quotient': q, 'remainder': r}
>>> pd.merge(df, df.apply(lambda row: pd.Series(rundivmod(row.numerator, row.denominator)), axis=1), left_index=True, right_index=True)
   denominator  numerator  quotient  remainder
0            3         10         3          1
1            5         12         2          2
2            7         14         2          0
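On the question of how the row-based Series become columns, here is a minimal sketch, assuming the same three-row df as above. When the function passed to df.apply with axis=1 returns a pd.Series, pandas stacks those Series into a DataFrame and uses their index labels as column names. The use of df.join in place of pd.merge is a substitution made for illustration, not part of the original post.

```python
import pandas as pd

df = pd.DataFrame({'denominator': [3, 5, 7], 'numerator': [10, 12, 14]})

def rundivmod(n, d):
    # Return a labelled Series; the labels become the new column names.
    q, r = divmod(n, d)
    return pd.Series({'quotient': q, 'remainder': r})

# One Series per row is collected into a DataFrame with the same row index.
new_cols = df.apply(lambda row: rundivmod(row['numerator'], row['denominator']), axis=1)

# join aligns on the shared row index, which is what the index-based merge does.
result = df.join(new_cols)
print(result)
```

Because new_cols keeps the row index of df, any index-aligned combination (join, concat with axis=1, or the merge above) attaches the new columns to the right rows.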

EDIT: removed my other method, which generated quotient and remainder separately, because it was misleading in this case.

Answer 1

Function:

def rundivmod(n, d):
    return divmod(n, d)

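Code:

# apply returns a Series of (quotient, remainder) tuples; apply(pd.Series) expands them into columns
out = df.apply(lambda x: rundivmod(x['numerator'], x['denominator']), axis=1).apply(pd.Series)
out.columns = ['quotient', 'remainder']
# axis must be passed by keyword; recent pandas versions reject it as a positional argument
df = pd.concat([df, out], axis=1)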

Output:

   denominator  numerator  quotient  remainder
0            3         10         3          1
1            5         12         2          2
2            7         14         2          0
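As a possible variation on the code above, DataFrame.apply accepts result_type='expand', which turns list-like return values such as the tuple from divmod directly into columns and avoids the second apply(pd.Series) pass. This is a sketch under the assumption that df still holds only the original denominator and numerator columns; it is not taken from the answer itself.

```python
import pandas as pd

df = pd.DataFrame({'denominator': [3, 5, 7], 'numerator': [10, 12, 14]})

# result_type='expand' spreads each returned tuple across columns 0 and 1.
out = df.apply(
    lambda row: divmod(row['numerator'], row['denominator']),
    axis=1,
    result_type='expand',
)
out.columns = ['quotient', 'remainder']

result = pd.concat([df, out], axis=1)
print(result)
```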

Answer 2

In general, avoid apply where possible; many operations can be done without iterating over the rows. If you must iterate for some reason, write a function that returns a Series for each row, then concat the result back onto the original DataFrame.

import numpy as np
import pandas as pd
df = pd.DataFrame({'data': [2,3,4,5]})

Raise 'data' to multiple powers ¹

def apply_pow(row, N):
    return pd.Series(row['data']**np.array(range(N)),
                     index=[f'power_{i}' for i in range(N)],  # become col names
                     )

pd.concat([df, df.apply(apply_pow, N=3, axis=1)], axis=1)
#   data  power_0  power_1  power_2
#0     2        1        2        4
#1     3        1        3        9
#2     4        1        4       16
#3     5        1        5       25

¹ This case should be vectorized with np.vander(df['data'], N=3, increasing=True).
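To make the footnote concrete, here is a hedged sketch of the vectorized route for both examples on this page. The first part wraps the np.vander result in a DataFrame with the same power_i column names; the second part uses np.divmod on the question's data, which returns the quotient and remainder arrays from a single call. The variable names powers and ratios are illustrative only.

```python
import numpy as np
import pandas as pd

# Powers example: one np.vander call replaces the row-wise apply.
df = pd.DataFrame({'data': [2, 3, 4, 5]})
N = 3
powers = pd.DataFrame(
    np.vander(df['data'].to_numpy(), N=N, increasing=True),
    columns=[f'power_{i}' for i in range(N)],
    index=df.index,  # keep the original row labels so concat aligns
)
result = pd.concat([df, powers], axis=1)

# Quotient/remainder example: np.divmod yields both arrays at once.
ratios = pd.DataFrame({'denominator': [3, 5, 7], 'numerator': [10, 12, 14]})
q, r = np.divmod(ratios['numerator'].to_numpy(), ratios['denominator'].to_numpy())
ratios = ratios.assign(quotient=q, remainder=r)

print(result)
print(ratios)
```

Neither part iterates over rows in Python, so both should scale better than apply with axis=1 on large frames.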

Answer 1
3 (accepted) 2020-03-12 16:07:02

Answer 2
1 2020-03-12 16:14:17
