How to generate multiple new columns by applying a function row by row in a pandas.DataFrame?

I am trying to use apply on a pandas.DataFrame so that a function runs over the whole table, takes a few column fields as input, and generates multiple new fields at the same time. Once the scan is done, the new fields should form multiple new columns.

Conceptually, this is what I need: apply a function f to the DataFrame, row by row, to generate multiple new columns at the same time:

f :: field1, field2, field3, ... -> newfield1, newfield2,...

When I apply this function to the DataFrame, it gives me:

f' :: column1, column2, column3, ... -> newcolumn1, newcolumn2, ...

Here is an example:

>>> df
   denominator  numerator
0            3         10
1            5         12
2            7         14

I would like to create two more columns, quotient and remainder.

In this particular example I could run // and % separately because the computation is trivial, but that is not preferred because I can technically get both the quotient and the remainder at the same time. In some real-world cases, getting them at the same time is more efficient.

The following is what I came up with, but I don't know if it is the most Pythonic way of doing it. How df.apply turns a sequence of row-based pd.Series into columns is also not clear to me.

>>> def rundivmod(n, d):
...   q, r = divmod(n, d)
...   return {'quotient': q, 'remainder': r}
>>> pd.merge(df, df.apply(lambda row: pd.Series(rundivmod(row.numerator, row.denominator)), axis=1), left_index=True, right_index=True)
   denominator  numerator  quotient  remainder
0            3         10         3          1
1            5         12         2          2
2            7         14         2          0

EDIT: removed my other methods for generating quotient and remainder separately, as they are misleading in this case.

Function:

def rundivmod(n, d):
    return divmod(n, d)

Code:

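# Apply row by row (axis=1), then expand each returned tuple into columns
out = df.apply(lambda x: rundivmod(x['numerator'], x['denominator']), axis=1).apply(pd.Series)
out.columns = ['quotient', 'remainder']
# axis must be passed by keyword in recent pandas versions
df = pd.concat([df, out], axis=1)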

Output:

   denominator  numerator  quotient  remainder
0            3         10         3          1
1            5         12         2          2
2            7         14         2          0
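A possible variation on the approach above, as a minimal sketch: DataFrame.apply accepts result_type='expand', which spreads a tuple returned for each row into separate columns, so the second .apply(pd.Series) pass is not needed. The sample frame is rebuilt from the question, and the names out and result are chosen here purely for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'denominator': [3, 5, 7], 'numerator': [10, 12, 14]})

# With axis=1 the lambda receives one row at a time;
# result_type='expand' turns each returned tuple into one column per element
out = df.apply(
    lambda row: divmod(row['numerator'], row['denominator']),
    axis=1,
    result_type='expand',
)
out.columns = ['quotient', 'remainder']

# Attach the new columns to the original frame
result = pd.concat([df, out], axis=1)
print(result)
```

This still calls the function once per row, so it is a readability change rather than a performance one.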

In general you should avoid apply if possible; many operations can be done without iterating over the rows. But if for some reason you must, you can create a function that returns a Series after acting on each row and then concat the result back.
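For the quotient/remainder case from the question, avoiding apply might look like the sketch below. It relies on pandas Series supporting the built-in divmod, which returns a pair of Series computed element-wise; the sample frame is rebuilt from the question and the name result is only illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'denominator': [3, 5, 7], 'numerator': [10, 12, 14]})

# divmod on two Series works element-wise and returns (quotient, remainder),
# so both columns come from a single vectorized call with no row iteration
quotient, remainder = divmod(df['numerator'], df['denominator'])

# assign returns a new frame with the extra columns appended
result = df.assign(quotient=quotient, remainder=remainder)
print(result)
```

When no vectorized form exists for the function at hand, the row-wise approach is the fallback.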

import numpy as np  # needed by apply_pow
import pandas as pd
df = pd.DataFrame({'data': [2,3,4,5]})

Raises 'data' to multiple powers [1]:

def apply_pow(row, N):
    return pd.Series(row['data']**np.array(range(N)),
                     index=[f'power_{i}' for i in range(N)],  # become col names
                     )

pd.concat([df, df.apply(apply_pow, N=3, axis=1)], axis=1)
#   data  power_0  power_1  power_2
#0     2        1        2        4
#1     3        1        3        9
#2     4        1        4       16
#3     5        1        5       25

[1] This should be vectorized using np.vander(df['data'], N=3, increasing=True).
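As a rough sketch of that vectorized form: np.vander builds the whole matrix of powers in one call, and wrapping it in a DataFrame with the same index lets it be concatenated back. The column names follow the power_{i} pattern used above; the names powers and result are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'data': [2, 3, 4, 5]})
N = 3

# increasing=True orders the columns as x**0, x**1, ..., x**(N-1)
powers = pd.DataFrame(
    np.vander(df['data'].to_numpy(), N=N, increasing=True),
    columns=[f'power_{i}' for i in range(N)],
    index=df.index,  # keep rows aligned with the original frame
)

result = pd.concat([df, powers], axis=1)
print(result)
```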
