How to generate multiple new columns through applying function by row in pandas.DataFrame?
I am trying to run apply on a pandas.DataFrame so that a function scans the whole table row by row, takes a few column fields as input, and generates several new fields at once. Once the scan is done, the new fields should form several new columns.
Conceptually, the following describes what I need: to apply a function f to each row of the DataFrame, taking several columns as input and generating multiple new columns at the same time:
f :: field1, field2, field3, ... -> newfield1, newfield2,...
When I apply this function to the DataFrame, it gives me:
f' :: column1, column2, column3, ... -> newcolumn1, newcolumn2, ...
Here is an example:
>>> df
   denominator  numerator
0            3         10
1            5         12
2            7         14
I would like to create two more columns, quotient and remainder.
In this particular example I could run // and % separately because the computation is trivial, but that is not the preferred approach, since I can technically get both the quotient and the remainder at the same time. In some real-world cases, getting them at the same time is more efficient.
The following is what I came up with, but I don't know if it is the most Pythonic way of doing it. It is also not clear to me how df.apply turns a sequence of row-based pd.Series into columns.
>>> def rundivmod(n, d):
...     q, r = divmod(n, d)
...     return {'quotient': q, 'remainder': r}
...
>>> pd.merge(df, df.apply(lambda row: pd.Series(rundivmod(row.numerator, row.denominator)), axis=1), left_index=True, right_index=True)
   denominator  numerator  quotient  remainder
0            3         10         3          1
1            5         12         2          2
2            7         14         2          0
EDIT: removed my other method of generating quotient and remainder separately, as it was misleading in this case.
Function:
def rundivmod(n, d):
    return divmod(n, d)
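Code:
# Each row yields a (quotient, remainder) tuple; .apply(pd.Series) expands the tuples into columns
out = df.apply(lambda x: rundivmod(x['numerator'], x['denominator']), axis=1).apply(pd.Series)
out.columns = ['quotient', 'remainder']
# Pass axis by keyword; recent pandas versions reject it as a positional argument
df = pd.concat([df, out], axis=1)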
Output:
   denominator  numerator  quotient  remainder
0            3         10         3          1
1            5         12         2          2
2            7         14         2          0
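As a possible variation on the code above, apply accepts a result_type argument, and result_type='expand' asks pandas to spread each list-like return value across columns, which would make the second .apply(pd.Series) pass unnecessary. The following is only a sketch: it rebuilds the example frame from the question and calls divmod directly instead of the rundivmod wrapper.

```python
import pandas as pd

# Example frame rebuilt from the question
df = pd.DataFrame({'denominator': [3, 5, 7], 'numerator': [10, 12, 14]})

# result_type='expand' turns each returned tuple into one row of a new DataFrame
out = df.apply(lambda row: divmod(row['numerator'], row['denominator']),
               axis=1, result_type='expand')
out.columns = ['quotient', 'remainder']

result = pd.concat([df, out], axis=1)
print(result)
```

The expanded columns are labelled 0 and 1 by default, so they still need to be renamed before concatenating.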
In general you should avoid apply if possible; many operations can be done without iterating over the rows. But if for some reason you must, you can create a function that returns a Series after acting on each row and then concat the result back.
import numpy as np
import pandas as pd

df = pd.DataFrame({'data': [2, 3, 4, 5]})

# Raises 'data' to multiple powers (see footnote 1)
def apply_pow(row, N):
    return pd.Series(row['data'] ** np.arange(N),
                     index=[f'power_{i}' for i in range(N)])  # index labels become column names

pd.concat([df, df.apply(apply_pow, N=3, axis=1)], axis=1)
# data power_0 power_1 power_2
#0 2 1 2 4
#1 3 1 3 9
#2 4 1 4 16
#3 5 1 5 25
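For the quotient and remainder case from the question, the row iteration can probably be avoided altogether. A minimal sketch, assuming the same example frame as in the question: Python's built-in divmod works element-wise on two Series and returns both results in one call.

```python
import pandas as pd

# Example frame rebuilt from the question
df = pd.DataFrame({'denominator': [3, 5, 7], 'numerator': [10, 12, 14]})

# divmod on two Series is element-wise and returns a (quotient, remainder) tuple of Series
quotient, remainder = divmod(df['numerator'], df['denominator'])

# assign returns a new DataFrame with both columns added
result = df.assign(quotient=quotient, remainder=remainder)
print(result)
```

This only helps when the per-row function has a vectorized equivalent; for an arbitrary Python function the apply-based approach is still needed.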
1 This should be vectorized using np.vander(df['data'], N=3, increasing=True).
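A sketch of how the np.vander call from the footnote could be wired back into the DataFrame. The column naming mirrors apply_pow above; wrapping the array in a DataFrame that reuses df.index is one way to do it, not the only one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'data': [2, 3, 4, 5]})
N = 3

# np.vander with increasing=True returns columns data**0, data**1, ..., data**(N-1)
powers = pd.DataFrame(np.vander(df['data'], N=N, increasing=True),
                      columns=[f'power_{i}' for i in range(N)],
                      index=df.index)  # reuse the index so concat aligns row by row

result = pd.concat([df, powers], axis=1)
print(result)
```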