What does `pandas.DataFrame.apply` actually operate on?

Question

I have two questions, but first I'll give some context. I am trying to use a functional programming approach to make a pandas DataFrame work with some existing code. I basically want to map a function over every row of a DataFrame, expanding the row using the double-asterisk keyword-argument notation, where each column name of the DataFrame corresponds to one of the arguments of the existing function.

For example, suppose I have the following function.

def line(m, x, b):
    y = (m * x) + b

    return y

And I have a pandas DataFrame.

import pandas as pd

data = [{"b": 1, "m": 1, "x": 2}, {"b": 2, "m": 2, "x": 3}]
df = pd.DataFrame(data)

# Returns
#    b  m  x
# 0  1  1  2
# 1  2  2  3

Ultimately, I want to construct a column in the DataFrame from the result of applying line to each row; something like the following.

# Note that I'm using the list of dicts defined above, not the DataFrame.
results = [line(**datum) for datum in data]

I feel like I should be able to use some combination of DataFrame.apply, a lambda, probably Series.to_dict, and the double-asterisk keyword-argument expansion, but I can't figure out what is passed to the lambda in the following expression.

df.apply(lambda x: x, axis=1)
#               ^
#               What is pandas passing to my identity lambda?

I tried inspecting it with type and x.__class__, but both of the following lines raise TypeErrors.

df.apply(lambda x: type(x), axis=1)
df.apply(lambda x: x.__class__, axis=1)

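I don't want to write a new line function or refactor the existing one, because I shouldn't have to. Ultimately, I want to end up with a DataFrame that contains the columns of the input data plus a column holding the corresponding output of the line function.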

My two questions are:

1) How can I pass a row of a pandas DataFrame to a function using keyword-argument expansion, either using the DataFrame.apply method or some other (functional) approach?
2) What exactly is DataFrame.apply passing to the function that I specify?

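Maybe there is some other functional approach that I'm not aware of, but I figured pandas is a very popular library, which is why I'm trying to use it. I'm also facing some data (de)serialization issues that pandas should make easy, rather than writing a more bespoke solution.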

Thanks.

Answer 1

Maybe this is what you are looking for.

1)

df.apply(lambda x: line(**x.to_dict()), axis=1)

which results in

0    3
1    8
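
To get from that Series to the column the question asks for, something along these lines should work. This is a minimal sketch, not the only way to do it: the column name "y" and the variable names via_apply and via_records are just illustrative choices, and the list comprehension over df.to_dict(orient="records") is shown as one possible alternative that avoids apply altogether.

```python
import pandas as pd


def line(m, x, b):
    return (m * x) + b


df = pd.DataFrame([{"b": 1, "m": 1, "x": 2}, {"b": 2, "m": 2, "x": 3}])

# Row-wise apply: each row arrives as a Series and is expanded into keyword arguments.
via_apply = df.apply(lambda row: line(**row.to_dict()), axis=1)

# Alternative without apply: turn the frame into a list of per-row dicts first.
via_records = [line(**record) for record in df.to_dict(orient="records")]

# Store the result next to the inputs; "y" is an assumed column name.
df["y"] = via_apply
```

Both variants rely on the column names matching the parameter names of line exactly. Once the "y" column has been added, expanding a full row would pass an unexpected y keyword, so the results are computed before the assignment here.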

2)

The function passed to df.apply(..., axis=1) receives a Series that represents the row, with the column names as its index entries.
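
One way to check this yourself, without depending on what apply does with the return value, is to record what the function sees as a side effect. This is a rough sketch; the helper name inspect_row and the list seen are made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame([{"b": 1, "m": 1, "x": 2}, {"b": 2, "m": 2, "x": 3}])

seen = []  # collects (type name, index labels) for every row passed in


def inspect_row(row):
    # Record what pandas handed over instead of returning it.
    seen.append((type(row).__name__, list(row.index)))
    return 0  # a plain scalar, so apply can assemble an ordinary Series


df.apply(inspect_row, axis=1)

for type_name, labels in seen:
    print(type_name, labels)
```

Each recorded entry should show Series as the type and the column names b, m, x as the index labels, which is why row.to_dict() yields exactly the keyword arguments that line expects.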
