What is `pandas.DataFrame.apply` actually operating on?
I have two questions, but first I will give the context. I am trying to use a pandas `DataFrame` with some existing code using a functional programming approach. I basically want to map a function to every row of a `DataFrame`, expanding the row using the double-asterisk keyword argument notation, where each column name of the `DataFrame` corresponds to one of the arguments of the existing function.
For example, say I have the following function.
def line(m, x, b):
    y = (m * x) + b
    return y
And I have a pandas `DataFrame`:
import pandas as pd

data = [{"b": 1, "m": 1, "x": 2}, {"b": 2, "m": 2, "x": 3}]
df = pd.DataFrame(data)
# Returns
#    b  m  x
# 0  1  1  2
# 1  2  2  3
Ultimately, I want to construct a column in the `DataFrame` from the results of `line` applied to each row; something like the following.
# Note that I'm using the list of dicts defined above, not the DataFrame.
results = [line(**datum) for datum in data]
I feel like I should be able to use some combination of `DataFrame.apply`, a `lambda`, probably `Series.to_dict`, and the double-asterisk keyword argument expansion, but I can't figure out what is passed to the `lambda` in the following expression.
df.apply(lambda x: x, axis=1)
#               ^
# What is pandas passing to my identity lambda?
I've tried to inspect with `type` and `x.__class__`, but both of the following lines throw `TypeError`s.
df.apply(lambda x: type(x), axis=1)
df.apply(lambda x: x.__class__, axis=1)
I don't want to write/refactor a new `line` function that can wrangle some pandas object because I shouldn't have to. Ultimately, I want to end up with a `DataFrame` with columns for the input data and a column with the corresponding output of the `line` function.
My two questions are:
1) How can I pass a row of a pandas `DataFrame` to a function using keyword-argument expansion, either using the `DataFrame.apply` method or some other (functional) approach?
2) What exactly is `DataFrame.apply` passing to the function that I specify?
Maybe there is some other functional approach I could take that I'm just not aware of, but I figure pandas is a pretty popular library for this kind of thing and that's why I'm trying to use it. Also, there are some data (de)serialization issues I'm facing that pandas should make pretty easy compared with writing a more bespoke solution.
Thanks.
Maybe this is what you are looking for.
1)
df.apply(lambda x: line(**x.to_dict()), axis=1)
Result
0    3
1    8
2)
The function for `df.apply(..., axis=1)` receives a `Series` representing a row, with the column names as index entries.
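To tie both points together, here is a minimal sketch that reuses `line` and `data` from the question. The names `inspect_row`, `seen`, `records`, and `y_from_records` are introduced only for this illustration. It records what `apply` hands to the callback rather than returning the type from the lambda, stores the results in a new column `y`, and also shows a list-of-dicts route via `DataFrame.to_dict(orient="records")` as one possible alternative.

```python
import pandas as pd


def line(m, x, b):
    # Same function as in the question.
    return (m * x) + b


data = [{"b": 1, "m": 1, "x": 2}, {"b": 2, "m": 2, "x": 3}]
df = pd.DataFrame(data)

# Hypothetical helper list: collects the type name and index of each
# object that apply passes to the callback.
seen = []


def inspect_row(row):
    seen.append((type(row).__name__, list(row.index)))
    # The row's index holds the column names, so to_dict() yields
    # exactly the keyword arguments that line() expects.
    return line(**row.to_dict())


# With axis=1 the callback is invoked once per row; the scalar results
# come back as a Series aligned with df's index.
df["y"] = df.apply(inspect_row, axis=1)

# Alternative without apply: turn the input columns back into a list of
# dicts and reuse the list comprehension from the question.
records = df[["b", "m", "x"]].to_dict(orient="records")
y_from_records = [line(**record) for record in records]
```

After this runs, the first entry of `seen` should name `Series` together with the column labels, and `df` carries the input columns plus `y`. One caveat worth keeping in mind: when the columns have different dtypes, the row `Series` is upcast to a common dtype, so the values `line` receives may not keep their per-column types. The `records` route avoids building a `Series` per row.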