
What is `pandas.DataFrame.apply` actually operating on?

I have two questions, but first some context. I am trying to use a pandas DataFrame with some existing code, following a functional programming approach. I basically want to map a function over every row of a DataFrame, expanding the row with the double-asterisk keyword-argument notation, where each column name of the DataFrame corresponds to one of the arguments of the existing function.

For example, say I have the following function.

def line(m, x, b):
    y = (m * x) + b

    return y

And I have a pandas DataFrame:

import pandas as pd

data = [{"b": 1, "m": 1, "x": 2}, {"b": 2, "m": 2, "x": 3}]
df = pd.DataFrame(data)

# Returns
#    b  m  x
# 0  1  1  2
# 1  2  2  3

Ultimately, I want to construct a column in the DataFrame from the results of line applied to each row; something like the following.

# Note that I'm using the list of dicts defined above, not the DataFrame.
results = [line(**datum) for datum in data]

I feel like I should be able to use some combination of DataFrame.apply, a lambda, probably Series.to_dict, and the double-asterisk keyword-argument expansion, but I can't figure out what is passed to the lambda in the following expression.

df.apply(lambda x: x, axis=1)
#               ^
#               What is pandas passing to my identity lambda?

I've tried to inspect with type and x.__class__, but both of the following lines throw TypeErrors.

df.apply(lambda x: type(x), axis=1)
df.apply(lambda x: x.__class__, axis=1)

I don't want to write or refactor a new line function that can wrangle some pandas object, because I shouldn't have to. Ultimately, I want to end up with a DataFrame with columns for the input data and a column with the corresponding output of the line function.

My two questions are:

  1. How can I pass a row of a pandas DataFrame to a function using keyword-argument expansion, either using the DataFrame.apply method or some other (functional) approach?
  2. What exactly is DataFrame.apply passing to the function that I specify?

Maybe there is some other functional approach that I'm just not aware of, but I figure pandas is a pretty popular library for this kind of thing, which is why I'm trying to use it. I'm also facing some data (de)serialization issues that pandas should make pretty easy compared with writing a more bespoke solution.

Thanks.

Maybe this is what you are looking for.

1)

df.apply(lambda x: line(**x.to_dict()), axis=1)

Result

0    3
1    8
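
To end up with the column the question asks for, the Series returned by apply can be assigned back to the frame. The following is only a minimal sketch built on the question's `line` function and data; the column name `y` and the variable names `records` and `results` are my own choices, not something from the original post.

```python
import pandas as pd


def line(m, x, b):
    return (m * x) + b


data = [{"b": 1, "m": 1, "x": 2}, {"b": 2, "m": 2, "x": 3}]
df = pd.DataFrame(data)

# Each row arrives as a Series; to_dict() turns it into {column: value},
# and ** expands that dict into keyword arguments for line().
df["y"] = df.apply(lambda row: line(**row.to_dict()), axis=1)

# An alternative without apply: convert the rows to plain dicts first.
# Only the argument columns are selected, because an extra key such as "y"
# would be passed to line() as an unexpected keyword argument.
records = df[["b", "m", "x"]].to_dict(orient="records")
results = [line(**rec) for rec in records]
```

Note that the keyword expansion only works while the column names match the function's parameters exactly, so any extra columns have to be dropped or filtered out before the call.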

2)

The function passed to df.apply(..., axis=1) receives a Series representing one row, with the column names as its index entries.
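
One way to check this yourself is to record a few facts about the argument from inside the callback instead of returning the type. This is a hedged sketch: the helper `record` and the list `seen` are hypothetical names introduced here for illustration. It stores values derived from the row rather than the row object itself, as a precaution in case pandas reuses the same Series object between calls.

```python
import pandas as pd

df = pd.DataFrame([{"b": 1, "m": 1, "x": 2}, {"b": 2, "m": 2, "x": 3}])

seen = []  # one tuple of facts per call made by apply


def record(row):
    # Copy out plain Python values; do not keep a reference to row itself.
    seen.append((type(row).__name__, list(row.index), row.name, row.to_dict()))
    return 0  # a plain scalar keeps the result an ordinary Series


df.apply(record, axis=1)

type_name, index_labels, row_label, as_dict = seen[0]
print(type_name)     # Series
print(index_labels)  # the column names of df
print(row_label)     # the index label of the row in df
print(as_dict)       # the row as a {column: value} dict
```

The row's index being the column names is what makes `**row.to_dict()` line up with the parameter names of the target function.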
