pandas error in df.apply() only for a specific dataframe

Question

I noticed something very strange in pandas. My dataframe (3 rows, 3 columns) looks like this:

When I try to extract ID and Name (separated by an underscore) into their own columns using the command below, I get an error:

df[['ID','Name']] = df.apply(lambda x: get_first_last(x['ID_Name']), axis=1, result_type='broadcast')

The error is:

ValueError: cannot broadcast result

Here's the interesting part, though: when I delete the "From_To" column from the original dataframe, the same df.apply() call splits ID_Name without any problem and I get the new columns like this:

I have checked a lot of SO answers but none seem to help. What did I miss here?

PS: get_first_last is a very simple function like this:

def get_first_last(s):
    str_lis = s.split("_")
    return [str_lis[0], str_lis[1]]

Answer 1

From the documentation of pandas.DataFrame.apply:

'broadcast': results will be broadcast to the original shape of the DataFrame, the original index and columns will be retained.

So the problem is that the original shape of your dataframe is (3, 3) while your function returns 2 values per row, so there is a mismatch. That also explains why it works after you delete "From_To": the new shape is (3, 2), and now the result matches.
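The mismatch can be reproduced with a small sketch. The original table is not shown in the question, so the values below are illustrative assumptions (the first two columns follow the sample data in Answer 2), and try_broadcast is a hypothetical helper introduced only for this demonstration:

```python
import pandas as pd


def get_first_last(s):
    # Same helper as in the question: split "1_john" into ["1", "john"].
    parts = s.split("_")
    return [parts[0], parts[1]]


# Illustrative data (an assumption; the question's table is not available).
df = pd.DataFrame({
    "ID_Name": ["1_john", "2_bob", "3_janet"],
    "Score": [23, 34, 45],
    "From_To": ["London_Paris", "Madrid_Milan", "Paris_Stockholm"],
})


def try_broadcast(frame):
    # Hypothetical helper: return the result shape, or the error message
    # if the 2-element result cannot be broadcast to the frame's width.
    try:
        out = frame.apply(
            lambda row: get_first_last(row["ID_Name"]),
            axis=1,
            result_type="broadcast",
        )
        return out.shape
    except ValueError as exc:
        return str(exc)


three_cols = try_broadcast(df)                        # 3 columns vs. 2 values
two_cols = try_broadcast(df.drop(columns="From_To"))  # 2 columns vs. 2 values

print(three_cols)
print(two_cols)
```

With three columns the call fails because each row yields two values; with two columns the widths happen to match, which is why dropping "From_To" hides the problem.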

Use 'expand' instead of 'broadcast' and you will get the expected result.

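import pandas as pd

# get_first_last is the helper defined in the question
table = [
    ['1_john', 23, 'LoNDon_paris'],
    ['2_bob', 34, 'Madrid_milan'],
    ['3_abdellah', 26, 'Paris_Stockhom']
]
df = pd.DataFrame(table, columns=['ID_Name', 'Score', 'From_to'])
# 'expand' turns each returned list into columns, independent of df's width
df[['ID','Name']] = df.apply(lambda x: get_first_last(x['ID_Name']), axis=1, result_type='expand')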

Hope this helps!

Answer 2

apply is definitely not a good fit for this use case. You should rather do:

df[["ID", "Name"]]=df["ID_Name"].str.split("_", expand=True, n=1)

For your data, this outputs (I took only the first 2 columns from your data frame):

   ID_Name  Score ID   Name
0   1_john     23  1   john
1    2_bob     34  2    bob
2  3_janet     45  3  janet

Here n=1 is there in case you have multiple _ (e.g. as part of the name): it makes sure you get at most 2 columns back (otherwise the above code would fail).

For instance, if we slightly modify your data, we get the following output:

    ID_Name  Score ID    Name
0    1_john     23  1    john
1  2_bob_jr     34  2  bob_jr
2   3_janet     45  3   janet
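To see why n=1 matters, here is a minimal sketch comparing the split with and without the limit on the modified data above. The variable names (names, unlimited, limited) are made up for this illustration:

```python
import pandas as pd

# Same ID_Name values as in the modified example, with one extra underscore.
names = pd.Series(["1_john", "2_bob_jr", "3_janet"], name="ID_Name")

# Without a limit, "2_bob_jr" splits into three parts, so three columns appear.
unlimited = names.str.split("_", expand=True)

# With n=1, only the first underscore is used, so there are always two columns.
limited = names.str.split("_", expand=True, n=1)

print(unlimited.shape)      # (3, 3)
print(limited.shape)        # (3, 2)
print(limited[1].tolist())  # ['john', 'bob_jr', 'janet']
```

Assigning the three-column result to df[["ID", "Name"]] is what would fail, since two target columns cannot take three source columns.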


Answer 1: score 1, accepted, 2020-08-03 08:15:33

Answer 2: score 0, 2020-08-03 08:27:54
