pandas error in df.apply() only for a specific dataframe
Noticed something very strange in pandas. My dataframe (with 3 rows and 3 columns) looks like this:
When I try to extract ID and Name (separated by an underscore) into their own columns using the command below, it gives me an error:
df[['ID','Name']] = df.apply(lambda x: get_first_last(x['ID_Name']), axis=1, result_type='broadcast')
The error is:
ValueError: cannot broadcast result
Here's the interesting part, though: when I delete the "From_To" column from the original dataframe, the same df.apply() call to split ID_Name works perfectly fine and I get the new columns like this:
I have checked a lot of SO answers but none seem to help. What did I miss here?
P.S. get_first_last is a very simple function like this:
def get_first_last(s):
    str_lis = s.split("_")
    return [str_lis[0], str_lis[1]]
From the docs of pandas.DataFrame.apply:
'broadcast': results will be broadcast to the original shape of the DataFrame, the original index and columns will be retained.
So the problem is that the original shape of your dataframe is (3, 3), while your apply function returns 2 values per row, so you have a mismatch. That also explains why it works when you delete the "From_To" column: the new shape is (3, 2), and now the 2 returned values match the 2 columns.
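The mismatch described above can be reproduced with a small sketch. The frame below is an assumption modeled on the data in this thread, and `split_broadcast` is a hypothetical helper that only wraps the `apply` call from the question:

```python
import pandas as pd

def get_first_last(s):
    parts = s.split("_")
    return [parts[0], parts[1]]

# Assumed sample data: 3 rows, 3 columns
df3 = pd.DataFrame(
    [["1_john", 23, "LoNDon_paris"],
     ["2_bob", 34, "Madrid_milan"],
     ["3_abdellah", 26, "Paris_Stockhom"]],
    columns=["ID_Name", "Score", "From_To"],
)

def split_broadcast(frame):
    # Hypothetical wrapper around the call from the question
    return frame.apply(
        lambda row: get_first_last(row["ID_Name"]),
        axis=1,
        result_type="broadcast",
    )

# 3 columns, but only 2 values returned per row: broadcasting fails
try:
    split_broadcast(df3)
    error_3_cols = None
except ValueError as exc:
    error_3_cols = str(exc)
print(error_3_cols)

# 2 columns and 2 values returned per row: broadcasting succeeds
df2 = df3.drop(columns="From_To")
result_2_cols = split_broadcast(df2)
print(result_2_cols.shape)
```

Note that in the 2-column case the result keeps the original column labels (ID_Name and Score), since 'broadcast' retains the original index and columns.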
You can use 'expand' instead of 'broadcast' and you will get your expected result.
import pandas as pd

table = [
    ['1_john', 23, 'LoNDon_paris'],
    ['2_bob', 34, 'Madrid_milan'],
    ['3_abdellah', 26, 'Paris_Stockhom']
]
df = pd.DataFrame(table, columns=['ID_Name', 'Score', 'From_to'])
df[['ID','Name']] = df.apply(lambda x: get_first_last(x['ID_Name']), axis=1, result_type='expand')
Hope this helps!
This is definitely not a good use case for apply; you should rather do:
df[["ID", "Name"]]=df["ID_Name"].str.split("_", expand=True, n=1)
For your data, this will output (I took only the first 2 columns from your data frame):
ID_Name Score ID Name
0 1_john 23 1 john
1 2_bob 34 2 bob
2 3_janet 45 3 janet
Now, n=1 is there just in case you have multiple underscores (e.g. as part of the name): it makes sure the split returns at most 2 columns (otherwise the above code would fail).
For instance, if we slightly modify your data, we get the following output:
ID_Name Score ID Name
0 1_john 23 1 john
1 2_bob_jr 34 2 bob_jr
2 3_janet 45 3 janet
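To see what n=1 changes, here is a minimal sketch on an assumed frame matching the modified output above. It compares the number of columns produced with and without the limit:

```python
import pandas as pd

# Assumed sample data, with an extra underscore in the second name
df = pd.DataFrame({
    "ID_Name": ["1_john", "2_bob_jr", "3_janet"],
    "Score": [23, 34, 45],
})

# Without n, every underscore splits, so one row yields 3 parts
unlimited = df["ID_Name"].str.split("_", expand=True)
print(unlimited.shape)  # (3, 3)

# With n=1, at most one split happens per row, so there are always 2 parts
limited = df["ID_Name"].str.split("_", expand=True, n=1)
print(limited.shape)  # (3, 2)

# Only the 2-column result fits the two target columns
df[["ID", "Name"]] = limited
print(df)
```

The 3-column result cannot be assigned to the two target columns, which is why the version without n=1 would fail on such data.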