
pandas error in df.apply() only for a specific dataframe

I noticed something very strange in pandas. My dataframe (3 rows, 3 columns) looks like this:

[Image: the original dataframe]

When I try to extract ID and Name (separated by an underscore) into their own columns with the command below, I get an error:

df[['ID','Name']] = df.apply(lambda x: get_first_last(x['ID_Name']), axis=1, result_type='broadcast')

Error is:

ValueError: cannot broadcast result

Here's the interesting part, though: when I delete the "From_To" column from the original dataframe, the same df.apply() call splits ID_Name perfectly well and I get the new columns like this: [Image: the dataframe with the new ID and Name columns]

I have checked a lot of SO answers but none seem to help. What did I miss here?

P.S. get_first_last is a very simple function:

def get_first_last(s):
    str_lis = s.split("_")
    return [str_lis[0], str_lis[1]]

From the documentation of pandas.DataFrame.apply:

'broadcast': results will be broadcast to the original shape of the DataFrame, the original index and columns will be retained.

So the problem is that your original dataframe has shape (3, 3), but your function returns only 2 values per row, so the shapes don't match. That also explains why the call works once you delete "From_To": the new shape is (3, 2), and its 2 columns match the 2 returned values.
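A minimal reproduction of the mismatch, as a sketch. The sample data is assumed for illustration (the question's real data was only shown in a screenshot), and `try_broadcast` is a hypothetical wrapper that returns either the result or the error text:

```python
import pandas as pd


def get_first_last(s):
    # Helper from the question: split "1_john" into ["1", "john"].
    str_lis = s.split("_")
    return [str_lis[0], str_lis[1]]


def try_broadcast(df):
    """Hypothetical wrapper: run the question's apply call, return result or error text."""
    try:
        return df.apply(lambda x: get_first_last(x["ID_Name"]),
                        axis=1, result_type="broadcast")
    except ValueError as exc:
        return f"ValueError: {exc}"


# Sample data assumed for illustration (3 rows x 3 columns).
three_cols = pd.DataFrame(
    [["1_john", 23, "London_Paris"],
     ["2_bob", 34, "Madrid_Milan"],
     ["3_janet", 45, "Paris_Stockholm"]],
    columns=["ID_Name", "Score", "From_To"],
)

# Each row has 3 columns but the function returns 2 values: broadcast fails.
print(try_broadcast(three_cols))

# Without From_To each row has 2 columns, matching the 2 returned values.
two_cols = three_cols.drop(columns="From_To")
result = try_broadcast(two_cols)
# The result keeps the original labels (ID_Name, Score) but holds the split parts.
print(result)
```

Note that in the working case the result still carries the original column labels, which is why assigning it to `df[['ID','Name']]` only succeeds by coincidence of width.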

Use 'expand' instead of 'broadcast' and you will get the expected result:

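  import pandas as pd

  # get_first_last is the helper defined in the question
  table = [
      ['1_john', 23, 'LoNDon_paris'],
      ['2_bob', 34, 'Madrid_milan'],
      ['3_abdellah', 26, 'Paris_Stockholm']
  ]
  df = pd.DataFrame(table, columns=['ID_Name', 'Score', 'From_to'])
  # 'expand' builds a new frame from the 2-element lists instead of
  # forcing them into df's original 3-column shape
  df[['ID','Name']] = df.apply(lambda x: get_first_last(x['ID_Name']), axis=1, result_type='expand')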

Hope this helps!
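To see why 'expand' sidesteps the mismatch, it can help to inspect the intermediate result before assigning it. A small sketch, again with sample data assumed for illustration:

```python
import pandas as pd


def get_first_last(s):
    # Helper from the question.
    str_lis = s.split("_")
    return [str_lis[0], str_lis[1]]


# Sample data assumed for illustration: 3 columns, as in the question.
df = pd.DataFrame(
    [["1_john", 23, "London_Paris"], ["2_bob", 34, "Madrid_Milan"]],
    columns=["ID_Name", "Score", "From_To"],
)

expanded = df.apply(lambda x: get_first_last(x["ID_Name"]),
                    axis=1, result_type="expand")
# The width comes from the 2-element lists, not from df's 3 columns.
print(expanded.shape)
print(expanded.columns.tolist())

# A 2-column frame assigned to 2 new column names is paired up by position.
df[["ID", "Name"]] = expanded
print(df)
```

Here `expanded` has shape (2, 2) with default column labels 0 and 1, independent of how many columns `df` has.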

This is not a good use case for apply; you should rather use the vectorized string method:

df[["ID", "Name"]] = df["ID_Name"].str.split("_", expand=True, n=1)

For your data this outputs (I used only the first 2 columns of your dataframe):

   ID_Name  Score ID   Name
0   1_john     23  1   john
1    2_bob     34  2    bob
2  3_janet     45  3  janet

The n=1 argument is there in case a value contains more than one _ (e.g. as part of the name): it splits only on the first underscore, so you get at most 2 columns. Without it, such a value would produce a third column and the assignment to 2 columns would fail.

For instance, if we slightly modify your data so that one name contains an underscore, we get the following output:

    ID_Name  Score ID    Name
0    1_john     23  1    john
1  2_bob_jr     34  2  bob_jr
2   3_janet     45  3   janet
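To see what n=1 protects against, here is a sketch of the same split with and without it, using the modified data from the output above:

```python
import pandas as pd

df = pd.DataFrame({"ID_Name": ["1_john", "2_bob_jr", "3_janet"],
                   "Score": [23, 34, 45]})

# Without n=1 every underscore splits, so "2_bob_jr" yields a third column.
parts = df["ID_Name"].str.split("_", expand=True)
print(parts.shape)  # (3, 3)

try:
    df[["ID", "Name"]] = parts  # 2 target names but 3 columns
    error = None
except ValueError as exc:
    error = str(exc)
print("assignment failed:", error)

# With n=1 only the first underscore splits, giving at most 2 columns.
df[["ID", "Name"]] = df["ID_Name"].str.split("_", expand=True, n=1)
print(df)
```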
