Drop duplicates in First_Name rows but append a collection of Last_Name

Question

I have searched and read a number of similar questions, but unfortunately none of them seem to solve my case, as the solutions there mostly rely on one of the values being np.nan. Here I am looking for a solution that gives me a collection of the Last_Name values.

I created a small sample code as below:

My dataset is:

import pandas as pd

dataset = pd.DataFrame({'First_Name': ['John', 'John', 'John'],
                        'Last_Name': ['Mayers', 'Mountain', 'Walts']})

What I have been trying to do is get a dataframe with 'First_Name' as 'John' and 'Last_Name' as ['Mayers','Mountain','Walts']. To do this, I tried to create a new column named 'Combine'.

My code was as below:

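import re

combine = []

for i in range(len(dataset)):
    # Match each First_Name against itself, so every row matches
    m = re.match(dataset.loc[i]['First_Name'], dataset.loc[i]['First_Name'])
    if m is not None:
        combine.append(dataset.loc[i]['Last_Name'])
# Attempt to store the whole list in a new 'Combine' column
dataset.loc[i]['Combine'] = combine
dataset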

Unfortunately, the code above did not produce any new column named "Combine". If I print combine alone, it is a list: ['Mayers','Mountain','Walts']. If I use dataset['Combine']=combine, it puts the three names in three individual rows of "Combine", splitting up the list above, but I want the appended result in one row so that I can then drop the duplicate First_Name rows. I have searched a lot of similar questions here, but have not yet found an effective way to solve this. I also tried sort_values('First_Name'), but this did not help me append the non-overlapping "Last_Name" values. Any ideas? Thank you so much!

Answer 1

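If I understand correctly, you can group by First_Name and join the Last_Name values into one string (df here is your dataset):

df_new = df.groupby(['First_Name'])['Last_Name'].apply(lambda x : ','.join(x)).to_frame()
print(df_new)
                        Last_Name
First_Name                       
John        Mayers,Mountain,Walts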

Or, as Jon succinctly pointed out, we can pass the native Python method str.join directly to apply:

df_new = df.groupby(["First_Name"])["Last_Name"].apply(','.join).to_frame()
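
Since the question asks for a list rather than a comma-joined string, aggregating with list instead of str.join may be closer to what is wanted. This is a minimal sketch and not part of the original answer: it assumes the question's data is held in a DataFrame named df, and the names grouped and Combine are only illustrative.

```python
import pandas as pd

# Sample data from the question (assumed here to be named df)
df = pd.DataFrame({'First_Name': ['John', 'John', 'John'],
                   'Last_Name': ['Mayers', 'Mountain', 'Walts']})

# Collect every Last_Name per First_Name into one Python list,
# which leaves a single row per First_Name.
grouped = (df.groupby('First_Name')['Last_Name']
             .agg(list)
             .reset_index(name='Combine'))
print(grouped)
```

Because the grouping already yields one row per First_Name, no separate drop_duplicates step should be needed afterwards.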

solution1
1 ACCEPTED 2019-12-17 15:46:23