I have searched and read a number of similar questions, but unfortunately none of them seem to solve my case, as the solutions there mostly rely on one of the values being np.nan. Here I am looking for a solution that gives me a collection of the Last_Name values.
I created a small sample dataset as below:
import pandas as pd
dataset = pd.DataFrame({'First_Name': ['John', 'John', 'John'],
                        'Last_Name': ['Mayers', 'Mountain', 'Walts']})
What I am trying to get is a dataframe with 'First_Name' as 'John' and 'Last_Name' as ['Mayers','Mountain','Walts']. To do this, I tried to create a new column named 'Combine'.
My code was as below:
import re
combine = []
for i in range(0, len(dataset)):
    m = re.match(dataset.loc[i]['First_Name'], dataset.loc[i]['First_Name'])
    if m is not None:
        combine.append(dataset.loc[i]['Last_Name'])
        dataset.loc[i]['Combine'] = combine
dataset
Unfortunately, the code above did not produce any new column named "Combine". If I print combine on its own, it is a list: ['Mayers','Mountain','Walts'].
If I use dataset['Combine']=combine instead, the list is spread over three individual rows of "Combine", but I want the appended result in one row, so that I can then drop the duplicate rows of First_Name.
I have searched a lot of similar questions here, but have not yet found an effective way to solve this. I tried sort_values('First_Name') too, but this did not help me append the non-overlapping "Last_Name" values. Any ideas? Thank you so much!
If I understand correctly, you can group by First_Name and join the last names of each group (df here is your dataset):
df_new = df.groupby(['First_Name'])['Last_Name'].apply(lambda x : ','.join(x)).to_frame()
print(df_new)
                        Last_Name
First_Name
John        Mayers,Mountain,Walts
Or, as Jon succinctly pointed out, we can pass the native Python method str.join (here bound to ',') directly to apply instead of wrapping it in a lambda:
df_new = df.groupby(["First_Name"])["Last_Name"].apply(','.join).to_frame()
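Both versions above produce a single comma-separated string per first name. If an actual Python list such as ['Mayers','Mountain','Walts'] is wanted in one row, as the question describes, aggregating with list is one possible variation. This is only a sketch under that assumption: the name combined is mine, and the column name 'Combine' is taken from the question.

```python
import pandas as pd

# Same sample data as in the question.
dataset = pd.DataFrame({
    "First_Name": ["John", "John", "John"],
    "Last_Name": ["Mayers", "Mountain", "Walts"],
})

# Collect the last names of each first name into a Python list,
# then turn the grouped index back into a regular column.
combined = (
    dataset.groupby("First_Name")["Last_Name"]
    .agg(list)
    .reset_index(name="Combine")
)
print(combined)
```

This gives one row per First_Name, so there are no duplicate rows left to drop afterwards.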