Merge two dataframes, grouping by the column values of one dataframe
I have the following dataframe:
df1 = pd.DataFrame({'Parent': ['Stay home', "Stay home","Stay home", 'Go outside', "Go Outside","Go outside"],
'Child' : ['Severe weather', "Severe weather", "Severe weather", 'Sunny', "Sunny", "sunny"]})
       Parent           Child
0   Stay home  Severe weather
1   Stay home  Severe weather
2   Stay home  Severe weather
3  Go outside           Sunny
4  Go Outside           Sunny
5  Go outside           sunny
And a second one:
df2 = pd.DataFrame({'Similarity_Score': ['SimilarityScore:0.43693185876069784', 'SimilarityScore:0.299807821163373']})
                      Similarity_Score
0  SimilarityScore:0.43693185876069784
1    SimilarityScore:0.299807821163373
I want to merge the two dataframes based on the values of the Child column in df1.
Expected result:
       Parent           Child     Similarity_Score
0   Stay home  Severe weather  0.43693185876069784
1   Stay home  Severe weather  0.43693185876069784
2   Stay home  Severe weather  0.43693185876069784
3  Go outside           Sunny    0.299807821163373
4  Go Outside           Sunny    0.299807821163373
5  Go outside           sunny    0.299807821163373
I tried the usual merge and concat methods, but couldn't find a solution. Any ideas?
If you want to assign the scores based on the values of Child, you can do it like this:
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'Parent': ['Stay home', "Stay home", "Stay home", 'Go outside', "Go Outside", "Go outside"],
'Child': ['Severe weather', "Severe weather", "Severe weather", 'Sunny', "Sunny", "Sunny"]})
df2 = pd.DataFrame({'Similarity_Score': ['SimilarityScore:0.43693185876069784', 'SimilarityScore:0.299807821163373']})
# Split the string at : and convert to float
df2['Similarity_Score'] = df2['Similarity_Score'].str.split(':').str[1].astype(float)
# calculate auxiliary column position to base the matching on
df1['position'] = df1['Child'].apply(lambda row: np.where(df1['Child'].unique() == row)[0][0])
# merge both dataframes and drop auxiliary column position
df = df1.merge(df2, left_on='position', right_index=True).drop(columns=["position"])
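Note that the snippet above spells every 'Sunny' the same way, whereas the question's data also contains 'sunny'. A minimal sketch of the same positional idea using pd.factorize, which numbers distinct values in order of first appearance. It assumes that 'Sunny' and 'sunny' are meant to be the same category and that the i-th row of df2 belongs to the i-th distinct Child value; neither is confirmed in the question.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Parent': ['Stay home', 'Stay home', 'Stay home', 'Go outside', 'Go Outside', 'Go outside'],
    'Child': ['Severe weather', 'Severe weather', 'Severe weather', 'Sunny', 'Sunny', 'sunny'],
})
df2 = pd.DataFrame({'Similarity_Score': ['SimilarityScore:0.43693185876069784',
                                         'SimilarityScore:0.299807821163373']})

# Keep only the numeric part after the colon and convert it to float
scores = df2['Similarity_Score'].str.split(':').str[1].astype(float)

# Assumption: casing differences are noise, so compare the lower-cased values.
# factorize returns one integer code per row, numbered by first appearance.
codes, uniques = pd.factorize(df1['Child'].str.lower())

# Assumption: row i of df2 holds the score for the i-th distinct Child value.
# This raises IndexError if df1 has more distinct values than df2 has rows.
result = df1.assign(Similarity_Score=scores.to_numpy()[codes])
print(result)
```

This avoids the row-wise apply and the auxiliary position column, but it still depends on df2 being ordered the same way the Child values first appear in df1.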
Based on your answer, the merge is done on the index, after identifying the unique values in df1:
# identifying the group
df1['key']=df1.groupby(['Parent','Child']).ngroup(ascending=False)
df1
# merge the two DFs and, while merging, split the similarity score to keep only the numeric part
(df1.merge(df2['Similarity_Score'].str.split(':', expand=True)[1],
left_on='key',
right_index=True)
.drop(columns='key'))
       Parent           Child                    1
0   Stay home  Severe weather  0.43693185876069784
1   Stay home  Severe weather  0.43693185876069784
2   Stay home  Severe weather  0.43693185876069784
3  Go outside           Sunny    0.299807821163373
4  Go outside           Sunny    0.299807821163373
5  Go outside           Sunny    0.299807821163373
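The ngroup(ascending=False) numbering above works because 'Stay home' happens to sort after 'Go outside', and the score column ends up named 1. A hedged variant, assuming the data is spelled consistently as in the output above and that df2 is ordered by first appearance of each Child value: group on Child alone with sort=False, so groups are numbered in the order they first appear, and rename the score column before merging.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Parent': ['Stay home'] * 3 + ['Go outside'] * 3,
    'Child': ['Severe weather'] * 3 + ['Sunny'] * 3,
})
df2 = pd.DataFrame({'Similarity_Score': ['SimilarityScore:0.43693185876069784',
                                         'SimilarityScore:0.299807821163373']})

# sort=False numbers the groups 0, 1, ... in order of first appearance,
# so no reliance on alphabetical order or ascending=False
df1['key'] = df1.groupby('Child', sort=False).ngroup()

# Numeric part of the score as a float Series with a readable name
scores = (df2['Similarity_Score']
          .str.split(':', expand=True)[1]
          .astype(float)
          .rename('Similarity_Score'))

# Match the group number against df2's positional index, then drop the helper
out = df1.merge(scores, left_on='key', right_index=True).drop(columns='key')
print(out)
```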
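Is there a specific parameter you want to join on? Do the scores for Sunny (~0.299) and Severe weather stay constant? If so, create a binary column and merge on that.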
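A rough sketch of that binary-column idea. The column name is_sunny and the lookup table are hypothetical, and the pairing of each score with a weather category is assumed from the expected result in the question.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Parent': ['Stay home', 'Stay home', 'Stay home', 'Go outside', 'Go Outside', 'Go outside'],
    'Child': ['Severe weather', 'Severe weather', 'Severe weather', 'Sunny', 'Sunny', 'sunny'],
})

# Hypothetical lookup table: one fixed score per value of the binary flag
# (assumption: 0 = severe weather, 1 = sunny)
lookup = pd.DataFrame({
    'is_sunny': [0, 1],
    'Similarity_Score': [0.43693185876069784, 0.299807821163373],
})

# Binary helper column; lower-casing makes 'Sunny' and 'sunny' equivalent
df1['is_sunny'] = df1['Child'].str.lower().eq('sunny').astype(int)

# how='left' keeps every row of df1; validate checks that each flag value
# appears at most once in the lookup table
merged = (df1.merge(lookup, on='is_sunny', how='left', validate='many_to_one')
             .drop(columns='is_sunny'))
print(merged)
```

Merging on an explicit key like this does not depend on the row order of either dataframe, unlike the positional approaches.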