Merge two dataframes based on a dataframe's column values

Question

I have the following dataframe:

df1 = pd.DataFrame({'Parent': ['Stay home', "Stay home","Stay home", 'Go outside', "Go Outside","Go outside"],
                    'Child' : ['Severe weather', "Severe weather", "Severe weather", 'Sunny', "Sunny", "sunny"]})

    Parent      Child
0   Stay home   Severe weather
1   Stay home   Severe weather
2   Stay home   Severe weather
3   Go outside  Sunny
4   Go Outside  Sunny
5   Go outside  sunny

And a second one:

df2 = pd.DataFrame({'Similarity_Score': ['SimilarityScore:0.43693185876069784', 'SimilarityScore:0.299807821163373']})

             Similarity_Score
0   SimilarityScore:0.43693185876069784
1   SimilarityScore:0.299807821163373

I want to merge the two dataframes based on the values in df1's Child column.

Expected result:

     Parent     Child           Similarity_Score
0   Stay home   Severe weather  0.43693185876069784
1   Stay home   Severe weather  0.43693185876069784
2   Stay home   Severe weather  0.43693185876069784
3   Go outside  Sunny           0.299807821163373
4   Go Outside  Sunny           0.299807821163373
5   Go outside  sunny           0.299807821163373

I tried the usual merge and concat approaches but could not find a solution. Any ideas?

Answer 1

If you want to assign the scores based on the value of Child, you can do it like this:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Parent': ['Stay home', "Stay home", "Stay home", 'Go outside', "Go Outside", "Go outside"],
                    'Child': ['Severe weather', "Severe weather", "Severe weather", 'Sunny', "Sunny", "Sunny"]})
df2 = pd.DataFrame({'Similarity_Score': ['SimilarityScore:0.43693185876069784', 'SimilarityScore:0.299807821163373']})

# Split the string at : and convert to float
df2['Similarity_Score'] = df2['Similarity_Score'].str.split(':').str[1].astype(float)

# calculate auxiliary column position to base the matching on
df1['position'] = df1['Child'].apply(lambda row: np.where(df1['Child'].unique() == row)[0][0])

# merge both dataframes and drop auxiliary column position
df = df1.merge(df2, left_on='position', right_index=True).drop(columns=["position"])
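
The auxiliary position column can also be derived with pd.factorize, which numbers distinct values in order of first appearance. This is only a sketch, assuming the scores in df2 are listed in the same order in which the distinct Child values first appear in df1, and assuming that 'sunny' and 'Sunny' should count as the same value (the question's data mixes the two):

```python
import pandas as pd

df1 = pd.DataFrame({
    'Parent': ['Stay home', 'Stay home', 'Stay home', 'Go outside', 'Go Outside', 'Go outside'],
    'Child': ['Severe weather', 'Severe weather', 'Severe weather', 'Sunny', 'Sunny', 'sunny'],
})
df2 = pd.DataFrame({'Similarity_Score': ['SimilarityScore:0.43693185876069784',
                                         'SimilarityScore:0.299807821163373']})

# Keep only the numeric part after the colon and convert it to float
scores = df2['Similarity_Score'].str.split(':').str[1].astype(float)

# factorize assigns 0, 1, ... to distinct values in order of first appearance;
# lower-casing first is an assumption so that 'sunny' and 'Sunny' share one code
codes, uniques = pd.factorize(df1['Child'].str.lower())

# Positional lookup: each row takes the score at the position given by its code
result = df1.assign(Similarity_Score=scores.to_numpy()[codes])
print(result)
```

This avoids the row-wise apply and leaves df1 itself unchanged, but it relies on the same ordering assumption as the position column.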

Answer 2

Based on your reply, the merge is done on the index, after identifying the unique values in df1.


# identifying the group
df1['key']=df1.groupby(['Parent','Child']).ngroup(ascending=False)
df1

# merge the two DataFrames; while merging, split the similarity score to keep only the numeric part

(df1.merge(df2['Similarity_Score'].str.split(':', expand=True)[1],
          left_on='key', 
          right_index=True)
    .drop(columns='key'))

    Parent      Child           1
0   Stay home   Severe weather  0.43693185876069784
1   Stay home   Severe weather  0.43693185876069784
2   Stay home   Severe weather  0.43693185876069784
3   Go outside  Sunny           0.299807821163373
4   Go outside  Sunny           0.299807821163373
5   Go outside  Sunny           0.299807821163373
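
Note that the data in the question contains 'Go Outside' and 'sunny', so grouping on the raw values yields more than two groups. A possible variant, assuming values that differ only in letter case belong to the same group and that df2 lists the scores in the order the groups first appear in df1:

```python
import pandas as pd

df1 = pd.DataFrame({
    'Parent': ['Stay home', 'Stay home', 'Stay home', 'Go outside', 'Go Outside', 'Go outside'],
    'Child': ['Severe weather', 'Severe weather', 'Severe weather', 'Sunny', 'Sunny', 'sunny'],
})
df2 = pd.DataFrame({'Similarity_Score': ['SimilarityScore:0.43693185876069784',
                                         'SimilarityScore:0.299807821163373']})

# Assumption: compare Parent and Child case-insensitively
group_keys = [df1['Parent'].str.lower(), df1['Child'].str.lower()]

# sort=False numbers the groups in order of first appearance (0, 1, ...)
keyed = df1.assign(key=df1.groupby(group_keys, sort=False).ngroup())

# Numeric part of the score, kept as a named Series so it can be merged
score = df2['Similarity_Score'].str.split(':').str[1].astype(float)

# Match each group number against the row position of the score in df2
merged = keyed.merge(score, left_on='key', right_index=True).drop(columns='key')
print(merged)
```

Using sort=False ties the group numbers to the order of the rows rather than to the alphabetical order of the labels, so ascending=False is not needed here.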

Answer 3

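Is there a specific parameter you want to join on? Do the scores for sunny (~0.299) and severe weather stay constant? If so, create a binary column and merge on that.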
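
The binary-column idea could look roughly like this. The column name is_sunny is made up for illustration, the case-insensitive comparison is an assumption, and the pairing of scores with weather types is taken from the expected result in the question:

```python
import pandas as pd

df1 = pd.DataFrame({
    'Parent': ['Stay home', 'Stay home', 'Stay home', 'Go outside', 'Go Outside', 'Go outside'],
    'Child': ['Severe weather', 'Severe weather', 'Severe weather', 'Sunny', 'Sunny', 'sunny'],
})
df2 = pd.DataFrame({'Similarity_Score': ['SimilarityScore:0.43693185876069784',
                                         'SimilarityScore:0.299807821163373']})

# Binary flag (hypothetical column name): 1 for sunny rows, 0 otherwise
df1['is_sunny'] = (df1['Child'].str.lower() == 'sunny').astype(int)

# Keep only the numeric part of the score
df2['Similarity_Score'] = df2['Similarity_Score'].str.split(':').str[1].astype(float)

# Same flag on the score table: first row is severe weather, second is sunny
# (order assumed from the expected result in the question)
df2['is_sunny'] = [0, 1]

# A left merge keeps every row of df1 in its original order
merged = df1.merge(df2, on='is_sunny', how='left').drop(columns='is_sunny')
print(merged)
```

This only works while there are exactly two weather categories; with more categories a lookup column holding the category label would be needed instead of a 0/1 flag.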

Solution 1
1 Accepted 2022-09-26 19:37:46

Solution 2
0 2022-09-26 19:23:57

Solution 3
0 2022-09-26 19:25:44
