Most efficient way to compare two pandas DataFrames and update one DataFrame based on a condition

Question

I have two DataFrames, df1 and df2. df2 consists of the columns "tagname" and "value". The dictionary "bucket_dict" holds the data from df2.

bucket_dict = dict(zip(df2.tagname,df2.value))

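df1 has millions of rows and 3 columns: "apptag", "comments" and "Type". I want to match these two DataFrames: if a dictionary key in bucket_dict is contained in df1["apptag"], then update df1["comments"] = the corresponding dictionary key and df1["Type"] = the corresponding value, bucket_dict["key name"]. I used the following code: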

for each_tag in bucket_dict: 
    df1.loc[(df1["apptag"].str.match(each_tag, case = False ,na = False)), "comments"] =  each_tag
    df1.loc[(df1["apptag"].str.match(each_tag, case = False ,na = False)), "Type"] =  bucket_dict[each_tag]

Is there a more efficient way to do this? The loop above takes a long time.
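A side note on the loop above, as a minimal sketch: `Series.str.match` anchors the pattern at the start of each string, while `Series.str.contains` searches anywhere in it. The sample Series `s` below is made up for illustration and is not part of the question's data.

```python
import pandas as pd

# Hypothetical sample values, only to show the difference between the two methods
s = pd.Series(["test123-pen", "pen-test", None])

# str.match anchors the pattern at the start of each string (like re.match)
starts_with = s.str.match("pen", case=False, na=False)

# str.contains looks for the pattern anywhere in the string (like re.search)
anywhere = s.str.contains("pen", case=False, na=False)

print(starts_with.tolist())
print(anywhere.tolist())
```

If the tag can appear anywhere in apptag (as in "test123-pen"), `str.contains` is probably the method that matches the stated intent.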

The bucketing DataFrame from which the dictionary is created:

import pandas as pd

bucketing_df = pd.DataFrame([["pen", "study"], ["pencil", "study"], ["ersr", "study"], ["rice", "grocery"], ["wht", "grocery"]], columns=['tagname', 'value'])

The other DataFrame:

output_df = pd.DataFrame([["test123-pen", "pen", " "], ["test234-pencil", "pencil", " "], ["test234-rice", "rice", " "]], columns=['apptag', 'comments', 'type'])

Desired output:

Answer 1

You can do this by calling apply on your apptag column (to fill in comments and type), along with a loc on your bucketing_df, in this manner -

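def find_type(a):
    # Value of the first tagname that occurs in apptag `a`, or "" if none does
    try:
        return bucketing_df.loc[[x in a for x in bucketing_df['tagname']], 'value'].values[0]
    except (IndexError, TypeError):  # no matching tag, or `a` is not a string (e.g. NaN)
        return ""

def find_comments(a):
    # The first tagname that occurs in apptag `a`, or "" if none does
    try:
        return bucketing_df.loc[[x in a for x in bucketing_df['tagname']], 'tagname'].values[0]
    except (IndexError, TypeError):
        return ""


output_df['type'] = output_df['apptag'].apply(find_type)
output_df['comments'] = output_df['apptag'].apply(find_comments)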

Here I had to make them separate functions so that they can handle the case where no tagname is present in apptag.
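As a variation, a minimal sketch of a single helper that returns both the tagname and its value in one pass, so bucketing_df is not scanned twice per row. The name `find_bucket` and the "no-tag-here" row are hypothetical, added only for illustration.

```python
import pandas as pd

bucketing_df = pd.DataFrame(
    [["pen", "study"], ["pencil", "study"], ["ersr", "study"],
     ["rice", "grocery"], ["wht", "grocery"]],
    columns=["tagname", "value"],
)
# Sample rows; "no-tag-here" is a made-up row showing the no-match case
output_df = pd.DataFrame(
    [["test123-pen", " ", " "], ["test234-rice", " ", " "], ["no-tag-here", " ", " "]],
    columns=["apptag", "comments", "type"],
)

def find_bucket(apptag):
    """Return (tagname, value) for the first tagname found in apptag, else ("", "")."""
    for tagname, value in zip(bucketing_df["tagname"], bucketing_df["value"]):
        if tagname in apptag:
            return tagname, value
    return "", ""

# One lookup per row, then split the pairs into the two target columns
matches = [find_bucket(a) for a in output_df["apptag"]]
output_df["comments"] = [m[0] for m in matches]
output_df["type"] = [m[1] for m in matches]

print(output_df)
```

This still does Python-level work for every row, so it mainly halves the cost rather than removing it.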

It gives you this as output_df -

           apptag comments     type
0     test123-pen      pen    study
1  test234-pencil   pencil    study
2    test234-rice     rice  grocery

All of this code uses the existing bucketing_df and output_df that you provided at the end of your question.
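Since df1 has millions of rows, a vectorized alternative may be worth trying. The following is a sketch under stated assumptions, not the method from this answer: it builds one regex alternation from the tag names, extracts the matching tag with `Series.str.extract`, and looks up the type with `Series.map`. It assumes the tag names are lowercase and should be matched case-insensitively (mirroring `case=False` in the question); the "Other-WHT-9" and "nothing" rows are made up for illustration. Longer tags are placed first in the pattern because a plain substring test also finds "pen" inside "test234-pencil".

```python
import re
import pandas as pd

bucketing_df = pd.DataFrame(
    [["pen", "study"], ["pencil", "study"], ["ersr", "study"],
     ["rice", "grocery"], ["wht", "grocery"]],
    columns=["tagname", "value"],
)
# The last two rows are hypothetical: an upper-case tag and a row with no tag
output_df = pd.DataFrame(
    [["test123-pen", " ", " "], ["test234-pencil", " ", " "],
     ["test234-rice", " ", " "], ["Other-WHT-9", " ", " "], ["nothing", " ", " "]],
    columns=["apptag", "comments", "type"],
)

bucket_dict = dict(zip(bucketing_df["tagname"], bucketing_df["value"]))

# Longest tags first, so "pencil" is tried before its prefix "pen"
tags = sorted(bucket_dict, key=len, reverse=True)
pattern = "(" + "|".join(re.escape(t) for t in tags) + ")"

# One pass over the column; rows without any tag yield NaN
found = output_df["apptag"].str.extract(pattern, flags=re.IGNORECASE, expand=False)

# Assumption: dictionary keys are lowercase, so normalise the matched text
key = found.str.lower()
output_df["comments"] = key.fillna("")
output_df["type"] = key.map(bucket_dict).fillna("")

print(output_df)
```

If an apptag contains several tags, the regex returns the leftmost match in the string rather than the first tag in bucketing_df order, so the result can differ from the apply-based version in that case.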

Solution 1
0 Accepted 2020-02-03 09:58:48
