
How to merge two pandas dataframes using a column as pattern and include columns of the left dataframe?

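I have the following Python code and tried to use pd.merge, but it seems the key columns must be identical. I am trying to do something similar to a SQL join with the "like" operator, matching df.B against categories.Pattern.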

Updated with a better data example:

import pandas as pd
import numpy as np
df = pd.DataFrame([[1, 'Gas Station'], [2, 'Servicenter'], [5, 'Bakery good bread'], [58, 'Fresh market MIA'], [76, 'Auto Liberty aa1121']], columns=['A','B'])

    Out[12]:
    A   B
0   1   Gas Station
1   2   Servicenter
2   5   Bakery good bread
3   58  Fresh market MIA
4   76  Auto Liberty aa1121

categories = pd.DataFrame([['Gasoline', 'Gas Station'], ['Gasoline', 'Servicenter'], ['Food', 'Bakery'],  ['Food', 'Fresh market'], ['Insurance', 'Auto Liberty']], columns=['Category','Pattern'])

    Out[13]:
    Category    Pattern
0   Gasoline    Gas Station
1   Gasoline    Servicenter
2   Food    Bakery
3   Food    Fresh market
4   Insurance   Auto Liberty

The expected result is:

    Out[14]:
    A   B                   Category
0   1   Gas Station         Gasoline
1   2   Servicenter         Gasoline
2   5   Bakery good bread   Food
3   58  Fresh market MIA    Food
4   76  Auto Liberty aa1121 Insurance

Thanks for any suggestions/feedback.

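# Join key: the first run of ASCII letters/digits in each string (e.g. 'Gas' for 'Gas Station').
df['lower'] = df['B'].str.extract(r'([A-Za-z0-9]+)', expand=False)
categories['lower'] = categories['Pattern'].str.extract(r'([A-Za-z0-9]+)', expand=False)
# Merge on the derived key only.
final = pd.merge(df, categories, on='lower')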
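
To see what this approach actually joins on, here is a minimal sketch that prints the derived key for both frames, using the sample data from the question. It assumes the intent is "first alphanumeric word", which is why it works here; two patterns that share a first word (say a hypothetical 'Fresh market' and 'Fresh fish') would produce the same key and be merged ambiguously.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 'Gas Station'], [2, 'Servicenter'], [5, 'Bakery good bread'],
     [58, 'Fresh market MIA'], [76, 'Auto Liberty aa1121']],
    columns=['A', 'B'])
categories = pd.DataFrame(
    [['Gasoline', 'Gas Station'], ['Gasoline', 'Servicenter'], ['Food', 'Bakery'],
     ['Food', 'Fresh market'], ['Insurance', 'Auto Liberty']],
    columns=['Category', 'Pattern'])

# expand=False returns a Series (one capture group) instead of a one-column DataFrame.
df_key = df['B'].str.extract(r'([A-Za-z0-9]+)', expand=False)
cat_key = categories['Pattern'].str.extract(r'([A-Za-z0-9]+)', expand=False)

print(df_key.tolist())   # ['Gas', 'Servicenter', 'Bakery', 'Fresh', 'Auto']
print(cat_key.tolist())  # ['Gas', 'Servicenter', 'Bakery', 'Fresh', 'Auto']
```

Because both sides reduce to the same five keys, an ordinary equality merge on that key column lines the rows up.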

You can do this by creating a new function like the following:

def lookup_table(value, df):
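    """
    Find the first pattern of the lookup table that occurs in ``value``.

    :param value: string to search for a known pattern
    :param df: dataframe which contains the lookup table (needs a 'Pattern' column)
    :return:
        A string with the pattern found, or None if no pattern matches
    """
    # Default for values that match no pattern in the list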
    out = None
    list_items = df['Pattern'].tolist()
    for item in list_items:
        if item in value:
            out = item
            break
    return out

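It takes the dataframe as a lookup table plus the value to look up, and returns the matching pattern, which can then be stored in a new column.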
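
As a quick illustration of that behavior, the sketch below calls a condensed version of the function on single values. The 'Pharmacy' and lowercase inputs are made-up examples, not part of the original data; they show that an unmatched value yields None and that the substring test is case-sensitive.

```python
import pandas as pd

categories = pd.DataFrame(
    [['Gasoline', 'Gas Station'], ['Gasoline', 'Servicenter'], ['Food', 'Bakery'],
     ['Food', 'Fresh market'], ['Insurance', 'Auto Liberty']],
    columns=['Category', 'Pattern'])

def lookup_table(value, df):
    # Return the first pattern that occurs as a substring of value, else None.
    for item in df['Pattern'].tolist():
        if item in value:
            return item
    return None

hit = lookup_table('Bakery good bread', categories)   # 'Bakery'
miss = lookup_table('Pharmacy', categories)           # None: no pattern matches
case = lookup_table('bakery good bread', categories)  # None: 'Bakery' != 'bakery'
```

If case-insensitive matching is wanted, one option is to compare `item.lower()` against `value.lower()` inside the loop.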

The following complete example produces the expected dataframe.

import pandas as pd

df = pd.DataFrame([[1, 'Gas Station'], [2, 'Servicenter'], [5, 'Bakery good bread'], [58, 'Fresh market MIA'], [76, 'Auto Liberty aa1121']], columns=['A','B'])
categories = pd.DataFrame([['Gasoline', 'Gas Station'], ['Gasoline', 'Servicenter'], ['Food', 'Bakery'],  ['Food', 'Fresh market'], ['Insurance', 'Auto Liberty']], columns=['Category','Pattern'])

def lookup_table(value, df):
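    """
    Find the first pattern of the lookup table that occurs in ``value``.

    :param value: string to search for a known pattern
    :param df: dataframe which contains the lookup table (needs a 'Pattern' column)
    :return:
        A string with the pattern found, or None if no pattern matches
    """
    # Default for values that match no pattern in the list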
    out = None
    list_items = df['Pattern'].tolist()
    for item in list_items:
        if item in value:
            out = item
            break
    return out


df['Pattern'] = df['B'].apply(lambda x: lookup_table(x, categories))
final = pd.merge(df, categories)
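
One thing to keep in mind: `pd.merge(df, categories)` is an inner join on the shared `Pattern` column, so rows of `df` for which no pattern was found are dropped. The sketch below is a variant, not part of the original answer, that keeps such rows by passing `how='left'`. The extra row `[99, 'Pharmacy']` is a hypothetical value added only to show the difference.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 'Gas Station'], [2, 'Servicenter'], [5, 'Bakery good bread'],
     [58, 'Fresh market MIA'], [76, 'Auto Liberty aa1121'],
     [99, 'Pharmacy']],  # hypothetical row that matches no pattern
    columns=['A', 'B'])
categories = pd.DataFrame(
    [['Gasoline', 'Gas Station'], ['Gasoline', 'Servicenter'], ['Food', 'Bakery'],
     ['Food', 'Fresh market'], ['Insurance', 'Auto Liberty']],
    columns=['Category', 'Pattern'])

def lookup_table(value, lookup):
    # Return the first pattern that occurs as a substring of value, else None.
    for item in lookup['Pattern'].tolist():
        if item in value:
            return item
    return None

df['Pattern'] = df['B'].apply(lambda x: lookup_table(x, categories))

# Inner join (the default): the unmatched 'Pharmacy' row disappears.
inner = pd.merge(df, categories, on='Pattern')

# Left join: every row of df is kept; unmatched rows get a missing Category.
left = pd.merge(df, categories, on='Pattern', how='left')
```

Which of the two is appropriate depends on whether uncategorized rows should survive the join.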

Expected output
