
How to merge two pandas dataframes using a column as pattern and include columns of the left dataframe?

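I have the following Python code. I tried pd.merge, but it seems the key columns must match exactly. What I want is something similar to a SQL join with a "like" operator, matching df.B against categories.Pattern.

Updated with a better data example: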

import pandas as pd
import numpy as np
df = pd.DataFrame([[1, 'Gas Station'], [2, 'Servicenter'], [5, 'Bakery good bread'], [58, 'Fresh market MIA'], [76, 'Auto Liberty aa1121']], columns=['A','B'])

    Out[12]:
    A   B
0   1   Gas Station
1   2   Servicenter
2   5   Bakery good bread
3   58  Fresh market MIA
4   76  Auto Liberty aa1121

categories = pd.DataFrame([['Gasoline', 'Gas Station'], ['Gasoline', 'Servicenter'], ['Food', 'Bakery'],  ['Food', 'Fresh market'], ['Insurance', 'Auto Liberty']], columns=['Category','Pattern'])

    Out[13]:
    Category    Pattern
0   Gasoline    Gas Station
1   Gasoline    Servicenter
2   Food    Bakery
3   Food    Fresh market
4   Insurance   Auto Liberty

The expected result is:

    Out[14]:
    A   B                   Category
0   1   Gas Station         Gasoline
1   2   Servicenter         Gasoline
2   5   Bakery good bread   Food
3   58  Fresh market MIA    Food
4   76  Auto Liberty aa1121 Insurance

Thanks for any suggestions or feedback.

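# Join key: the first alphanumeric token (first word) of each string
df['lower'] = df['B'].str.extract(r'([A-Za-z0-9]+)', expand=False)
categories['lower'] = categories['Pattern'].str.extract(r'([A-Za-z0-9]+)', expand=False)
final = pd.merge(df, categories, on='lower')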
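The snippet above joins on the first word only, so it depends on every pattern having a distinct first word. A closer analogue of a SQL join with LIKE is a cross join followed by a substring filter. The following is a minimal sketch of that idea, not the original poster's method; it assumes pandas 1.2 or later (for how="cross") and a lookup table small enough that pairing every row with every pattern is affordable. The names pairs and mask are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 'Gas Station'], [2, 'Servicenter'], [5, 'Bakery good bread'],
     [58, 'Fresh market MIA'], [76, 'Auto Liberty aa1121']],
    columns=['A', 'B'])
categories = pd.DataFrame(
    [['Gasoline', 'Gas Station'], ['Gasoline', 'Servicenter'], ['Food', 'Bakery'],
     ['Food', 'Fresh market'], ['Insurance', 'Auto Liberty']],
    columns=['Category', 'Pattern'])

# Pair every row of df with every row of categories (requires pandas >= 1.2).
pairs = df.merge(categories, how='cross')

# Keep the pairs where Pattern occurs inside B, roughly B LIKE '%Pattern%'.
mask = [pattern in text for text, pattern in zip(pairs['B'], pairs['Pattern'])]
final = pairs.loc[mask, ['A', 'B', 'Category']].reset_index(drop=True)

print(final)
```

Note that a row of df matching several patterns would appear once per match, and a row matching none would be dropped.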

You can do this by creating a new function like the following:

def lookup_table(value, df):
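    """
    Return the first pattern from the lookup table that occurs in value.

    :param value: string to search for a known pattern
    :param df: dataframe that contains the lookup table (column 'Pattern')
    :return:
        The matching pattern as a string, or None if no pattern is found
    """
    # Default result when no pattern matches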
    out = None
    list_items = df['Pattern'].tolist()
    for item in list_items:
        if item in value:
            out = item
            break
    return out

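It takes the dataframe as a lookup table parameter and returns the first matching pattern, or None when nothing matches.

The following complete example produces the expected dataframe.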
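To make the behaviour concrete, here is a small sketch of the same first-match logic applied to single values. The helper first_match is a hypothetical compact equivalent of lookup_table that takes a plain list of patterns instead of a dataframe, and 'Unknown shop' is an invented input used only to show the no-match case.

```python
import pandas as pd

categories = pd.DataFrame(
    [['Gasoline', 'Gas Station'], ['Gasoline', 'Servicenter'], ['Food', 'Bakery'],
     ['Food', 'Fresh market'], ['Insurance', 'Auto Liberty']],
    columns=['Category', 'Pattern'])

def first_match(value, patterns):
    """Return the first pattern contained in value, or None if there is none."""
    return next((p for p in patterns if p in value), None)

patterns = categories['Pattern'].tolist()

print(first_match('Bakery good bread', patterns))  # prints: Bakery
print(first_match('Unknown shop', patterns))       # prints: None
```

The match is a case-sensitive substring test, and when several patterns match, the one listed first in the lookup table wins.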

import pandas as pd

df = pd.DataFrame([[1, 'Gas Station'], [2, 'Servicenter'], [5, 'Bakery good bread'], [58, 'Fresh market MIA'], [76, 'Auto Liberty aa1121']], columns=['A','B'])
categories = pd.DataFrame([['Gasoline', 'Gas Station'], ['Gasoline', 'Servicenter'], ['Food', 'Bakery'],  ['Food', 'Fresh market'], ['Insurance', 'Auto Liberty']], columns=['Category','Pattern'])

def lookup_table(value, df):
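    """
    Return the first pattern from the lookup table that occurs in value.

    :param value: string to search for a known pattern
    :param df: dataframe that contains the lookup table (column 'Pattern')
    :return:
        The matching pattern as a string, or None if no pattern is found
    """
    # Default result when no pattern matches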
    out = None
    list_items = df['Pattern'].tolist()
    for item in list_items:
        if item in value:
            out = item
            break
    return out


df['Pattern'] = df['B'].apply(lambda x: lookup_table(x, categories))
final = pd.merge(df, categories)

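Expected output:

    A   B                    Pattern       Category
0   1   Gas Station          Gas Station   Gasoline
1   2   Servicenter          Servicenter   Gasoline
2   5   Bakery good bread    Bakery        Food
3   58  Fresh market MIA     Fresh market  Food
4   76  Auto Liberty aa1121  Auto Liberty  Insurance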
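As a possible alternative to calling apply row by row, the patterns can be combined into one regular expression and extracted in a single vectorized step. This is a sketch under stated assumptions rather than part of the original answer: the extra row [99, 'Unknown shop'] is invented to show what happens to unmatched rows, and how='left' is chosen so that such rows are kept with a missing Category instead of being dropped.

```python
import re
import pandas as pd

df = pd.DataFrame(
    [[1, 'Gas Station'], [2, 'Servicenter'], [5, 'Bakery good bread'],
     [58, 'Fresh market MIA'], [76, 'Auto Liberty aa1121'],
     [99, 'Unknown shop']],  # invented row with no matching pattern
    columns=['A', 'B'])
categories = pd.DataFrame(
    [['Gasoline', 'Gas Station'], ['Gasoline', 'Servicenter'], ['Food', 'Bakery'],
     ['Food', 'Fresh market'], ['Insurance', 'Auto Liberty']],
    columns=['Category', 'Pattern'])

# One alternation of all patterns; re.escape protects regex metacharacters.
# Longer patterns go first so that, at a given position, the longer one is tried first.
patterns = sorted(categories['Pattern'], key=len, reverse=True)
regex = '(' + '|'.join(re.escape(p) for p in patterns) + ')'

# expand=False returns a Series; rows without a match get NaN.
df['Pattern'] = df['B'].str.extract(regex, expand=False)

# A left join keeps unmatched rows, leaving their Category as NaN.
final = df.merge(categories, on='Pattern', how='left')

print(final)
```

One difference from lookup_table: the regular expression returns the leftmost match in the string, whereas lookup_table returns the first matching pattern in lookup-table order, so the two can disagree when a string contains more than one pattern.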

