How to merge two pandas dataframes using a column as pattern and include columns of the left dataframe?
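I have the following Python code. I tried `pd.merge`, but it seems the key columns must match exactly. I'm trying to do something similar to a SQL join with a `LIKE` operator, matching `df.B` against `categories.Pattern`.
Updated with a better data example.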
```python
import pandas as pd
import numpy as np
df = pd.DataFrame([[1, 'Gas Station'], [2, 'Servicenter'], [5, 'Bakery good bread'], [58, 'Fresh market MIA'], [76, 'Auto Liberty aa1121']], columns=['A','B'])
```
```
Out[12]:
    A                    B
0   1          Gas Station
1   2          Servicenter
2   5    Bakery good bread
3  58     Fresh market MIA
4  76  Auto Liberty aa1121
```
```python
categories = pd.DataFrame([['Gasoline', 'Gas Station'], ['Gasoline', 'Servicenter'], ['Food', 'Bakery'], ['Food', 'Fresh market'], ['Insurance', 'Auto Liberty']], columns=['Category','Pattern'])
```
```
Out[13]:
    Category       Pattern
0   Gasoline   Gas Station
1   Gasoline   Servicenter
2       Food        Bakery
3       Food  Fresh market
4  Insurance  Auto Liberty
```
The expected result is:
```
Out[14]:
    A                    B   Category
0   1          Gas Station   Gasoline
1   2          Servicenter   Gasoline
2   5    Bakery good bread       Food
3  58     Fresh market MIA       Food
4  76  Auto Liberty aa1121  Insurance
```
Thanks for your suggestions/feedback.
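```python
# Merge on the first alphanumeric word of each string; this only works
# when that first word alone identifies the pattern
df['key'] = df['B'].str.extract(r'([A-Za-z0-9]+)', expand=False)
categories['key'] = categories['Pattern'].str.extract(r'([A-Za-z0-9]+)', expand=False)
final = pd.merge(df, categories, on='key')
```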
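Matching on the first word only works when each pattern's first word is unique (here `Gas`, `Servicenter`, `Bakery`, `Fresh`, `Auto`). A closer analogue of the SQL `LIKE` join described in the question is a cross join followed by a substring filter. A minimal sketch, assuming pandas 1.2 or later (needed for `how='cross'`); the names `pairs` and `mask` are illustrative:

```python
import pandas as pd

df = pd.DataFrame([[1, 'Gas Station'], [2, 'Servicenter'], [5, 'Bakery good bread'],
                   [58, 'Fresh market MIA'], [76, 'Auto Liberty aa1121']], columns=['A', 'B'])
categories = pd.DataFrame([['Gasoline', 'Gas Station'], ['Gasoline', 'Servicenter'],
                           ['Food', 'Bakery'], ['Food', 'Fresh market'],
                           ['Insurance', 'Auto Liberty']], columns=['Category', 'Pattern'])

# Pair every row of df with every pattern (SQL: FROM df CROSS JOIN categories).
# how='cross' requires pandas >= 1.2.
pairs = df.merge(categories, how='cross')

# Keep only pairs whose pattern occurs inside B (SQL: WHERE B LIKE '%' || Pattern || '%').
mask = [pattern in text for text, pattern in zip(pairs['B'], pairs['Pattern'])]
final = pairs.loc[mask, ['A', 'B', 'Category']].reset_index(drop=True)
print(final)
```

The cross join materializes `len(df) * len(categories)` rows before filtering, so it suits small lookup tables best.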
This can be solved by creating a new function like the following:
```python
def lookup_table(value, df):
    """
    :param value: string to search for a known pattern
    :param df: dataframe containing the lookup table (a 'Pattern' column)
    :return: the first pattern found in value, or None if none matches
    """
    # Default for the case where no pattern is found
    out = None
    list_items = df['Pattern'].tolist()
    for item in list_items:
        # Substring test, the equivalent of SQL: value LIKE '%item%'
        if item in value:
            out = item
            break
    return out
```
It uses the dataframe as a lookup table and returns the pattern that matches the given value.
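The row-by-row `apply` of `lookup_table` can also be expressed as one vectorized regex search. A minimal sketch, assuming every pattern is a literal string (hence `re.escape`); `alternation` and `pattern_to_category` are illustrative names, not part of the original answer:

```python
import re

import pandas as pd

df = pd.DataFrame([[1, 'Gas Station'], [2, 'Servicenter'], [5, 'Bakery good bread'],
                   [58, 'Fresh market MIA'], [76, 'Auto Liberty aa1121']], columns=['A', 'B'])
categories = pd.DataFrame([['Gasoline', 'Gas Station'], ['Gasoline', 'Servicenter'],
                           ['Food', 'Bakery'], ['Food', 'Fresh market'],
                           ['Insurance', 'Auto Liberty']], columns=['Category', 'Pattern'])

# One alternation regex built from all patterns; re.escape makes each one literal.
alternation = '|'.join(re.escape(p) for p in categories['Pattern'])

# str.extract returns the matched pattern for each B value (NaN if nothing matches).
df['Pattern'] = df['B'].str.extract(f'({alternation})', expand=False)

# Translate each matched pattern into its category with a plain dict lookup.
pattern_to_category = dict(zip(categories['Pattern'], categories['Category']))
df['Category'] = df['Pattern'].map(pattern_to_category)
print(df[['A', 'B', 'Category']])
```

One difference from `lookup_table`: the regex returns the leftmost match inside each `B` value, whereas `lookup_table` returns the first pattern in table order that occurs anywhere, so the two can disagree when several patterns overlap.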
The following complete example produces the expected dataframe:
```python
import pandas as pd

df = pd.DataFrame([[1, 'Gas Station'], [2, 'Servicenter'], [5, 'Bakery good bread'], [58, 'Fresh market MIA'], [76, 'Auto Liberty aa1121']], columns=['A','B'])
categories = pd.DataFrame([['Gasoline', 'Gas Station'], ['Gasoline', 'Servicenter'], ['Food', 'Bakery'], ['Food', 'Fresh market'], ['Insurance', 'Auto Liberty']], columns=['Category','Pattern'])

def lookup_table(value, df):
    """
    :param value: string to search for a known pattern
    :param df: dataframe containing the lookup table (a 'Pattern' column)
    :return: the first pattern found in value, or None if none matches
    """
    # Default for the case where no pattern is found
    out = None
    list_items = df['Pattern'].tolist()
    for item in list_items:
        if item in value:
            out = item
            break
    return out

# Tag each row of df with the pattern it contains, then join on that column
df['Pattern'] = df['B'].apply(lambda x: lookup_table(x, categories))
final = pd.merge(df, categories, on='Pattern')[['A', 'B', 'Category']]
```
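The default `pd.merge` is an inner join, so any row of `df` whose `B` matches no pattern (where `lookup_table` returns `None`) is silently dropped. Passing `how='left'` keeps every row of `df` and leaves `Category` missing for unmatched rows. A sketch assuming the same `lookup_table` logic (condensed here with `next()`), a subset of the data above, and one hypothetical unmatched row (`99, 'Unknown shop'`):

```python
import pandas as pd


def lookup_table(value, df):
    """Return the first pattern in df['Pattern'] that occurs in value, else None."""
    return next((item for item in df['Pattern'] if item in value), None)


# Subset of the question's data plus one hypothetical row that matches no pattern.
df = pd.DataFrame([[1, 'Gas Station'], [5, 'Bakery good bread'], [99, 'Unknown shop']],
                  columns=['A', 'B'])
categories = pd.DataFrame([['Gasoline', 'Gas Station'], ['Gasoline', 'Servicenter'],
                           ['Food', 'Bakery'], ['Food', 'Fresh market'],
                           ['Insurance', 'Auto Liberty']], columns=['Category', 'Pattern'])

df['Pattern'] = df['B'].apply(lambda x: lookup_table(x, categories))

# how='left' keeps all rows of df; the unmatched row gets NaN in Category.
final = pd.merge(df, categories, on='Pattern', how='left')[['A', 'B', 'Category']]
print(final)
```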