[英]how to join two dataframes for which column values are within a certain range for multiple columns using pandas dataframe?
[英]How to merge two pandas dataframes using a column as pattern and include columns of the left dataframe?
擁有以下Python代碼,試圖使用pd.merge,但似乎關鍵列必須相同。 嘗試使用df.B中帶有category.Pattern的“ like”運算符連接到類似於SQL的內容。
使用更好的數據示例進行更新 。
import pandas as pd
import numpy as np
df = pd.DataFrame([[1, 'Gas Station'], [2, 'Servicenter'], [5, 'Bakery good bread'], [58, 'Fresh market MIA'], [76, 'Auto Liberty aa1121']], columns=['A','B'])
Out[12]:
A B
0 1 Gas Station
1 2 Servicenter
2 5 Bakery good bread
3 58 Fresh market MIA
4 76 Auto Liberty aa1121
categories = pd.DataFrame([['Gasoline', 'Gas Station'], ['Gasoline', 'Servicenter'], ['Food', 'Bakery'], ['Food', 'Fresh market'], ['Insurance', 'Auto Liberty']], columns=['Category','Pattern'])
Out[13]:
Category Pattern
0 Gasoline Gas Station
1 Gasoline Servicenter
2 Food Bakery
3 Food Fresh market
4 Insurance Auto Liberty
預期結果是:
Out[14]:
A B Category
0 1 Gas Station Gasoline
1 2 Servicenter Gasoline
2 5 Bakery good bread Food
3 58 Fresh market MIA Food
4 58 Auto Liberty aa1121 Insurance
感謝您的建議/反饋。
df['lower'] = df['B'].str.extract(r'([A-z0-9]+)')
categories['lower'] = categories['pattern'].str.extract(r'([A-z0-9]+)')
final = pd.merge(df, categories)
通過創建類似以下的新功能:
def lookup_table(value, df):
"""
:param value: value to find the dataframe
:param df: dataframe which constains the lookup table
:return:
A String representing a the data found
"""
# Variable Initialization for non found entry in list
out = None
list_items = df['Pattern'].tolist()
for item in list_items:
if item in value:
out = item
break
return out
它將使用數據框作為查找表和參數值返回新值
以下完整示例將顯示預期的數據幀。
import pandas as pd
df = pd.DataFrame([[1, 'Gas Station'], [2, 'Servicenter'], [5, 'Bakery good bread'], [58, 'Fresh market MIA'], [76, 'Auto Liberty aa1121']], columns=['A','B'])
categories = pd.DataFrame([['Gasoline', 'Gas Station'], ['Gasoline', 'Servicenter'], ['Food', 'Bakery'], ['Food', 'Fresh market'], ['Insurance', 'Auto Liberty']], columns=['Category','Pattern'])
def lookup_table(value, df):
"""
:param value: value to find the dataframe
:param df: dataframe which constains the lookup table
:return:
A String representing a the data found
"""
# Variable Initialization for non found entry in list
out = None
list_items = df['Pattern'].tolist()
for item in list_items:
if item in value:
out = item
break
return out
df['Pattern'] = df['B'].apply(lambda x: lookup_table(x, categories))
final = pd.merge(df, categories)
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.