How to filter the rows of a dataframe based on the presence of the column values in a separate dataframe and append columns from the second dataframe

Question

I have the following dataframes:

Dataframe 1:

Fruit	Vegetable
Mango	Spinach
Apple	Kale
Watermelon	Squash
Peach	Zucchini

Dataframe 2:

Item	Price/lb
Mango	2
Spinach	1
Apple	4
Peach	2
Zucchini	1

I want to discard the rows from the dataframe 1 when both the columns are not present in the 'Item' series of dataframe 2 and I want to create the following dataframe3 based on dataframes 1 & 2:

Fruit	Vegetable	Combination Price
Mango	Spinach	3
Peach	Zucchini	3

The third column in dataframe 3 is the sum of the item prices from dataframe 2.

Answer 1

You can do this in two steps:

Mask your dataframe1 such that it only contains rows where both fruit and vegetable exits in dataframe2.Item
Use Series.map to obtain the values associated with the remaining rows, and add them together to get the combination price.

# Make our df2 information easier to work with. 
#  It is now a Series whose index is the Item and values are the prices. 
#  This allows us to work with it like a dictionary
>>> item_pricing = df2.set_index("Item")["Price/lb"]
>>> items = item_pricing.index

# get rows where BOTH fruit is in items & Vegetable is in items
>>> mask = df1["Fruit"].isin(items) & df1["Vegetable"].isin(items)
>>> subset = df1.loc[mask].copy()  # .copy() tells pandas we want this subset to be independent of the larger dataframe
>>> print(subset)
   Fruit Vegetable
0  Mango   Spinach
3  Peach  Zucchini

# On each column (fruit and vegetable) use .map to obtain the price of those items
#  then sum those columns together into a single price
>>> subset["combo_price"] = subset.apply(lambda s: s.map(item_pricing)).sum(axis=1)
>>> print(subset)
   Fruit Vegetable  combo_price
0  Mango   Spinach            3
3  Peach  Zucchini            3

All together with no comments:

item_pricing = df2.set_index("Item")["Price/lb"]
items = item_pricing.index

mask = df1["Fruit"].isin(items) & df1["Vegetable"].isin(items)
subset = df1.loc[mask].copy()
subset["combo_price"] = subset.apply(lambda s: s.map(item_pricing)).sum(axis=1)

Answer 2

You can combine stack() - map() with unstack() - sum() :

df3 = (df1.join(df1
    .stack().map(df2.set_index('Item')['Price/lb'])
    .unstack().sum(axis=1, min_count=2).rename('Combination Price')
).dropna())

#    Fruit Vegetable  Combination Price
# 0  Mango   Spinach                3.0
# 3  Peach  Zucchini                3.0

Step-by-step explanation

Stack df1 so we can map the prices all at once:

stacked = df1.stack().map(df2.set_index('Item')['Price/lb'])

# 0  Fruit        2.0
#    Vegetable    1.0
# 1  Fruit        4.0
#    Vegetable    NaN
# 2  Fruit        NaN
#    Vegetable    NaN
# 3  Fruit        2.0
#    Vegetable    1.0
# dtype: float64

Unstack back to original shape:

unstacked = stacked.unstack()

#    Fruit  Vegetable
# 0    2.0        1.0
# 1    4.0        NaN
# 2    NaN        NaN
# 3    2.0        1.0

Sum with min_count=2 which means the sum will be nan unless 2 values exist (both Fruit and Vegetable)

combo = unstacked.sum(axis=1, min_count=2)

# 0    3.0
# 1    NaN
# 2    NaN
# 3    3.0
# dtype: float64

Rejoin with df1 and drop the nan rows:

df3 = df1.join(combo.rename('Combination Price')).dropna()

#    Fruit Vegetable  Combination Price
# 0  Mango   Spinach                3.0
# 3  Peach  Zucchini                3.0

Answer 3

A combination of melt , merge and unstack :

(df1[(df1['Fruit'].isin(df2['Item'])) & (df1['Vegetable'].isin(df2['Item']))]
    .reset_index()
    .melt(id_vars = 'index',value_vars = ['Fruit','Vegetable'])
    .merge(df2,left_on='value',right_on = 'Item')
    .drop(columns = 'Item')
    .set_index(['index','variable']).unstack(level = 1)
    .transform(lambda g: g.assign(Combination_Price=g.xs('Price/lb',axis=1,level=0).sum(axis=1)))
)

produces combination price and breakdown by ingredient, just in case

            value               Price/lb           Combination_Price
variable    Fruit   Vegetable   Fruit   Vegetable   
index                   
0           Mango   Spinach     2       1         3
3           Peach   Zucchini    2       1         3

Answer 4

You can do this with two inner joins, as shown below. The final result is contained in df3.

df_temp = pd.merge(df1, df2, left_on='Fruit', right_on='Item', how='inner')
df3 = pd.merge(df_temp, df2, left_on='Vegetable', right_on='Item', how='inner')
df3['Combined price'] = df3['Price/lb_x'] + df3['Price/lb_y']
df3.drop(columns = ['Item_x','Price/lb_x','Item_y','Price/lb_y'], inplace = True)

How to filter the rows of a dataframe based on the presence of the column values in a separate dataframe and append columns from the second dataframe

Question

4 answers

solution1
1 2021-04-01 21:18:19

solution2
1 2021-04-01 21:34:09

Step-by-step explanation

solution3
1 ACCPTED 2021-04-01 21:50:10

solution4
1 2021-04-01 23:06:55

How to filter the rows of a dataframe based on the presence of the column values in a separate dataframe and append columns from the second dataframe

Question

4 answers

solution1 1 2021-04-01 21:18:19

solution2 1 2021-04-01 21:34:09

Step-by-step explanation

solution3 1 ACCPTED 2021-04-01 21:50:10

solution4 1 2021-04-01 23:06:55

solution1
1 2021-04-01 21:18:19

solution2
1 2021-04-01 21:34:09

solution3
1 ACCPTED 2021-04-01 21:50:10

solution4
1 2021-04-01 23:06:55