I have the following dataframes:
Dataframe 1:
Fruit | Vegetable |
---|---|
Mango | Spinach |
Apple | Kale |
Watermelon | Squash |
Peach | Zucchini |
Dataframe 2:
Item | Price/lb |
---|---|
Mango | 2 |
Spinach | 1 |
Apple | 4 |
Peach | 2 |
Zucchini | 1 |
I want to discard the rows from the dataframe 1 when both the columns are not present in the 'Item' series of dataframe 2 and I want to create the following dataframe3 based on dataframes 1 & 2:
Fruit | Vegetable | Combination Price |
---|---|---|
Mango | Spinach | 3 |
Peach | Zucchini | 3 |
The third column in dataframe 3 is the sum of the item prices from dataframe 2.
You can do this in two steps:
Mask your dataframe1 such that it only contains rows where both fruit and vegetable exits in dataframe2.Item
Use Series.map
to obtain the values associated with the remaining rows, and add them together to get the combination price.
# Make our df2 information easier to work with.
# It is now a Series whose index is the Item and values are the prices.
# This allows us to work with it like a dictionary
>>> item_pricing = df2.set_index("Item")["Price/lb"]
>>> items = item_pricing.index
# get rows where BOTH fruit is in items & Vegetable is in items
>>> mask = df1["Fruit"].isin(items) & df1["Vegetable"].isin(items)
>>> subset = df1.loc[mask].copy() # .copy() tells pandas we want this subset to be independent of the larger dataframe
>>> print(subset)
Fruit Vegetable
0 Mango Spinach
3 Peach Zucchini
# On each column (fruit and vegetable) use .map to obtain the price of those items
# then sum those columns together into a single price
>>> subset["combo_price"] = subset.apply(lambda s: s.map(item_pricing)).sum(axis=1)
>>> print(subset)
Fruit Vegetable combo_price
0 Mango Spinach 3
3 Peach Zucchini 3
All together with no comments:
item_pricing = df2.set_index("Item")["Price/lb"]
items = item_pricing.index
mask = df1["Fruit"].isin(items) & df1["Vegetable"].isin(items)
subset = df1.loc[mask].copy()
subset["combo_price"] = subset.apply(lambda s: s.map(item_pricing)).sum(axis=1)
You can combine stack()
- map()
with unstack()
- sum()
:
df3 = (df1.join(df1
.stack().map(df2.set_index('Item')['Price/lb'])
.unstack().sum(axis=1, min_count=2).rename('Combination Price')
).dropna())
# Fruit Vegetable Combination Price
# 0 Mango Spinach 3.0
# 3 Peach Zucchini 3.0
Stack df1
so we can map the prices all at once:
stacked = df1.stack().map(df2.set_index('Item')['Price/lb'])
# 0 Fruit 2.0
# Vegetable 1.0
# 1 Fruit 4.0
# Vegetable NaN
# 2 Fruit NaN
# Vegetable NaN
# 3 Fruit 2.0
# Vegetable 1.0
# dtype: float64
Unstack back to original shape:
unstacked = stacked.unstack()
# Fruit Vegetable
# 0 2.0 1.0
# 1 4.0 NaN
# 2 NaN NaN
# 3 2.0 1.0
Sum with min_count=2
which means the sum will be nan
unless 2 values exist (both Fruit and Vegetable)
combo = unstacked.sum(axis=1, min_count=2)
# 0 3.0
# 1 NaN
# 2 NaN
# 3 3.0
# dtype: float64
Rejoin with df1
and drop the nan
rows:
df3 = df1.join(combo.rename('Combination Price')).dropna()
# Fruit Vegetable Combination Price
# 0 Mango Spinach 3.0
# 3 Peach Zucchini 3.0
A combination of melt
, merge
and unstack
:
(df1[(df1['Fruit'].isin(df2['Item'])) & (df1['Vegetable'].isin(df2['Item']))]
.reset_index()
.melt(id_vars = 'index',value_vars = ['Fruit','Vegetable'])
.merge(df2,left_on='value',right_on = 'Item')
.drop(columns = 'Item')
.set_index(['index','variable']).unstack(level = 1)
.transform(lambda g: g.assign(Combination_Price=g.xs('Price/lb',axis=1,level=0).sum(axis=1)))
)
produces combination price and breakdown by ingredient, just in case
value Price/lb Combination_Price
variable Fruit Vegetable Fruit Vegetable
index
0 Mango Spinach 2 1 3
3 Peach Zucchini 2 1 3
You can do this with two inner joins, as shown below. The final result is contained in df3.
df_temp = pd.merge(df1, df2, left_on='Fruit', right_on='Item', how='inner')
df3 = pd.merge(df_temp, df2, left_on='Vegetable', right_on='Item', how='inner')
df3['Combined price'] = df3['Price/lb_x'] + df3['Price/lb_y']
df3.drop(columns = ['Item_x','Price/lb_x','Item_y','Price/lb_y'], inplace = True)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.