I have a dataframe with transactions. The index is the date of the transaction (timestamp), and the columns are the price (float), city (string), and product name (string). I want to add a new column to the dataframe containing the minimum price for each product in each city. So the fourth column will have the same value for every row where the city and product are the same.
Here's example code:
import pandas as pd

# dictionary of transactions
d = {'1': ['20', 'NYC', 'Widget A'], '2': ['30', 'NYC', 'Widget A'], '3': ['5', 'NYC', 'Widget A'],
     '4': ['300', 'LA', 'Widget B'], '5': ['30', 'LA', 'Widget B'], '6': ['100', 'LA', 'Widget A']}
columns = ['Price', 'City', 'Product']
# create dataframe and rename columns
df = pd.DataFrame.from_dict(data=d, orient='index')
df.columns = columns
This produces a dataframe that looks like this:
  Price City   Product
1    20  NYC  Widget A
2    30  NYC  Widget A
3     5  NYC  Widget A
4   300   LA  Widget B
5    30   LA  Widget B
6   100   LA  Widget A
So I want to add a new column with the minimum price for each city/product subgroup. Rows 1-3 (all NYC/Widget A) would get 5 (the minimum price, found in row 3), rows 4 and 5 would get 30 (both LA/Widget B), and row 6 would get 100.
Starting from a sample CSV file, product.csv, like this:
date,price,city,product
2015-09-21,1.5,c1,p1
2015-09-21,1.2,c1,p1
2015-09-21,0.5,c1,p2
2015-09-21,0.3,c1,p2
2015-09-22,0.6,c2,p2
2015-09-22,1.2,c2,p2
I would do it this way:
import pandas as pd

# Read the CSV file into a DataFrame
df = pd.read_csv('product.csv')
Then I add the desired column with:
# the string 'min' selects the built-in groupby minimum
df['minprice'] = df.groupby(['city', 'product'])['price'].transform('min')
which returns:
         date  price city product  minprice
0  2015-09-21    1.5   c1      p1       1.2
1  2015-09-21    1.2   c1      p1       1.2
2  2015-09-21    0.5   c1      p2       0.3
3  2015-09-21    0.3   c1      p2       0.3
4  2015-09-22    0.6   c2      p2       0.6
5  2015-09-22    1.2   c2      p2       0.6
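One caveat when applying this to the sample in the question: there the prices are stored as strings ('20', '5', ...), and the minimum of strings is lexicographic, so '20' would rank below '5'. Here is a minimal sketch, assuming the prices are meant to be numbers. The column name MinPrice is only an illustrative choice and does not come from the question:

```python
import pandas as pd

# The question's sample data, with prices stored as strings as in the original.
d = {'1': ['20', 'NYC', 'Widget A'], '2': ['30', 'NYC', 'Widget A'],
     '3': ['5', 'NYC', 'Widget A'], '4': ['300', 'LA', 'Widget B'],
     '5': ['30', 'LA', 'Widget B'], '6': ['100', 'LA', 'Widget A']}
df = pd.DataFrame.from_dict(data=d, orient='index')
df.columns = ['Price', 'City', 'Product']

# Convert to numbers first; comparing strings is lexicographic ('20' < '5').
df['Price'] = pd.to_numeric(df['Price'])

# Broadcast each (City, Product) group's minimum back onto its rows.
# 'MinPrice' is a hypothetical column name chosen for this example.
df['MinPrice'] = df.groupby(['City', 'Product'])['Price'].transform('min')
print(df)
```

This gives 5 for rows 1-3, 30 for rows 4 and 5, and 100 for row 6, which matches the result described in the question.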
Hope that helps.
You need to apply transform to the groupby. Unlike an aggregation, it returns one value per original row, so the shape of your original DataFrame is preserved.
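import pandas as pd
import numpy as np

# 15 random prices spread over 3 products and 2 cities
df = pd.DataFrame({'price': np.round(np.random.random(15), 2),
                   'product': list('ABC') * 5,
                   'city': ['San Francisco'] * 10 + ['New York'] * 5})
# transform returns one value per row, so it can be assigned directly as a column
df['min_city_product_price'] = df.groupby(['city', 'product']).price.transform('min')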
>>> df
             city  price product  min_city_product_price
0   San Francisco   0.65       A                    0.35
1   San Francisco   0.97       B                    0.28
2   San Francisco   0.09       C                    0.09
3   San Francisco   0.35       A                    0.35
4   San Francisco   0.28       B                    0.28
5   San Francisco   0.84       C                    0.09
6   San Francisco   0.49       A                    0.35
7   San Francisco   0.94       B                    0.28
8   San Francisco   0.13       C                    0.09
9   San Francisco   0.89       A                    0.35
10       New York   0.75       B                    0.30
11       New York   0.31       C                    0.31
12       New York   0.22       A                    0.22
13       New York   0.30       B                    0.30
14       New York   0.56       C                    0.31
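To illustrate what preserving the shape means, here is a small sketch that contrasts a plain aggregation with transform, using the six-row product.csv sample from the earlier answer instead of random data. The variable names (per_group, per_row, merged) and the merge-based alternative are illustrative additions, not part of the original answers:

```python
import pandas as pd

# Same values as the product.csv sample, built inline so the example is self-contained.
df = pd.DataFrame({'price': [1.5, 1.2, 0.5, 0.3, 0.6, 1.2],
                   'city': ['c1', 'c1', 'c1', 'c1', 'c2', 'c2'],
                   'product': ['p1', 'p1', 'p2', 'p2', 'p2', 'p2']})

grouped = df.groupby(['city', 'product'])['price']

# Aggregation: one value per (city, product) group, indexed by the group keys.
per_group = grouped.min()

# Transform: one value per original row, carrying the same index as df.
per_row = grouped.transform('min')

# A longer route to the same column: merge the aggregated minimum back onto df.
merged = df.merge(per_group.rename('minprice').reset_index(),
                  on=['city', 'product'], how='left')

print(len(per_group), len(per_row))  # 3 groups versus 6 rows
```

Because per_row shares the index of df, it can be assigned straight to a new column, while the aggregated per_group has to be merged back on the group keys first.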