Pandas: Create New Column using Values in Subgroup of Other Columns

Question

I have a dataframe with transactions. The index is the date of the transaction (timestamp), and the columns are the price (float), city (string), and product name (string). I want to add a new column to the dataframe containing the minimum price for each product in each city. So the fourth column will have the same value for every row where the city and product are the same.

Here's example code:

import pandas as pd

# dictionary of transactions: key -> [price, city, product]
d = {'1': [20, 'NYC', 'Widget A'], '2': [30, 'NYC', 'Widget A'], '3': [5, 'NYC', 'Widget A'],
     '4': [300, 'LA', 'Widget B'], '5': [30, 'LA', 'Widget B'], '6': [100, 'LA', 'Widget A']}

columns = ['Price', 'City', 'Product']

# create dataframe and rename columns
df = pd.DataFrame.from_dict(data=d, orient='index')
df.columns = columns

This produces a dataframe that looks like this:

   Price City   Product
1     20  NYC  Widget A
2     30  NYC  Widget A
3      5  NYC  Widget A
4    300   LA  Widget B
5     30   LA  Widget B
6    100   LA  Widget A

So I want to add a new column with the minimum price for each city/product subgroup. Rows 1-3 (all NYC/Widget A) would get 5 (the minimum price, found in row 3), rows 4 and 5 (both LA/Widget B) would get 30, and row 6 (LA/Widget A) would get 100.

Answer 1

Starting from a sample dataframe product.csv like this:

date,price,city,product
2015-09-21,1.5,c1,p1
2015-09-21,1.2,c1,p1
2015-09-21,0.5,c1,p2
2015-09-21,0.3,c1,p2
2015-09-22,0.6,c2,p2
2015-09-22,1.2,c2,p2

I would do it this way:

import pandas as pd

# Read Dataframe
df = pd.read_csv('product.csv')

Then I add the desired column with:

# the string 'min' selects pandas' built-in groupby minimum
df['minprice'] = df.groupby(['city','product'])['price'].transform('min')

which returns:

         date  price city product  minprice
0  2015-09-21    1.5   c1      p1       1.2
1  2015-09-21    1.2   c1      p1       1.2
2  2015-09-21    0.5   c1      p2       0.3
3  2015-09-21    0.3   c1      p2       0.3
4  2015-09-22    0.6   c2      p2       0.6
5  2015-09-22    1.2   c2      p2       0.6
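One thing to watch for: if the prices are stored as strings rather than numbers, the minimum is taken lexicographically and gives a misleading result. Here is a minimal sketch of the same approach on NYC/LA data like the question's, assuming string prices that need converting first. The column name MinPrice is just an illustrative choice, not something from the question.

```python
import pandas as pd

# Transactions keyed by row label: [price, city, product].
# Prices are deliberately strings here to show the pitfall.
d = {'1': ['20', 'NYC', 'Widget A'], '2': ['30', 'NYC', 'Widget A'],
     '3': ['5', 'NYC', 'Widget A'], '4': ['300', 'LA', 'Widget B'],
     '5': ['30', 'LA', 'Widget B'], '6': ['100', 'LA', 'Widget A']}

df = pd.DataFrame.from_dict(d, orient='index',
                            columns=['Price', 'City', 'Product'])

# String comparison is character by character, so '20' sorts before '5'.
string_min = min(['20', '30', '5'])

# Convert to a numeric dtype before aggregating.
df['Price'] = pd.to_numeric(df['Price'])

# One minimum per (City, Product) group, broadcast back to every row.
df['MinPrice'] = df.groupby(['City', 'Product'])['Price'].transform('min')

print(df)
```

After the conversion, rows 1-3 get 5, rows 4 and 5 get 30, and row 6 gets 100, which is what the question asks for.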

Hope that helps.

Answer 2

You need to apply transform to the groupby. It returns one value per original row, so the result keeps the shape of your original DataFrame.

import pandas as pd
import numpy as np

df = pd.DataFrame({'price': np.round(np.random.random(15), 2),
                   'product': list('ABC') * 5,
                   'city': ['San Francisco'] * 10 + ['New York'] * 5})

df['min_city_product_price'] = df.groupby(['city', 'product']).price.transform('min')

>>> df
             city  price product  min_city_product_price
0   San Francisco   0.65       A                    0.35
1   San Francisco   0.97       B                    0.28
2   San Francisco   0.09       C                    0.09
3   San Francisco   0.35       A                    0.35
4   San Francisco   0.28       B                    0.28
5   San Francisco   0.84       C                    0.09
6   San Francisco   0.49       A                    0.35
7   San Francisco   0.94       B                    0.28
8   San Francisco   0.13       C                    0.09
9   San Francisco   0.89       A                    0.35
10       New York   0.75       B                    0.30
11       New York   0.31       C                    0.31
12       New York   0.22       A                    0.22
13       New York   0.30       B                    0.30
14       New York   0.56       C                    0.31
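
To make the point about shape concrete, here is a small sketch comparing a plain group minimum with transform on fixed data, so the numbers are reproducible. It also shows a merge-based alternative that should give the same column. The names per_group, per_row, merged and min_price are illustrative, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['NYC', 'NYC', 'NYC', 'LA', 'LA', 'LA'],
    'product': ['Widget A', 'Widget A', 'Widget A',
                'Widget B', 'Widget B', 'Widget A'],
    'price': [20.0, 30.0, 5.0, 300.0, 30.0, 100.0],
})

grouped = df.groupby(['city', 'product'])['price']

# Aggregation: one row per (city, product) group, indexed by the group keys.
per_group = grouped.min()

# Transform: one value per original row, aligned with df's index,
# so it can be assigned directly as a new column.
per_row = grouped.transform('min')
df['min_city_product_price'] = per_row

# Alternative: merge the per-group minimum back onto the rows.
merged = df[['city', 'product', 'price']].merge(
    per_group.rename('min_price').reset_index(),
    on=['city', 'product'],
    how='left',
)

print(len(per_group), len(per_row))
print(df)
```

The aggregated result has one entry per group, while the transformed result has one per row. The merge route works too, but transform avoids the extra reset_index and join.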
