Pandas: Create New Column using Values in Subgroup of Other Columns

I have a dataframe with transactions. The index is the date of the transaction (timestamp), and the columns are the price (float), city (string), and product name (string). I want to add a new column to the dataframe containing the minimum price for each product in each city. So the fourth column will have the same value for every row where the city and product are the same.

Here's example code:

import pandas as pd

# dictionary of transactions
d = {'1': ['20', 'NYC', 'Widget A'], '2': ['30', 'NYC', 'Widget A'], '3': ['5', 'NYC', 'Widget A'],
     '4': ['300', 'LA', 'Widget B'], '5': ['30', 'LA', 'Widget B'], '6': ['100', 'LA', 'Widget A']}

columns = ['Price', 'City', 'Product']

# create dataframe and rename columns
df = pd.DataFrame.from_dict(data=d, orient='index')
df.columns = columns

This produces a dataframe that looks like this:

   Price City   Product
1     20  NYC  Widget A
2     30  NYC  Widget A
3      5  NYC  Widget A
4    300   LA  Widget B
5     30   LA  Widget B
6    100   LA  Widget A

So I want to add a new column with the minimum price for each city/product subgroup. Rows 1-3 (all NYC/Widget A) would get 5 (the minimum price, found in row 3), rows 4 and 5 (both LA/Widget B) would get 30, and row 6 would get 100.

Starting from a sample dataframe product.csv like this:

date,price,city,product
2015-09-21,1.5,c1,p1
2015-09-21,1.2,c1,p1
2015-09-21,0.5,c1,p2
2015-09-21,0.3,c1,p2
2015-09-22,0.6,c2,p2
2015-09-22,1.2,c2,p2

I would do it this way:

import pandas as pd

# Read Dataframe
df = pd.read_csv('product.csv')

Then I'm adding the desired column with:

df['minprice'] = df.groupby(['city','product'])['price'].transform('min')

which returns:

         date  price city product  minprice
0  2015-09-21    1.5   c1      p1       1.2
1  2015-09-21    1.2   c1      p1       1.2
2  2015-09-21    0.5   c1      p2       0.3
3  2015-09-21    0.3   c1      p2       0.3
4  2015-09-22    0.6   c2      p2       0.6
5  2015-09-22    1.2   c2      p2       0.6
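
The same call carries over to the dataframe in the question, with one caveat: the prices there are stored as strings, so they probably need converting to numbers before taking a minimum. A minimal sketch, assuming the question's data as posted (the column name MinPrice is made up for this illustration):

```python
import pandas as pd

# Data copied from the question; note that the prices are strings here.
d = {'1': ['20', 'NYC', 'Widget A'], '2': ['30', 'NYC', 'Widget A'],
     '3': ['5', 'NYC', 'Widget A'], '4': ['300', 'LA', 'Widget B'],
     '5': ['30', 'LA', 'Widget B'], '6': ['100', 'LA', 'Widget A']}
df = pd.DataFrame.from_dict(data=d, orient='index')
df.columns = ['Price', 'City', 'Product']

# Convert to numbers first; strings compare lexicographically,
# so '20' would count as smaller than '5'.
df['Price'] = pd.to_numeric(df['Price'])

# MinPrice is a hypothetical column name chosen for this sketch.
df['MinPrice'] = df.groupby(['City', 'Product'])['Price'].transform('min')
print(df)
```

With the conversion in place, rows 1-3 get 5, rows 4 and 5 get 30, and row 6 gets 100, which matches what the question asks for.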

Hope that helps.

You need to apply transform to the groupby, which preserves the shape of your original DataFrame.

import pandas as pd
import numpy as np

df = pd.DataFrame({'price': np.round(np.random.random(15), 2),
                   'product': list('ABC') * 5,
                   'city': ['San Francisco'] * 10 + ['New York'] * 5})

df['min_city_product_price'] = df.groupby(['city', 'product']).price.transform('min')

>>> df
             city  price product  min_city_product_price
0   San Francisco   0.65       A                    0.35
1   San Francisco   0.97       B                    0.28
2   San Francisco   0.09       C                    0.09
3   San Francisco   0.35       A                    0.35
4   San Francisco   0.28       B                    0.28
5   San Francisco   0.84       C                    0.09
6   San Francisco   0.49       A                    0.35
7   San Francisco   0.94       B                    0.28
8   San Francisco   0.13       C                    0.09
9   San Francisco   0.89       A                    0.35
10       New York   0.75       B                    0.30
11       New York   0.31       C                    0.31
12       New York   0.22       A                    0.22
13       New York   0.30       B                    0.30
14       New York   0.56       C                    0.31
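
To see what "preserves the shape" means in practice, it may help to compare transform with a plain aggregation on the same groupby. This is only a sketch on a small toy frame; the values are made up for illustration and are not taken from the outputs above:

```python
import pandas as pd

# Toy data, invented for this comparison.
df = pd.DataFrame({'city': ['c1', 'c1', 'c1', 'c2'],
                   'product': ['p1', 'p1', 'p2', 'p2'],
                   'price': [1.5, 1.2, 0.5, 0.6]})

grouped = df.groupby(['city', 'product'])['price']

# Aggregation: one value per (city, product) group, indexed by the group keys.
per_group = grouped.min()

# Transform: the group minimum broadcast back to every original row,
# so the result shares df's index and can be assigned as a new column.
per_row = grouped.transform('min')

print(len(per_group))  # 3 groups
print(len(per_row))    # 4 rows, same as df
```

Because the transformed Series lines up with the original index, it can be assigned straight to a new column, whereas the aggregated result would first have to be merged back on city and product.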
