
Cannot get groupby records based on their minimum value using pandas in Python

I have the following CSV:

id;price;editor
k1;10,00;ed1
k1;8,00;ed2
k3;10,00;ed1
k3;11,00;ed2
k2;10,50;ed1
k1;9,50;ed3

If I do the following:

import pandas as pd

df = pd.read_csv('Testing.csv', delimiter=';')
df_reduced = df.groupby(['id', 'editor'])['price'].min()

Instead of getting:

k1;8,00;ed2
k2;10,50;ed1
k3;10,00;ed1

I get:

k1;10,00;ed1
    8,00;ed2
    9,50;ed3
k2;10,50;ed1
k3;10,00;ed1
   11,00;ed2 

How can I get the three ids with their minimum values?
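To see where the extra rows come from, here is a minimal sketch that inlines the question's sample data (the name CSV is only for illustration) and assumes the price column is parsed with decimal=','. Grouping by both id and editor produces one group per distinct pair, and every pair in this data is distinct, so nothing gets collapsed:

```python
from io import StringIO

import pandas as pd

# The question's sample data, inlined so the sketch runs without a file.
CSV = """id;price;editor
k1;10,00;ed1
k1;8,00;ed2
k3;10,00;ed1
k3;11,00;ed2
k2;10,50;ed1
k1;9,50;ed3"""

# decimal=',' parses "10,00" as the float 10.0.
df = pd.read_csv(StringIO(CSV), sep=';', decimal=',')

# Grouping by both keys yields one minimum per (id, editor) pair.
by_pair = df.groupby(['id', 'editor'])['price'].min()
print(len(by_pair))  # 6: every (id, editor) pair occurs only once

# Grouping by id alone yields one minimum per id (editor is dropped).
by_id = df.groupby('id')['price'].min()
print(len(by_id))  # 3
```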

Group the data by id only and find the minimum price for each group. Then index the original dataframe with those minimum values so that the editor column is included.

Note: I am assuming that the comma in the price column is a typo.

df.loc[df['price'] == df.groupby('id')['price'].transform('min')]


    id  price   editor
1   k1  8.0     ed2 
2   k3  10.0    ed1 
4   k2  10.5    ed1 
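A self-contained sketch of the transform approach, assuming the comma is a decimal separator and parsing it with decimal=','. The sample adds one hypothetical row, k2;10,50;ed2, which is not in the question, to show that the equality mask keeps every row that ties for its group's minimum:

```python
from io import StringIO

import pandas as pd

# The question's data plus one HYPOTHETICAL extra row (k2;10,50;ed2)
# that ties with k2's existing minimum.
CSV = """id;price;editor
k1;10,00;ed1
k1;8,00;ed2
k3;10,00;ed1
k3;11,00;ed2
k2;10,50;ed1
k1;9,50;ed3
k2;10,50;ed2"""

df = pd.read_csv(StringIO(CSV), sep=';', decimal=',')

# transform('min') broadcasts each group's minimum back onto every row,
# so group_min has the same length and index as df.
group_min = df.groupby('id')['price'].transform('min')

# Keep every row whose price equals its group's minimum (ties included).
result = df.loc[df['price'] == group_min]
print(result)
```

With the hypothetical tie row, k2 appears twice in the result (editors ed1 and ed2); without it, the output matches the table above.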

drop_duplicates + sort_values

#df['price'] = pd.to_numeric(df['price'].str.replace(",", "."))

df.sort_values('price').drop_duplicates(['id'])
   id  price editor
1  k1    8.0    ed2
2  k3   10.0    ed1
4  k2   10.5    ed1
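A self-contained version of this approach with the conversion step active, assuming the file is read without decimal=',' so that price starts out as strings. Passing kind='mergesort' (a stable sort) is an addition not in the answer; it keeps the original row order among equal prices, so the row kept for an id is deterministic when several of its rows share the minimum:

```python
from io import StringIO

import pandas as pd

# The question's sample data, inlined so the sketch runs without a file.
CSV = """id;price;editor
k1;10,00;ed1
k1;8,00;ed2
k3;10,00;ed1
k3;11,00;ed2
k2;10,50;ed1
k1;9,50;ed3"""

# Read without decimal=',', so price arrives as strings like "10,00".
df = pd.read_csv(StringIO(CSV), sep=';')

# Convert "10,00" -> "10.00" -> 10.0 (the step commented out above).
df['price'] = pd.to_numeric(df['price'].str.replace(',', '.', regex=False))

# Cheapest rows first (stable sort), then keep the first row seen per id;
# drop_duplicates keeps the first occurrence by default (keep='first').
result = df.sort_values('price', kind='mergesort').drop_duplicates(['id'])
print(result)
```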

Much like @Wen-Ben, I chose to use sort_values and drop_duplicates; however, I converted the values in pd.read_csv with the decimal parameter.

import pandas as pd
from io import StringIO

csvfile = StringIO("""id;price;editor
k1;10,00;ed1
k1;8,00;ed2
k3;10,00;ed1
k3;11,00;ed2
k2;10,50;ed1
k1;9,50;ed3""")

df = pd.read_csv(csvfile, delimiter=';', decimal=',')

df.sort_values(['id','price']).drop_duplicates(['id']) 

Output:

   id  price editor
1  k1    8.0    ed2
4  k2   10.5    ed1
2  k3   10.0    ed1
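A sketch of what the decimal parameter changes, again with the question's data inlined. Without it, price is read as text, and min() on text compares character by character, so '10,00' sorts before '8,00' and k1's "minimum" comes out wrong:

```python
from io import StringIO

import pandas as pd

# The question's sample data, inlined so the sketch runs without a file.
CSV = """id;price;editor
k1;10,00;ed1
k1;8,00;ed2
k3;10,00;ed1
k3;11,00;ed2
k2;10,50;ed1
k1;9,50;ed3"""

# Default parsing: "10,00" is not a valid float, so price stays text.
raw = pd.read_csv(StringIO(CSV), sep=';')
# decimal=',' tells the parser that ',' is the decimal separator.
parsed = pd.read_csv(StringIO(CSV), sep=';', decimal=',')

print(raw['price'].dtype)
print(parsed['price'].dtype)

# On strings, min() is lexicographic: '1' < '8', so '10,00' wins for k1.
print(raw.groupby('id')['price'].min()['k1'])     # 10,00
print(parsed.groupby('id')['price'].min()['k1'])  # 8.0
```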

The statement

df_reduced = df.groupby(['id', 'editor'])['price'].min()

gives you the minimum price for each unique id-editor pair, but you want the minimum per id. Also, since your price field is read as a string, you first need to convert it to numeric before running the groupby:

df['price'] = pd.to_numeric(df['price'].str.replace(",", "."))
df.loc[df.groupby('id')['price'].idxmin()]

Output

   id  price editor
1  k1    8.0    ed2
4  k2   10.5    ed1
2  k3   10.0    ed1
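A self-contained sketch of the idxmin approach with the conversion applied to df. It includes one hypothetical tie row (k2;10,50;ed2) that is not in the question: idxmin returns a single index label per group, the first row holding the minimum, so each id appears exactly once even when prices tie:

```python
from io import StringIO

import pandas as pd

# The question's data plus one HYPOTHETICAL extra row (k2;10,50;ed2)
# that ties with k2's existing minimum.
CSV = """id;price;editor
k1;10,00;ed1
k1;8,00;ed2
k3;10,00;ed1
k3;11,00;ed2
k2;10,50;ed1
k1;9,50;ed3
k2;10,50;ed2"""

df = pd.read_csv(StringIO(CSV), sep=';')
# Convert "10,50" -> "10.50" -> 10.5 before grouping.
df['price'] = pd.to_numeric(df['price'].str.replace(',', '.', regex=False))

# Per id, the index label of the first row holding the minimum price.
idx = df.groupby('id')['price'].idxmin()

# Look those labels up in the original frame to recover the editor column.
result = df.loc[idx]
print(result)
```

Compared with the transform mask, this drops the second k2 row; which behaviour you want depends on how ties should be reported.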

Get rid of the editor part:

df_reduced = df.groupby(['id'])['price'].min()

There is no need to use transform as another answer does.
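A minimal sketch of what this returns, assuming the price column is parsed with decimal=','. The result is a Series of minimum prices indexed by id; the editor column is not part of it, and reset_index() turns the id index back into a column:

```python
from io import StringIO

import pandas as pd

# The question's sample data, inlined so the sketch runs without a file.
CSV = """id;price;editor
k1;10,00;ed1
k1;8,00;ed2
k3;10,00;ed1
k3;11,00;ed2
k2;10,50;ed1
k1;9,50;ed3"""

df = pd.read_csv(StringIO(CSV), sep=';', decimal=',')

# One minimum price per id; a Series indexed by id, without editor.
df_reduced = df.groupby(['id'])['price'].min()

# Turn the id index back into an ordinary column.
flat = df_reduced.reset_index()
print(flat)
```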

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 