简体   繁体   中英

my groupby() does not work in pandas dataframe

I have to make a group by very simple but it does not work in my case. I can not reproduce the actual data but suppose that my DF is:

              Cod          Cost        Date                 VAL
0            A123          123         2017-12-21           0.0
1            A123          123         2017-12-21          -2.0
2            A123          123         2017-12-21         -10.0
3            FB00          180         2016-12-11          80.0
4            FB00          180         2016-12-11          80.0    

I have to make a groupby for Cod, Cost and Date and get the corresponding minimum of VAL with reset_index(). My code is:

DF = DF.groupby(['Cod', 'Cost','Date'])['VAL'].min().reset_index()

the expected is:

              Cod          Cost        Date                 VAL
0            A123          123         2017-12-21         -10.0
1            FB00          180         2016-12-11          80.0

but it return:

              Cod          Cost        Date                 VAL
0            A123          123         2017-12-21           0.0
1            A123          123         2017-12-21           0.0
2            A123          123         2017-12-21          80.0
3            FB00          180         2016-12-11          80.0
4            FB00          180         2016-12-11          80.0 

I just simulated your sample and its works for me:

Sample Example:

>>> df
    Cod  Cost        Date   VAL
0  A123   123  2017-12-21   0.0
1  A123   123  2017-12-21  -2.0
2  A123   123  2017-12-21 -10.0
3  FB00   180  2016-12-11  80.0
4  FB00   180  2016-12-11  80.0

Result:

>>> df.groupby(['Cod', 'Cost','Date'])['VAL'].min().reset_index()                                                                                                            Cod  Cost        Date   VAL
0  A123   123  2017-12-21 -10.0
1  FB00   180  2016-12-11  80.0

Applied on pandas: 0.23.3

Alternate:

You can also apply like below:

>>> df.groupby(['Cod', 'Cost','Date']).min().reset_index()
    Cod  Cost        Date   VAL
0  A123   123  2017-12-21 -10.0
1  FB00   180  2016-12-11  80.0

It should also work with 0.18.1 :

Python 3.5.1 (default, May 25 2016, 07:19:16)
[GCC 4.8.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.__version__
'0.18.1'

Sample

>>> df
    Cod  Cost        Date   VAL
0  A123   123  2017-12-21   0.0
1  A123   123  2017-12-21  -2.0
2  A123   123  2017-12-21 -10.0
3  FB00   180  2016-12-11  80.0
4  FB00   180  2016-12-11  80.0
>>> df.groupby(['Cod', 'Cost','Date']).min().reset_index()
    Cod  Cost        Date   VAL
0  A123   123  2017-12-21 -10.0
1  FB00   180  2016-12-11  80.0

OR

>>> df.groupby(['Cod', 'Cost','Date'])['VAL'].min().reset_index()
    Cod  Cost        Date   VAL
0  A123   123  2017-12-21 -10.0
1  FB00   180  2016-12-11  80.0

Like other contributors on here, you're code yields the expected results for me, but since this isn't the case for you allow me to propose another, though far less efficient, solution.

Starting by replicating your dataframe

df = pd.DataFrame([['A123', 123, '2017-12-21', 0], ['A123', 123, '2017-12-21', -2], ['A123', 123, '2017-12-21', -10], ['FB00', 180, '2016-12-11', 80], ['FB00', 180, '2016-12-11', 80]], columns=['Cod', 'Cost', 'Date', 'VAL'])

We can then sort the values by Cod, Cost, Date and VAL

df.sort_values(['Cod', 'Cost', 'Date', 'VAL'], ascending=True, inplace=True)

Output:

    Cod     Cost   Date     VAL
2   A123    123 2017-12-21  -10
1   A123    123 2017-12-21  -2
0   A123    123 2017-12-21  0
3   FB00    180 2016-12-11  80
4   FB00    180 2016-12-11  80

We can then drop duplicate Cod, Cost and Dates, keeping the first record (the lowest value), leaving us with your expected output.

df.drop_duplicates(['Cod', 'Cost', 'Date'], keep='first', inplace=True)
df.reset_index(drop=True, inplace=True)

With an output of:

    Cod     Cost    Date    VAL
0   A123    123 2017-12-21  -10
1   FB00    180 2016-12-11  80

Very hacky, not recommended, but should get you the result you expect.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM