
Pandas groupby - dataframe's column disappearing

I have the following data frame called "new_df":

         dato  uttak annlegg  Merd  ID  Leng    BW        CF    F    B    H   K
0  2020-12-15  12_20      LL     3   1  48.0  1200  1.085069  0.0  2.0  0.0 NaN
1  2020-12-15  12_20      LL     3   2  43.0   830  1.043933  0.0  1.0  0.0 NaN

The columns are:

'dato', 'uttak', 'annlegg', 'Merd', 'ID', 'Leng', 'BW', 'CF', 'F', 'B', 'H', 'K'

When I run:

new_df.groupby(['annlegg','Merd'],as_index=False).mean()

I get the mean of every column except "BW", like this:

  annlegg  Merd         ID       Leng        CF         F         B         H         K
0      KH     1  42.557143  56.398649  1.265812  0.071770  1.010638  0.600000  0.127907
1      KH     2  42.683794  56.492228  1.270522  0.021978  0.739130  0.230769  0.075862
2      KH     3  42.177866  35.490119  1.125416  0.000000  0.384146  0.333333  0.034483

The "BW" column simply disappears when I group, whether "as_index" is True or False. Why is that?

It appears that the BW column does not have a numeric dtype but the object dtype instead, which pandas uses for storing strings, for instance. When you apply groupby with the mean aggregation, the column disappears because computing the mean of an object column (think of strings) does not make sense in general.
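To see the effect in isolation, here is a minimal sketch on a small made-up frame (the name demo and its values are illustrative, not the asker's data). BW is deliberately stored as strings with the object dtype. numeric_only=True is passed explicitly so the result is the same across pandas versions: older versions dropped such columns silently, while pandas 2.x raises a TypeError unless it is set.

```python
import pandas as pd

# Hypothetical stand-in for new_df; BW holds strings, so its dtype is object.
demo = pd.DataFrame({
    "annlegg": ["LL", "LL", "KH"],
    "Merd": [3, 3, 1],
    "Leng": [48.0, 43.0, 50.0],
    "BW": pd.Series(["1200", "830", "1500"], dtype=object),
})

# Inspect the dtypes: BW shows up as object, Leng as float64.
print(demo.dtypes)

# Only numeric columns are averaged, so BW is missing from the result.
means = demo.groupby(["annlegg", "Merd"], as_index=False).mean(numeric_only=True)
print(means.columns.tolist())
```

If BW is listed as object in the dtypes output, that is very likely the reason it is missing from the grouped means.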

You should start by converting your BW column:

First method: pd.to_numeric


This first method converts the whole column to a numeric dtype (integer or float, depending on the values). By default it raises an error if a value cannot be parsed.

new_df['BW'] = pd.to_numeric(new_df['BW'])
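If the column contains entries that cannot be parsed, pd.to_numeric raises by default. A hedged sketch of one way to handle that, assuming it is acceptable to turn such entries into NaN: pass errors="coerce" and then look at which rows were affected. The series raw and the value "n/a" are made up for illustration.

```python
import pandas as pd

# Hypothetical column with one entry that is not a number.
raw = pd.Series(["1200", "830", "n/a"], dtype=object)

# errors="coerce" replaces unparseable entries with NaN instead of raising.
converted = pd.to_numeric(raw, errors="coerce")
print(converted.dtype)

# Entries that were present originally but became NaN after conversion.
bad_rows = raw[converted.isna() & raw.notna()]
print(bad_rows)
```

Checking bad_rows before overwriting the column shows exactly which values would be lost.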

Second method: df.astype


If you want to choose the target type yourself (for instance, you know that this column only contains integers, or floating-point precision does not matter to you), you can use the astype method, which lets you convert to almost any type you want:

# Use one of the following, depending on the type you want:
new_df['BW'] = new_df['BW'].astype(float)   # Converts to float
new_df['BW'] = new_df['BW'].astype(int)     # Converts to integer

Finally, you can apply your groupby and aggregation as you did before.
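Putting the steps together, a minimal end-to-end sketch on made-up data (demo stands in for new_df, and the values are illustrative): convert BW first, then group and average.

```python
import pandas as pd

# Hypothetical stand-in for new_df with BW stored as strings.
demo = pd.DataFrame({
    "annlegg": ["LL", "LL", "KH"],
    "Merd": [3, 3, 1],
    "BW": pd.Series(["1200", "830", "1500"], dtype=object),
})

# Step 1: make BW numeric.
demo["BW"] = pd.to_numeric(demo["BW"])

# Step 2: the same groupby as in the question; BW is now included.
result = demo.groupby(["annlegg", "Merd"], as_index=False).mean()
print(result)
```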

That is probably due to the wrong data type. You can try this:

new_df = new_df.convert_dtypes()
new_df.groupby(['annlegg','Merd'],as_index=False).mean()
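One caveat worth checking, sketched below on made-up series: convert_dtypes infers a better dtype from the values already stored, so it helps when an object column holds actual numbers, but it does not parse numeric-looking strings. In that case pd.to_numeric is still needed.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Object column holding real Python integers.
ints_as_object = pd.Series([1200, 830], dtype=object)
# Object column holding strings that merely look like numbers.
strings_as_object = pd.Series(["1200", "830"], dtype=object)

from_ints = ints_as_object.convert_dtypes()
from_strings = strings_as_object.convert_dtypes()

print(from_ints.dtype)     # a nullable integer dtype
print(from_strings.dtype)  # a string dtype, still not numeric

print(is_numeric_dtype(from_ints), is_numeric_dtype(from_strings))
```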

You can check the dtype of each column via:

new_df.dtypes

You can try the .agg() function to target specific columns.

new_df.groupby(['annlegg','Merd']).agg({'BW':'mean'})
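Note that if BW is still an object column, asking for its mean through .agg() is likely to raise an error rather than return a value, so the conversion is still needed. A hedged sketch combining both on made-up data (demo, BW_mean and Leng_mean are illustrative names), using named aggregation to pick the columns:

```python
import pandas as pd

# Hypothetical stand-in for new_df with BW stored as strings.
demo = pd.DataFrame({
    "annlegg": ["LL", "LL", "KH"],
    "Merd": [3, 3, 1],
    "Leng": [48.0, 43.0, 50.0],
    "BW": pd.Series(["1200", "830", "1500"], dtype=object),
})

result = (
    demo.assign(BW=pd.to_numeric(demo["BW"]))        # convert without mutating demo
        .groupby(["annlegg", "Merd"], as_index=False)
        .agg(BW_mean=("BW", "mean"), Leng_mean=("Leng", "mean"))
)
print(result)
```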
