Pandas groupby - dataframe's column disappearing

Question

I have the following data frame called "new_df":

         dato  uttak annlegg  Merd  ID  Leng    BW        CF    F    B    H   K
0  2020-12-15  12_20      LL     3   1  48.0  1200  1.085069  0.0  2.0  0.0 NaN
1  2020-12-15  12_20      LL     3   2  43.0   830  1.043933  0.0  1.0  0.0 NaN

columns are:

'dato', 'uttak', 'annlegg', 'Merd', 'ID', 'Leng', 'BW', 'CF', 'F', 'B', 'H', 'K'

when I do:

new_df.groupby(['annlegg','Merd'],as_index=False).mean()

I get the mean of every column except "BW":

  annlegg  Merd         ID       Leng        CF         F         B         H         K
0      KH     1  42.557143  56.398649  1.265812  0.071770  1.010638  0.600000  0.127907
1      KH     2  42.683794  56.492228  1.270522  0.021978  0.739130  0.230769  0.075862
2      KH     3  42.177866  35.490119  1.125416  0.000000  0.384146  0.333333  0.034483

The "BW" column simply disappears after the groupby, whether "as_index" is True or False. Why is that?

Answer 1

It appears that the BW column does not have a numeric dtype but an object dtype, which is used for storing strings, for instance. When you apply groupby with the mean aggregation function, that column disappears because computing the mean of an object (think of a string) does not make sense in general.
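To confirm this diagnosis on your own data, a minimal sketch along these lines may help. The small frame below is a hypothetical stand-in for new_df, with BW stored as text on purpose. Note that recent pandas versions (2.0 and later) raise a TypeError instead of silently dropping such a column, so the sketch passes numeric_only=True to request the "skip non-numeric columns" behaviour explicitly.

```python
import pandas as pd

# Hypothetical miniature of new_df; BW is deliberately stored as text.
df = pd.DataFrame({
    "annlegg": ["LL", "LL", "KH"],
    "Merd": [3, 3, 1],
    "Leng": [48.0, 43.0, 50.0],
    "BW": ["1200", "830", "1500"],
})

# BW is reported with a non-numeric dtype (object or string, depending on the pandas version).
print(df.dtypes)
bw_is_numeric = pd.api.types.is_numeric_dtype(df["BW"])

# numeric_only=True explicitly skips non-numeric columns, so BW is left out.
means = df.groupby(["annlegg", "Merd"], as_index=False).mean(numeric_only=True)
print(means.columns.tolist())
```

If BW is missing from the printed column list while Leng is present, the dtype is the likely culprit.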

You should start by converting your BW column:

First method: pd.to_numeric

This first method converts the whole column to a numeric dtype (integer or float, depending on the values it contains).

new_df['BW'] = pd.to_numeric(new_df['BW'])
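By default pd.to_numeric raises an error if it meets a value it cannot parse. If the column might contain stray text, the errors="coerce" option turns those entries into NaN instead. Here is a small sketch, assuming a made-up bad entry "n/a":

```python
import pandas as pd

raw = pd.Series(["1200", "830", "n/a"])  # "n/a" is a made-up bad entry

# errors="coerce" replaces unparseable entries with NaN instead of raising.
bw = pd.to_numeric(raw, errors="coerce")
print(bw)

# The original values that failed to parse, handy for inspection.
bad_rows = raw[bw.isna()]
print(bad_rows)
```

Looking at bad_rows before overwriting the column shows which values were silently turned into NaN.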

Second method: df.astype

If you want to choose the target type yourself (for instance, you know that this column only contains integers, or floating-point precision does not matter to you), you can use the astype method, which lets you convert to almost any type you want. Pick one of the two lines:

new_df['BW'] = new_df['BW'].astype(float)   # Converts to float
new_df['BW'] = new_df['BW'].astype(int)     # Converts to integer
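One caveat worth sketching: a plain integer dtype cannot hold missing values, so astype(int) fails on a column with gaps. Assuming your pandas version provides the nullable "Int64" dtype (available since 0.24), a hedged illustration on toy data looks like this:

```python
import pandas as pd

# Toy column of numbers stored as text, with an explicit object dtype.
clean = pd.Series(["1200", "830"], dtype=object)
as_float = clean.astype(float)
as_int = clean.astype(int)

# With a missing value, the nullable "Int64" dtype can keep the gap as <NA>.
with_gap = pd.Series([1200.0, None])
as_nullable_int = with_gap.astype("Int64")

print(as_float.dtype, as_int.dtype, as_nullable_int.dtype)
```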

You can then apply your groupby and aggregation as you did!
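Put together, the fix can be sketched end to end on a hypothetical miniature of new_df (the values are made up for illustration):

```python
import pandas as pd

# Hypothetical miniature of new_df; BW starts out as text.
df = pd.DataFrame({
    "annlegg": ["LL", "LL", "KH"],
    "Merd": [3, 3, 1],
    "Leng": [48.0, 43.0, 50.0],
    "BW": ["1200", "830", "1500"],
})

# Convert first, then aggregate: BW is numeric now, so it is kept.
df["BW"] = pd.to_numeric(df["BW"])
means = df.groupby(["annlegg", "Merd"], as_index=False).mean()
print(means)
```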

Answer 2

That's probably due to the wrong data type. You can try this.

new_df = new_df.convert_dtypes()
new_df.groupby(['annlegg','Merd'],as_index=False).mean()

You can check dtype via:

new_df.dtypes
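A word of caution on convert_dtypes, as far as I understand its behaviour: it infers a better dtype from the values actually stored, so it helps when an object column holds real numbers, but it does not parse numbers written as text. The column names below are invented for this sketch:

```python
import pandas as pd

df = pd.DataFrame({
    # Real Python integers that ended up in an object column.
    "numbers_as_objects": pd.Series([1200, 830], dtype=object),
    # Numbers written as text: these stay strings after convert_dtypes.
    "numbers_as_text": pd.Series(["1200", "830"], dtype=object),
})

converted = df.convert_dtypes()
print(converted.dtypes)
```

If BW still shows a string dtype after convert_dtypes, pd.to_numeric is the tool to reach for.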

Answer 3

You can try the .agg() function to target specific columns.

new_df.groupby(['annlegg','Merd']).agg({'BW':'mean'})
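As a sketch of how this scales to several columns, named aggregation lets you choose both the function and the output column name. The output names BW_mean and Leng_max are invented here, and the toy frame assumes BW is already numeric. With a non-numeric BW, targeting the column this way should surface the problem as an error rather than a silently missing column, though the exact exception depends on the pandas version.

```python
import pandas as pd

# Hypothetical miniature of new_df with BW already numeric.
df = pd.DataFrame({
    "annlegg": ["LL", "LL", "KH"],
    "Merd": [3, 3, 1],
    "Leng": [48.0, 43.0, 50.0],
    "BW": [1200, 830, 1500],
})

# Each keyword is an output column: (source column, aggregation function).
summary = df.groupby(["annlegg", "Merd"], as_index=False).agg(
    BW_mean=("BW", "mean"),
    Leng_max=("Leng", "max"),
)
print(summary)
```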

3 answers

Answer 1: 1 ACCEPTED 2021-05-19 10:16:00
Answer 2: 0 2021-05-18 21:08:55
Answer 3: 0 2021-05-18 21:21:00
