
Pandas groupby - dataframe's column disappearing

I have the following data frame called "new_df":

   dato        uttak  annlegg  Merd  ID  Leng  BW    CF        F    B    H    K
0  2020-12-15  12_20  LL       3     1   48.0  1200  1.085069  0.0  2.0  0.0  NaN
1  2020-12-15  12_20  LL       3     2   43.0  830   1.043933  0.0  1.0  0.0  NaN

The columns are:

'dato', 'uttak', 'annlegg', 'Merd', 'ID', 'Leng', 'BW', 'CF', 'F', 'B', 'H', 'K'

When I do:

new_df.groupby(['annlegg','Merd'],as_index=False).mean()

I get the means of all columns except "BW", like this:

   annlegg  Merd  ID         Leng       CF        F         B         H         K
0  KH       1     42.557143  56.398649  1.265812  0.071770  1.010638  0.600000  0.127907
1  KH       2     42.683794  56.492228  1.270522  0.021978  0.739130  0.230769  0.075862
2  KH       3     42.177866  35.490119  1.125416  0.000000  0.384146  0.333333  0.034483

Column "BW" simply disappears when I group, whether as_index is True or False. Why is that?

It appears that the BW column does not have a numeric dtype but the object dtype instead, which pandas uses, for instance, to store strings. So when you apply groupby with the mean aggregation, the column is dropped, because computing the mean of an arbitrary object (think of a string) does not make sense in general. This silent dropping is the behaviour of pandas before 2.0; since pandas 2.0, GroupBy.mean() raises a TypeError on such a column unless you pass numeric_only=True.
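A minimal sketch reproducing the symptom, assuming a small hypothetical frame (the values are made up) in which BW was read in as strings and therefore has object dtype. numeric_only=True is passed explicitly so the snippet behaves the same on pandas 1.x and 2.x: only numeric columns are averaged, and BW is left out.

```python
import pandas as pd

# Hypothetical sample: BW holds strings, stored with object dtype
new_df = pd.DataFrame({
    "annlegg": ["LL", "LL", "KH"],
    "Merd": [3, 3, 1],
    "Leng": [48.0, 43.0, 50.0],
    "BW": pd.Series(["1200", "830", "1500"], dtype=object),
})

print(new_df.dtypes)  # BW shows up as object

# numeric_only=True keeps only numeric columns, so BW is dropped from the result
means = new_df.groupby(["annlegg", "Merd"], as_index=False).mean(numeric_only=True)
print(means.columns.tolist())
```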

You should start by converting your BW column:

First method: pd.to_numeric


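This first method converts the whole column to a numeric dtype (int64 or float64, depending on the values). By default it raises an error if a value cannot be parsed, so nothing is converted silently.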

new_df['BW'] = pd.to_numeric(new_df['BW'])
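A short sketch of what pd.to_numeric does with hypothetical string values: digit-only strings become an integer dtype, an unparseable entry raises ValueError by default, and errors="coerce" replaces such entries with NaN (which forces a float64 result).

```python
import pandas as pd

# Hypothetical BW values stored as strings (object dtype)
bw = pd.Series(["1200", "830", "1500"], dtype=object)
converted = pd.to_numeric(bw)  # every value parses as an integer -> integer dtype
print(converted.dtype)

# A value that cannot be parsed raises ValueError with the default errors="raise"
messy = pd.Series(["1200", "n/a", "830"], dtype=object)
try:
    pd.to_numeric(messy)
except ValueError as exc:
    print("Could not convert:", exc)

# errors="coerce" turns unparseable values into NaN instead
coerced = pd.to_numeric(messy, errors="coerce")
print(coerced)
```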

Second method: Series.astype


If you want to choose the target type yourself (for instance, you know that this column only contains integers, or you do not need decimal precision), you can use the astype method, which lets you convert to almost any type you want:

new_df['BW'] = new_df['BW'].astype(float)   # Option 1: convert to float
new_df['BW'] = new_df['BW'].astype(int)     # Option 2: convert to integer (raises on values such as "12.5" or NaN)
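A small sketch of both astype options on hypothetical string values. Unlike pd.to_numeric with errors="coerce", astype cannot turn an unparseable value into NaN: a string such as "12.5" cannot become an int, and the conversion raises ValueError.

```python
import pandas as pd

# Hypothetical BW values stored as strings (object dtype)
bw = pd.Series(["1200", "830", "1500"], dtype=object)

as_float = bw.astype(float)  # float64 values
as_int = bw.astype(int)      # integer values (platform integer dtype)
print(as_float.dtype, as_int.dtype)

# A non-integer string cannot be cast to int
try:
    pd.Series(["12.5"], dtype=object).astype(int)
except ValueError as exc:
    print("astype(int) failed:", exc)
```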

You can then apply your groupby and aggregation as you did before!
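Putting it together, a minimal sketch on a hypothetical frame (made-up values, no other text columns): convert BW first, then group. With the question's real frame, the string column uttak is still non-numeric, so on pandas 2.x mean() would also need numeric_only=True there.

```python
import pandas as pd

# Hypothetical sample mirroring the question: BW was read in as strings
new_df = pd.DataFrame({
    "annlegg": ["LL", "LL", "KH"],
    "Merd": [3, 3, 1],
    "Leng": [48.0, 43.0, 50.0],
    "BW": pd.Series(["1200", "830", "1500"], dtype=object),
})

# Convert BW to a numeric dtype, then aggregate as in the question
new_df["BW"] = pd.to_numeric(new_df["BW"])
result = new_df.groupby(["annlegg", "Merd"], as_index=False).mean()
print(result)  # BW is now part of the output
```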

That's probably due to a wrong data type. You can try this:

new_df = new_df.convert_dtypes()
new_df.groupby(['annlegg','Merd'],as_index=False).mean()
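A caveat worth checking, shown with hypothetical values: convert_dtypes() infers a better dtype from the Python objects already in the column; it does not parse text. An object column holding Python ints becomes the nullable Int64 dtype, but an object column holding strings such as "1200" becomes a string dtype and is still non-numeric, in which case pd.to_numeric is needed.

```python
import pandas as pd

# Object column holding Python ints: convert_dtypes infers the nullable Int64 dtype
ints_as_objects = pd.Series([1200, 830], dtype=object).convert_dtypes()

# Object column holding strings: convert_dtypes infers a string dtype,
# it does not parse "1200" into a number
strings = pd.Series(["1200", "830"], dtype=object).convert_dtypes()

print(ints_as_objects.dtype)
print(strings.dtype)
```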

You can check the dtypes via:

new_df.dtypes
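Beyond printing new_df.dtypes, a quick way to list the columns that the mean cannot average is to filter on pd.api.types.is_numeric_dtype. The frame below is hypothetical.

```python
import pandas as pd

# Hypothetical frame: BW accidentally stored as strings
new_df = pd.DataFrame({
    "annlegg": ["LL", "LL"],
    "Merd": [3, 3],
    "BW": pd.Series(["1200", "830"], dtype=object),
})

print(new_df.dtypes)  # one dtype per column

# Columns that groupby(...).mean() cannot average
non_numeric = [col for col in new_df.columns
               if not pd.api.types.is_numeric_dtype(new_df[col])]
print(non_numeric)
```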

You can also use the .agg() method to target specific columns:

new_df.groupby(['annlegg','Merd']).agg({'BW':'mean'})
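A sketch of the dict form of .agg() with one aggregation per column, on a hypothetical frame. It assumes BW has already been converted to a numeric dtype, since the mean of an object column still cannot be computed this way; columns not named in the dict are left out of the result.

```python
import pandas as pd

# Hypothetical sample in which BW is already numeric
new_df = pd.DataFrame({
    "annlegg": ["LL", "LL", "KH"],
    "Merd": [3, 3, 1],
    "Leng": [48.0, 43.0, 50.0],
    "BW": [1200, 830, 1500],
})

# One aggregation per column; the group keys become a MultiIndex
summary = new_df.groupby(["annlegg", "Merd"]).agg({"BW": "mean", "Leng": "max"})
print(summary)
```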

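Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.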

 