
pandas dataframe group with condition

I have a 3D dataframe with x and y, and time as the third dimension. The data are 5 indices of satellite images that were taken at different times. The x and y values identify each pixel.

 x        y              time       SIPI       classif
7.620001 -77.849990     2018-04-07  1.011107    2.0
                        2018-10-14  1.023407    2.0
                        2018-12-28  0.045107    3.0
                        2020-01-10  0.351107    2.0
                        2018-06-29  0.351107    2.0
         -77.849899     2018-04-07  1.010777    8.0
                        2018-10-14  0.510562    2.0
                        2018-12-28  1.410766    4.0
                        2020-01-10  1.010666    8.0
                        2018-06-29  2.057068    8.0
         -77.849809     2018-04-07  0.986991    1.0
                        2018-10-14  0.986991    8.0
                        2018-12-28  0.986991    5.0
                        2020-01-10  0.984791    5.0
                        2018-06-29  0.986991    3.0
         -77.849718     2018-04-07  0.975965    10.0
                        2018-10-14  0.964765    7.0
                        2018-12-28  0.975965    10.0
                        2020-01-10  0.975965    10.0
                        2018-06-29  0.975965    3.0
         -77.849627     2018-04-07  1.957747    2.0
                        2018-10-14  0.132445    6.0
                        2018-12-28  0.589677    2.0
                        2020-01-10  1.982445    2.0
                        2018-06-29  3.334456    7.0

I need to group the data and, as a new column, I need the value from column 'classif_rf' (shown as 'classif' in the sample above) that is most frequent across the 5 datasets. The values are integers between 1 and 10. I also want a condition that keeps a value only if its frequency is at least 3.

 x          y           classif
7.620001 -77.849990     2.0
         -77.849899     8.0
         -77.849809     NA
         -77.849718     10.0
         -77.849627     2.0

As a result, I need a dataframe in which each pixel has its most frequent value; when that frequency is lower than 3, the value should be NA.

Can the pandas groupby function do that? I thought about value_counts(), but I'm not sure how to apply it to my dataset.

Thank you in advance!

Here is a clunky way to do it:

import numpy as np
import pandas as pd

# Get the modes per group and count how often they occur
df_modes = df.groupby(["x", "y"]).agg(
    {
        'classif': [lambda x: pd.Series.mode(x)[0], 
                    lambda x: sum(x == pd.Series.mode(x)[0])]
    }
).reset_index()
# Rename the columns to something a bit more readable
df_modes.columns = ["x", "y", "classif_mode", "classif_mode_freq"]
# Discard modes whose frequency was less than 3
df_modes.loc[df_modes["classif_mode_freq"] < 3, "classif_mode"] = np.nan

Now df_modes.drop("classif_mode_freq", axis=1) will return:

          x          y  classif_mode
0  7.620001 -77.849990           2.0
1  7.620001 -77.849899           8.0
2  7.620001 -77.849809           NaN
3  7.620001 -77.849718          10.0
4  7.620001 -77.849627           2.0
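Since the question mentions value_counts(), the same result can probably also be reached with it. The following is only a sketch under some assumptions: the helper name mode_if_frequent and its min_count parameter are made up for illustration, and the sample frame is rebuilt by hand from the question with x, y and time as index levels (the SIPI column is left out because it is not needed here).

```python
import numpy as np
import pandas as pd


def mode_if_frequent(values: pd.Series, min_count: int = 3) -> float:
    """Hypothetical helper: most frequent value, or NaN if it occurs
    fewer than min_count times."""
    counts = values.value_counts()  # sorted by count, highest first
    if counts.empty:  # group contained only missing values
        return np.nan
    return counts.index[0] if counts.iloc[0] >= min_count else np.nan


# Rebuild the sample from the question (assumed layout: x, y, time as index)
x = 7.620001
ys = [-77.849990, -77.849899, -77.849809, -77.849718, -77.849627]
times = ["2018-04-07", "2018-10-14", "2018-12-28", "2020-01-10", "2018-06-29"]
classif = [
    [2, 2, 3, 2, 2],
    [8, 2, 4, 8, 8],
    [1, 8, 5, 5, 3],
    [10, 7, 10, 10, 3],
    [2, 6, 2, 2, 7],
]
df = pd.DataFrame(
    [(x, y, t, float(c)) for y, row in zip(ys, classif) for t, c in zip(times, row)],
    columns=["x", "y", "time", "classif"],
).set_index(["x", "y", "time"])

# One value per pixel: the mode if it occurs at least 3 times, else NaN
result = df.groupby(level=["x", "y"])["classif"].agg(mode_if_frequent)
print(result.reset_index())
```

With the sample data this gives 2.0, 8.0, NaN, 10.0 and 2.0 for the five pixels, matching the table above. With 5 observations per pixel, two different values cannot both reach a count of 3, so ties in value_counts() do not affect the outcome here; with a different group size or threshold, ties would need explicit handling.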
