
Groupby and transform Pandas

Sample DF:

import numpy as np
import pandas as pd

# Columns B and C are random, so their values differ from run to run.
sample_df = pd.DataFrame(np.random.randint(1, 20, size=(10, 2)), columns=list("BC"))
sample_df["date"] = ["2020-02-01", "2020-02-01", "2020-02-01", "2020-02-01", "2020-02-01",
                     "2020-02-02", "2020-02-02", "2020-02-02", "2020-02-02", "2020-02-02"]
sample_df["date"] = pd.to_datetime(sample_df["date"])
# Use the date as the (non-unique) index and drop the helper column.
sample_df.set_index(sample_df["date"], inplace=True)
sample_df["A"] = [10, 10, 10, 10, 10, 12, 1, 3, 4, 2]
del sample_df["date"]
sample_df

Sample DF:

            B   C   A
date                  
2020-02-01  19  12  10
2020-02-01  11   1  10
2020-02-01  10   1  10
2020-02-01  13   4  10
2020-02-01   5  15  10
2020-02-02  10   3  12
2020-02-02   3   7   1
2020-02-02   6  13   3
2020-02-02  17  10   4
2020-02-02  15   1   2

Condition:

Group by the index, then apply a pandas quantile cut (pd.qcut) to column A. If that raises an error, apply the quantile cut to the row-wise mean of columns A and C instead.

try:
    Quantile cut column A
except:
    quantile cut mean(col A and col C)
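
To make the error case concrete, here is a minimal sketch using the column A values from the sample above. It assumes the error in question is the ValueError that pd.qcut raises when the quantile bin edges are not unique, which is what happens for the constant 2020-02-01 group; the variable names are only illustrative.

```python
import pandas as pd

# Column A for the 2020-02-01 group: every value is 10, so all
# quantile edges coincide and qcut cannot build three distinct bins.
constant = pd.Series([10, 10, 10, 10, 10])
try:
    pd.qcut(constant, 3, labels=False)
    raised = False
except ValueError as exc:
    raised = True
    print(f"qcut failed: {exc}")

# Column A for the 2020-02-02 group: the values vary, so qcut succeeds.
varied = pd.Series([12, 1, 3, 4, 2])
codes = pd.qcut(varied, 3, labels=False).tolist()
print(codes)
```

The second call yields one integer bin label per row, which is the shape the expected output asks for.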

Code:

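def func(df, n_bins):
    try:
        # Quantile-cut column A into n_bins bins labelled 0..n_bins-1.
        proc_col = pd.qcut(df["A"].values, n_bins, labels=range(0, n_bins))
        return proc_col
    except ValueError:
        # qcut raises ValueError when the bin edges are not unique
        # (e.g. A is constant); fall back to the row-wise mean of the columns.
        proc_col = pd.qcut(df.mean(axis=1).values, n_bins, labels=range(0, n_bins))
        return proc_col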

sample_df["A"]=sample_df.groupby([sample_df.index.get_level_values(0)])[["C","A"]].apply(lambda df: func(df,3))
sample_df

Output:

            B   C     A
date            
2020-02-01  1   16  [1, 2, 1, 0, 0] Categories (3, int64): [0 < 1 ...
2020-02-01  5   19  [1, 2, 1, 0, 0] Categories (3, int64): [0 < 1 ...
2020-02-01  2   16  [1, 2, 1, 0, 0] Categories (3, int64): [0 < 1 ...
2020-02-01  12  11  [1, 2, 1, 0, 0] Categories (3, int64): [0 < 1 ...
2020-02-01  15  10  [1, 2, 1, 0, 0] Categories (3, int64): [0 < 1 ...
2020-02-02  19  17  [2, 0, 1, 2, 0] Categories (3, int64): [0 < 1 ...
2020-02-02  17  7   [2, 0, 1, 2, 0] Categories (3, int64): [0 < 1 ...
2020-02-02  14  1   [2, 0, 1, 2, 0] Categories (3, int64): [0 < 1 ...
2020-02-02  19  13  [2, 0, 1, 2, 0] Categories (3, int64): [0 < 1 ...
2020-02-02  15  13  [2, 0, 1, 2, 0] Categories (3, int64): [0 < 1 ...

Expected output:

            B   C    A
date            
2020-02-01  1   16   1
2020-02-01  5   19   2
2020-02-01  2   16   1
2020-02-01  12  11   0
2020-02-01  15  10   0
2020-02-02  19  17   2
2020-02-02  17  7    0
2020-02-02  14  1    1
2020-02-02  19  13   2
2020-02-02  15  13   0

Any suggestions on the mistake would be great. I tried transform in place of apply, but that gives me an error.

Convert each group's result to a Series with transform(pd.Series), then stack so that the per-group Series are appended into one long Series with their respective indexes, and finally use droplevel to remove the extra level of the resulting two-level index.

sample_df["A"]=sample_df.groupby([sample_df.index.get_level_values(0)])[["C","A"]].apply(lambda df: func(df,3)).transform(pd.Series).stack().droplevel(1)
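
As an alternative sketch (not the approach above), the per-group labels can be computed as Series that keep their row index and then be concatenated. This assumes it is acceptable to reset the duplicated date index to a unique RangeIndex temporarily so that the assignment aligns row by row. The names quantile_codes and flat are hypothetical, and B and C are fixed to the values printed in the sample so the result is reproducible.

```python
import pandas as pd

# Fixed copy of the sample frame (B and C taken from the printed sample).
sample_df = pd.DataFrame(
    {
        "B": [19, 11, 10, 13, 5, 10, 3, 6, 17, 15],
        "C": [12, 1, 1, 4, 15, 3, 7, 13, 10, 1],
        "A": [10, 10, 10, 10, 10, 12, 1, 3, 4, 2],
    },
    index=pd.DatetimeIndex(["2020-02-01"] * 5 + ["2020-02-02"] * 5, name="date"),
)


def quantile_codes(group, n_bins):
    """Return integer bin labels for one group, indexed like the group."""
    try:
        # labels=False makes qcut return plain integer codes as a Series.
        return pd.qcut(group["A"], n_bins, labels=False)
    except ValueError:
        # Assumed failure mode: non-unique bin edges (e.g. constant A).
        return pd.qcut(group[["C", "A"]].mean(axis=1), n_bins, labels=False)


# A unique RangeIndex lets the concatenated labels align row by row.
flat = sample_df.reset_index()
parts = [quantile_codes(group, 3) for _, group in flat.groupby("date")]
flat["A"] = pd.concat(parts)

result = flat.set_index("date")
print(result)
```

Because each piece keeps the row labels of its group, the assignment does not depend on the rows being sorted by date.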


 