Groupby and transform in Pandas

Question

Sample DF:

import numpy as np
import pandas as pd

# Two random integer columns, B and C (values change on every run)
sample_df = pd.DataFrame(np.random.randint(1,20,size=(10, 2)), columns=list('BC'))
sample_df["date"]= ["2020-02-01","2020-02-01","2020-02-01","2020-02-01","2020-02-01",
                    "2020-02-02","2020-02-02","2020-02-02","2020-02-02","2020-02-02"]
sample_df["date"] = pd.to_datetime(sample_df["date"])
# Use the date as the (non-unique) index
sample_df.set_index(sample_df["date"],inplace=True)
sample_df["A"]=[10,10,10,10,10,12,1,3,4,2]
del sample_df["date"]
sample_df

Sample DF:

            B   C   A
date                  
2020-02-01  19  12  10
2020-02-01  11   1  10
2020-02-01  10   1  10
2020-02-01  13   4  10
2020-02-01   5  15  10
2020-02-02  10   3  12
2020-02-02   3   7   1
2020-02-02   6  13   3
2020-02-02  17  10   4
2020-02-02  15   1   2

Condition:

Group by the index, then apply a pandas quantile cut (pd.qcut) to column A. If that raises an error, apply the quantile cut to mean(col A and col C) instead.

try:
    Quantile cut column A
except:
    quantile cut mean(col A and col C)
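
The fallback matters for groups like 2020-02-01 in the sample table, where column A is constant. The following is a minimal sketch, assuming the default `duplicates='raise'` behaviour of `pd.qcut` and using the C and A values of that group as they appear in the sample table. The names `group`, `codes` and `used` are only illustrative.

```python
import pandas as pd

# C and A values of the 2020-02-01 group from the sample table
group = pd.DataFrame({"C": [12, 1, 1, 4, 15], "A": [10, 10, 10, 10, 10]})

try:
    # labels=False asks qcut for plain integer bin codes
    codes = pd.qcut(group["A"].to_numpy(), 3, labels=False)
    used = "A"
except ValueError:
    # A is constant, so every quantile edge equals 10 and the bin edges
    # are not unique; qcut raises ValueError in that case
    codes = pd.qcut(group.mean(axis=1).to_numpy(), 3, labels=False)
    used = "mean(A, C)"

print(used, codes.tolist())
```

For this group the cut on A fails, so the codes come from the row-wise mean of C and A.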

Code:

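def func(df,n_bins):
    try:
        # Quantile-cut column A into n_bins ordered labels 0..n_bins-1
        proc_col = pd.qcut(df["A"].values, n_bins, labels=range(0,n_bins))
        return proc_col
    except ValueError:
        # qcut raises ValueError when the bin edges are not unique (e.g. A is constant);
        # fall back to the row-wise mean of the selected columns (C and A)
        proc_col = pd.qcut(df.mean(axis =1).values, n_bins, labels=range(0,n_bins))
        return proc_col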

sample_df["A"]=sample_df.groupby([sample_df.index.get_level_values(0)])[["C","A"]].apply(lambda df: func(df,3))
sample_df

Output:

            B   C     A
date            
2020-02-01  1   16  [1, 2, 1, 0, 0] Categories (3, int64): [0 < 1 ...
2020-02-01  5   19  [1, 2, 1, 0, 0] Categories (3, int64): [0 < 1 ...
2020-02-01  2   16  [1, 2, 1, 0, 0] Categories (3, int64): [0 < 1 ...
2020-02-01  12  11  [1, 2, 1, 0, 0] Categories (3, int64): [0 < 1 ...
2020-02-01  15  10  [1, 2, 1, 0, 0] Categories (3, int64): [0 < 1 ...
2020-02-02  19  17  [2, 0, 1, 2, 0] Categories (3, int64): [0 < 1 ...
2020-02-02  17  7   [2, 0, 1, 2, 0] Categories (3, int64): [0 < 1 ...
2020-02-02  14  1   [2, 0, 1, 2, 0] Categories (3, int64): [0 < 1 ...
2020-02-02  19  13  [2, 0, 1, 2, 0] Categories (3, int64): [0 < 1 ...
2020-02-02  15  13  [2, 0, 1, 2, 0] Categories (3, int64): [0 < 1 ...

Expected output:

            B   C    A
date            
2020-02-01  1   16   1
2020-02-01  5   19   2
2020-02-01  2   16   1
2020-02-01  12  11   0
2020-02-01  15  10   0
2020-02-02  19  17   2
2020-02-02  17  7    0
2020-02-02  14  1    1
2020-02-02  19  13   2
2020-02-02  15  13   0

Any suggestions on the mistake would be great. I tried transform in place of apply, but that gives me an error.

Answer 1

Convert the result to Series with transform(pd.Series), then stack so that the per-group series are appended into one long Series that keeps the respective indexes, and use droplevel to remove the extra level of the resulting two-level index.
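
To see what each reshaping step does, here is a toy sketch. It is an illustration under assumptions, not the answer's exact code: plain lists stand in for the per-group Categorical results, and `pd.DataFrame(...tolist())` is swapped in for `.transform(pd.Series)` to do the expansion. All variable names are hypothetical.

```python
import pandas as pd

# Toy stand-in for the groupby/apply result: one list of bin codes per group key
per_group = pd.Series([[1, 2], [0, 1]], index=["2020-02-01", "2020-02-02"])

# Expand each list into its own row of a DataFrame (columns 0, 1, ...)
wide = pd.DataFrame(per_group.tolist(), index=per_group.index)

# stack() moves the columns into an inner index level, giving a two-level index;
# droplevel(1) removes that inner level, leaving one value per original row
# labelled by its group key
long_series = wide.stack().droplevel(1)

# Series.explode yields the same values and index in one step (object dtype)
exploded = per_group.explode()

print(long_series)
```

The result has one row per element, with the group key repeated, which is the shape needed for assigning back to a column.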

sample_df["A"]=sample_df.groupby([sample_df.index.get_level_values(0)])[["C","A"]].apply(lambda df: func(df,3)).transform(pd.Series).stack().droplevel(1)
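
As a possible alternative to reshaping afterwards, the codes can be written back by row position, which sidesteps alignment on the duplicated date index. This is a sketch under assumptions, not the accepted method: `qcut_codes` is a hypothetical helper, `labels=False` is used so `pd.qcut` returns integer codes instead of a Categorical, and the data is B and C from the expected output table together with the original A from the question's setup code.

```python
import numpy as np
import pandas as pd

def qcut_codes(group, n_bins):
    """Hypothetical helper: integer bin codes for one group, with the mean fallback."""
    try:
        return pd.qcut(group["A"].to_numpy(), n_bins, labels=False)
    except ValueError:
        # Non-unique bin edges on A: cut the row-wise mean of A and C instead
        return pd.qcut(group[["A", "C"]].mean(axis=1).to_numpy(), n_bins, labels=False)

# B and C from the expected output table, A from the question's setup code
df = pd.DataFrame(
    {"B": [1, 5, 2, 12, 15, 19, 17, 14, 19, 15],
     "C": [16, 19, 16, 11, 10, 17, 7, 1, 13, 13],
     "A": [10, 10, 10, 10, 10, 12, 1, 3, 4, 2]},
    index=pd.to_datetime(["2020-02-01"] * 5 + ["2020-02-02"] * 5).rename("date"),
)

codes = np.empty(len(df), dtype=int)
# GroupBy.indices maps each group key to the integer positions of its rows
for key, pos in df.groupby(level=0).indices.items():
    codes[pos] = qcut_codes(df.iloc[pos], 3)

df["A"] = codes
print(df)
```

Because the assignment uses a plain NumPy array, no index alignment takes place, so this does not depend on the rows being sorted by date.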

Answer 1 (accepted), 2020-04-17 04:58:32
