Groupby and transform Pandas

Sample DF:

import numpy as np
import pandas as pd

sample_df = pd.DataFrame(np.random.randint(1,20,size=(10, 2)), columns=list('BC'))
sample_df["date"]= ["2020-02-01","2020-02-01","2020-02-01","2020-02-01","2020-02-01",
                    "2020-02-02","2020-02-02","2020-02-02","2020-02-02","2020-02-02"]
sample_df["date"] = pd.to_datetime(sample_df["date"])
sample_df.set_index(sample_df["date"],inplace=True)
sample_df["A"]=[10,10,10,10,10,12,1,3,4,2]
del sample_df["date"]
sample_df

Sample DF (printed):

            B   C   A
date                  
2020-02-01  19  12  10
2020-02-01  11   1  10
2020-02-01  10   1  10
2020-02-01  13   4  10
2020-02-01   5  15  10
2020-02-02  10   3  12
2020-02-02   3   7   1
2020-02-02   6  13   3
2020-02-02  17  10   4
2020-02-02  15   1   2

Condition:

Group by the index, then apply the pandas quantile cut (pd.qcut) to column A. If that raises an error, apply the quantile cut to the row-wise mean of columns A and C instead.

try:
    Quantile cut column A
except:
    quantile cut mean(col A and col C)
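
For context, here is a minimal sketch of the error that the fallback is presumably meant to catch. In the sample data every value of column A on 2020-02-01 is 10, so the quantile edges coincide and pd.qcut raises a ValueError. The names constant_a and qcut_failed are made up for this illustration.

```python
import pandas as pd

# Column A for the 2020-02-01 group in the sample: all values identical.
constant_a = pd.Series([10, 10, 10, 10, 10])

try:
    # Three quantile bins need four distinct edges; here all edges equal 10.
    pd.qcut(constant_a, 3, labels=False)
    qcut_failed = False
except ValueError as exc:
    qcut_failed = True
    print(f"qcut failed: {exc}")
```

This is why catching ValueError specifically, rather than using a bare except, should be enough for this case.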

Code:

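def func(df, n_bins):
    # Bin column A into n_bins quantile bins, labelled 0..n_bins-1.
    try:
        proc_col = pd.qcut(df["A"].values, n_bins, labels=range(0, n_bins))
        return proc_col
    except ValueError:
        # qcut raises ValueError when the bin edges are not unique (e.g. A is constant),
        # so fall back to the row-wise mean of the columns passed in (C and A).
        proc_col = pd.qcut(df.mean(axis=1).values, n_bins, labels=range(0, n_bins))
        return proc_col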

sample_df["A"]=sample_df.groupby([sample_df.index.get_level_values(0)])[["C","A"]].apply(lambda df: func(df,3))
sample_df

Output:

            B   C     A
date            
2020-02-01  1   16  [1, 2, 1, 0, 0] Categories (3, int64): [0 < 1 ...
2020-02-01  5   19  [1, 2, 1, 0, 0] Categories (3, int64): [0 < 1 ...
2020-02-01  2   16  [1, 2, 1, 0, 0] Categories (3, int64): [0 < 1 ...
2020-02-01  12  11  [1, 2, 1, 0, 0] Categories (3, int64): [0 < 1 ...
2020-02-01  15  10  [1, 2, 1, 0, 0] Categories (3, int64): [0 < 1 ...
2020-02-02  19  17  [2, 0, 1, 2, 0] Categories (3, int64): [0 < 1 ...
2020-02-02  17  7   [2, 0, 1, 2, 0] Categories (3, int64): [0 < 1 ...
2020-02-02  14  1   [2, 0, 1, 2, 0] Categories (3, int64): [0 < 1 ...
2020-02-02  19  13  [2, 0, 1, 2, 0] Categories (3, int64): [0 < 1 ...
2020-02-02  15  13  [2, 0, 1, 2, 0] Categories (3, int64): [0 < 1 ...

Expected output:

            B   C    A
date            
2020-02-01  1   16   1
2020-02-01  5   19   2
2020-02-01  2   16   1
2020-02-01  12  11   0
2020-02-01  15  10   0
2020-02-02  19  17   2
2020-02-02  17  7    0
2020-02-02  14  1    1
2020-02-02  19  13   2
2020-02-02  15  13   0

Any suggestions on the mistake would be great. I tried transform in place of apply, but that gives me an error.

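Apply transform(pd.Series) to the result so that each group's Categorical is expanded into its own values, then stack so that the groups are appended into one long Series with their respective indexes, and finally droplevel(1) to remove the second index level that stack adds.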

sample_df["A"]=sample_df.groupby([sample_df.index.get_level_values(0)])[["C","A"]].apply(lambda df: func(df,3)).transform(pd.Series).stack().droplevel(1)
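
As an alternative, here is a rough sketch that avoids index alignment altogether by writing the bin codes back by row position. It is not the approach from the answer above, only one possible variant. It assumes the values from the printed sample frame, and the names bin_group, binned and A_bin are made up for this illustration. With labels=False, pd.qcut returns plain integer bin codes instead of a Categorical.

```python
import numpy as np
import pandas as pd

# Deterministic copy of the sample frame (values taken from the printed sample).
dates = pd.to_datetime(["2020-02-01"] * 5 + ["2020-02-02"] * 5)
sample_df = pd.DataFrame(
    {
        "B": [19, 11, 10, 13, 5, 10, 3, 6, 17, 15],
        "C": [12, 1, 1, 4, 15, 3, 7, 13, 10, 1],
        "A": [10, 10, 10, 10, 10, 12, 1, 3, 4, 2],
    },
    index=pd.Index(dates, name="date"),
)


def bin_group(group, n_bins):
    """Hypothetical helper: integer quantile-bin codes for one group."""
    try:
        codes = pd.qcut(group["A"], n_bins, labels=False)
    except ValueError:
        # Non-unique bin edges in A: fall back to the row-wise mean of C and A.
        codes = pd.qcut(group[["C", "A"]].mean(axis=1), n_bins, labels=False)
    return codes.to_numpy()


# GroupBy.indices maps each group key to the integer row positions of that group,
# so the results can be written back by position despite the duplicated index.
binned = np.empty(len(sample_df), dtype=int)
for _, positions in sample_df.groupby(level=0).indices.items():
    binned[positions] = bin_group(sample_df.iloc[positions], 3)

sample_df["A_bin"] = binned
print(sample_df)
```

The result goes into a new column here so that the original A stays available for comparison; assigning to sample_df["A"] instead would overwrite it as in the question.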
