Groupby and transform in pandas
Sample DF:
import numpy as np
import pandas as pd

sample_df = pd.DataFrame(np.random.randint(1,20,size=(10, 2)), columns=list('BC'))
sample_df["date"]= ["2020-02-01","2020-02-01","2020-02-01","2020-02-01","2020-02-01",
                    "2020-02-02","2020-02-02","2020-02-02","2020-02-02","2020-02-02"]
sample_df["date"] = pd.to_datetime(sample_df["date"])
sample_df.set_index(sample_df["date"],inplace=True)
sample_df["A"]=[10,10,10,10,10,12,1,3,4,2]
del sample_df["date"]
sample_df
Output:
B C A
date
2020-02-01 19 12 10
2020-02-01 11 1 10
2020-02-01 10 1 10
2020-02-01 13 4 10
2020-02-01 5 15 10
2020-02-02 10 3 12
2020-02-02 3 7 1
2020-02-02 6 13 3
2020-02-02 17 10 4
2020-02-02 15 1 2
Condition:
Group by index, then apply a pandas quantile cut (pd.qcut) on column A. If that raises an error, apply the quantile cut on mean(col A and col C) instead.
try:
    Quantile cut column A
except:
    quantile cut mean(col A and col C)
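The error case in the condition above can be reproduced on its own. The following is a minimal sketch, assuming the failure in question is the ValueError that pd.qcut raises when the computed bin edges are not unique, which is what happens for the first date, where every value in column A is 10. The variable names are illustrative only.

```python
import pandas as pd

# Column A for 2020-02-01 in the sample: every value is identical.
constant_values = pd.Series([10, 10, 10, 10, 10])

try:
    pd.qcut(constant_values, 3, labels=range(3))
    raised = False
except ValueError as exc:
    # All quantile edges collapse to 10, so qcut cannot build 3 distinct bins.
    raised = True
    print(f"qcut failed: {exc}")
```

Catching ValueError specifically, rather than using a bare except, keeps unrelated failures (for example a missing column) from silently triggering the fallback.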
Code:
def func(df,n_bins):
    try:
        proc_col = pd.qcut(df["A"].values, n_bins, labels=range(0,n_bins))
        return proc_col
    except:
        proc_col = pd.qcut(df.mean(axis =1).values, n_bins, labels=range(0,n_bins))
        return proc_col

sample_df["A"]=sample_df.groupby([sample_df.index.get_level_values(0)])[["C","A"]].apply(lambda df: func(df,3))
sample_df
Output:
B C A
date
2020-02-01 1 16 [1, 2, 1, 0, 0] Categories (3, int64): [0 < 1 ...
2020-02-01 5 19 [1, 2, 1, 0, 0] Categories (3, int64): [0 < 1 ...
2020-02-01 2 16 [1, 2, 1, 0, 0] Categories (3, int64): [0 < 1 ...
2020-02-01 12 11 [1, 2, 1, 0, 0] Categories (3, int64): [0 < 1 ...
2020-02-01 15 10 [1, 2, 1, 0, 0] Categories (3, int64): [0 < 1 ...
2020-02-02 19 17 [2, 0, 1, 2, 0] Categories (3, int64): [0 < 1 ...
2020-02-02 17 7 [2, 0, 1, 2, 0] Categories (3, int64): [0 < 1 ...
2020-02-02 14 1 [2, 0, 1, 2, 0] Categories (3, int64): [0 < 1 ...
2020-02-02 19 13 [2, 0, 1, 2, 0] Categories (3, int64): [0 < 1 ...
2020-02-02 15 13 [2, 0, 1, 2, 0] Categories (3, int64): [0 < 1 ...
Expected output:
B C A
date
2020-02-01 1 16 1
2020-02-01 5 19 2
2020-02-01 2 16 1
2020-02-01 12 11 0
2020-02-01 15 10 0
2020-02-02 19 17 2
2020-02-02 17 7 0
2020-02-02 14 1 1
2020-02-02 19 13 2
2020-02-02 15 13 0
Any suggestions on the mistake would be great. I tried transform in place of apply, but that gives me an error.
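apply returns a single Categorical per group, so the assignment places the whole array on every row of that date. Convert each group's result to a Series with transform(pd.Series), stack the result so the values of both groups are appended into one long Series with their respective indexes, and use droplevel(1) to remove the extra level of the resulting two-level index: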
sample_df["A"]=sample_df.groupby([sample_df.index.get_level_values(0)])[["C","A"]].apply(lambda df: func(df,3)).transform(pd.Series).stack().droplevel(1)
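As an alternative sketch (not part of the answer above), the per-row labels can also be written back by position, which avoids reshaping the result of apply. The helper name qcut_codes is hypothetical, the fallback assumes the error to handle is the ValueError from non-unique bin edges, and the B and C values are copied from the sample table in the question so that the example is reproducible.

```python
import numpy as np
import pandas as pd

# Fixed copy of the sample frame (B and C taken from the printed sample table).
dates = pd.to_datetime(["2020-02-01"] * 5 + ["2020-02-02"] * 5)
sample_df = pd.DataFrame(
    {
        "B": [19, 11, 10, 13, 5, 10, 3, 6, 17, 15],
        "C": [12, 1, 1, 4, 15, 3, 7, 13, 10, 1],
        "A": [10, 10, 10, 10, 10, 12, 1, 3, 4, 2],
    },
    index=pd.Index(dates, name="date"),
)


def qcut_codes(group, n_bins):
    """Hypothetical helper: return one integer bin label per row of `group`."""
    try:
        binned = pd.qcut(group["A"].to_numpy(), n_bins, labels=range(n_bins))
    except ValueError:
        # Fallback from the question: bin the row-wise mean of A and C.
        row_mean = group[["A", "C"]].mean(axis=1).to_numpy()
        binned = pd.qcut(row_mean, n_bins, labels=range(n_bins))
    # With labels 0..n_bins-1 the category codes equal the labels.
    return binned.codes.astype(int)


# Fill the labels group by group, using each group's integer row positions.
labels = np.empty(len(sample_df), dtype=int)
for _, positions in sample_df.groupby(level=0).indices.items():
    labels[positions] = qcut_codes(sample_df.iloc[positions], 3)

sample_df["A"] = labels
print(sample_df)
```

Writing by position sidesteps index alignment entirely, which matters here because the date index contains duplicate labels.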