Pandas GroupBy: replace column values with normalized counts
Sample DataFrame:
import numpy as np
import pandas as pd

# c1..c3 are random integers, so they differ from run to run
df = pd.DataFrame(np.random.randint(0, 20, size=(10, 3)), columns=["c1", "c2", "c3"])
df["r1"] = ["Apple", "Mango", "Apple", "Mango", "Mango", "Mango", "Apple", "Mango", "Apple", "Apple"]
df["r2"] = ["Orange", "lemon", "lemon", "Orange", "lemon", "Orange", "lemon", "lemon", "Orange", "lemon"]
df["date"] = ["2002-01-01", "2002-01-01", "2002-01-01", "2002-01-01", "2002-01-01",
              "2002-01-01", "2002-02-01", "2002-02-01", "2002-02-01", "2002-02-01"]
df["date"] = pd.to_datetime(df["date"])
df
df:
c1 c2 c3 r1 r2 date
0 10 2 0 Apple Orange 2002-01-01
1 10 10 13 Mango lemon 2002-01-01
2 0 12 0 Apple lemon 2002-01-01
3 1 13 8 Mango Orange 2002-01-01
4 6 5 9 Mango lemon 2002-01-01
5 3 18 13 Mango Orange 2002-01-01
6 2 6 7 Apple lemon 2002-02-01
7 0 4 7 Mango lemon 2002-02-01
8 1 10 19 Apple Orange 2002-02-01
9 11 18 2 Apple lemon 2002-02-01
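I am trying to group by the date column and replace the values in selected columns with their normalized counts.
For example: in the group 2002-01-01, the value Apple in column r1 should be replaced by 0.33, because that group has 6 records and 2 of them are Apple, so 2/6. Mango should be replaced by 4/6, i.e. 0.67.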
Attempted pandas solution:
df.groupby("date")[["r1","r2"]].apply(lambda x: x.map(x.value_counts()))
Error:
AttributeError: 'DataFrame' object has no attribute 'map'
Is there a pandas method that does this, instead of an iterrows-based solution?
We can use value_counts with normalize=True:
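# Normalized counts of r1 within each date, indexed by (date, r1)
shares = df.groupby(['date']).r1.value_counts(normalize=True)
# Look up each row's (date, r1) pair and assign the result by position
df['New'] = shares.reindex(pd.MultiIndex.from_frame(df[['date', 'r1']])).values
df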
c1 c2 c3 r1 r2 date New
0 1 8 2 Apple Orange 2002-01-01 0.333333
1 8 1 7 Mango lemon 2002-01-01 0.666667
2 0 14 8 Apple lemon 2002-01-01 0.333333
3 11 13 10 Mango Orange 2002-01-01 0.666667
4 15 4 15 Mango lemon 2002-01-01 0.666667
5 13 7 7 Mango Orange 2002-01-01 0.666667
6 7 0 14 Apple lemon 2002-02-01 0.750000
7 13 5 11 Mango lemon 2002-02-01 0.250000
8 19 17 11 Apple Orange 2002-02-01 0.750000
9 8 1 9 Apple lemon 2002-02-01 0.750000
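The question asks for several selected columns (r1 and r2), so here is a minimal sketch of how the same idea might be extended: within each date group, every value is mapped through that group's value_counts(normalize=True). In the question's attempt the lambda receives a whole DataFrame per group, whereas looking values up in a counts Series is a Series.map operation, so this version works one column at a time. The names cols and out are illustrative only, and the random c1..c3 columns are left out because they do not affect the result.

```python
import pandas as pd

# Same r1 / r2 / date values as the sample DataFrame in the question
df = pd.DataFrame({
    "r1": ["Apple", "Mango", "Apple", "Mango", "Mango",
           "Mango", "Apple", "Mango", "Apple", "Apple"],
    "r2": ["Orange", "lemon", "lemon", "Orange", "lemon",
           "Orange", "lemon", "lemon", "Orange", "lemon"],
    "date": pd.to_datetime(["2002-01-01"] * 6 + ["2002-02-01"] * 4),
})

cols = ["r1", "r2"]   # illustrative: the columns to replace
out = df.copy()       # illustrative: keep the original frame untouched

for col in cols:
    # For each date group, s is a Series holding that group's values.
    # s.value_counts(normalize=True) is indexed by the distinct values,
    # so Series.map can use it as a lookup table.
    out[col] = df.groupby("date")[col].transform(
        lambda s: s.map(s.value_counts(normalize=True))
    )

print(out)
```

With the sample data, r1 in the 2002-01-01 group becomes 2/6 for Apple and 4/6 for Mango, matching the New column above.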
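You can use the transform method to get the size of each group and assign that value to every row of the original DataFrame. Dividing the size of each (date, r1) group by the size of the corresponding date group gives the normalized count.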
In [11]: df.groupby(['date', 'r1'])['c1'].transform(len)/df.groupby(['date'])['c1'].transform(len)
Out[11]:
0 0.333333
1 0.666667
2 0.333333
3 0.666667
4 0.666667
5 0.666667
6 0.750000
7 0.250000
8 0.750000
9 0.750000
Name: c1, dtype: float64
If you need rounded values, just use the round method.
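A minimal sketch of the transform approach with rounding, wrapped in a small helper. The helper name normalized_counts, its parameters, and the r1_share column are hypothetical names introduced here for illustration. It passes the string "size" to transform in place of the len callable used above, on the assumption that the two give the same group sizes here.

```python
import pandas as pd

# Same r1 / date values as the sample DataFrame in the question
df = pd.DataFrame({
    "r1": ["Apple", "Mango", "Apple", "Mango", "Mango",
           "Mango", "Apple", "Mango", "Apple", "Apple"],
    "date": pd.to_datetime(["2002-01-01"] * 6 + ["2002-02-01"] * 4),
})


def normalized_counts(frame, group_col, value_col, decimals=None):
    """Hypothetical helper: share of each value within its group, per row."""
    # Rows sharing the same (group, value) pair
    pair_size = frame.groupby([group_col, value_col])[value_col].transform("size")
    # Rows in the whole group
    group_size = frame.groupby(group_col)[value_col].transform("size")
    ratio = pair_size / group_size
    # Round only when a number of decimals is requested
    return ratio if decimals is None else ratio.round(decimals)


df["r1_share"] = normalized_counts(df, "date", "r1", decimals=2)
print(df)
```

Because transform returns a result aligned with the original index, the ratio can be assigned straight back to the DataFrame without any reindexing.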