
Replace pandas column values with counts

Pandas GroupBy and replace values with normalized counts

Sample DF:

import numpy as np
import pandas as pd

# c1..c3 are random integers, so their values differ on every run
df = pd.DataFrame(np.random.randint(0,20,size=(10,3)),columns=["c1","c2","c3"])
df["r1"]=["Apple","Mango","Apple","Mango","Mango","Mango","Apple","Mango","Apple","Apple"]
df["r2"]=["Orange","lemon","lemon","Orange","lemon","Orange","lemon","lemon","Orange","lemon"]
df["date"] = ["2002-01-01","2002-01-01","2002-01-01","2002-01-01","2002-01-01",
              "2002-01-01","2002-02-01","2002-02-01","2002-02-01","2002-02-01"]
df["date"] = pd.to_datetime(df["date"])
df

DF:

    c1      c2      c3     r1        r2       date
0   10       2       0     Apple    Orange  2002-01-01
1   10      10      13     Mango    lemon   2002-01-01
2   0       12       0     Apple    lemon   2002-01-01
3   1       13       8     Mango    Orange  2002-01-01
4   6        5       9     Mango    lemon   2002-01-01
5   3       18      13     Mango    Orange  2002-01-01
6   2        6       7     Apple    lemon   2002-02-01
7   0        4       7     Mango    lemon   2002-02-01
8   1       10      19     Apple    Orange  2002-02-01
9   11      18       2     Apple    lemon   2002-02-01

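I am trying to group by the date column and replace the selected columns with normalized counts.

For example:

In the 2002-01-01 group, Apple in column r1 would be replaced by 0.33, because the group has 6 records and 2 of them are Apple (2/6); Mango would be replaced by 4/6, i.e. 0.67.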
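The arithmetic for that one group can be checked in isolation. This is a minimal sketch, not part of the original question; the names `jan_r1` and `shares` are made up for illustration, and the six values are copied from the r1 column of the 2002-01-01 rows above.

```python
import pandas as pd

# r1 values of the six rows dated 2002-01-01 in the sample DF
jan_r1 = pd.Series(["Apple", "Mango", "Apple", "Mango", "Mango", "Mango"])

# normalize=True divides each count by the number of rows (6 here)
shares = jan_r1.value_counts(normalize=True)

print(shares["Apple"])  # 2/6
print(shares["Mango"])  # 4/6
```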

Pandas solution I tried:

df.groupby("date")[["r1","r2"]].apply(lambda x: x.map(x.value_counts()))

Error:

AttributeError: 'DataFrame' object has no attribute 'map'

Is there a pandas method for this, instead of an iterrows-based solution?
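One possible way to keep the idea of the attempt above (map each value to its share) is to work on one column at a time, since `groupby(...)[col].transform` passes the function a Series, and `Series.map` accepts a lookup Series. The sketch below is not from the original thread: the helper name `normalized_share` and the result frame `out` are hypothetical, and the frame is rebuilt without the random c1..c3 columns.

```python
import pandas as pd

df = pd.DataFrame({
    "r1": ["Apple", "Mango", "Apple", "Mango", "Mango",
           "Mango", "Apple", "Mango", "Apple", "Apple"],
    "r2": ["Orange", "lemon", "lemon", "Orange", "lemon",
           "Orange", "lemon", "lemon", "Orange", "lemon"],
    "date": pd.to_datetime(["2002-01-01"] * 6 + ["2002-02-01"] * 4),
})

def normalized_share(s: pd.Series) -> pd.Series:
    """Replace each value in s by its relative frequency within s."""
    return s.map(s.value_counts(normalize=True))

out = df.copy()
for col in ["r1", "r2"]:
    # transform calls normalized_share once per date group and
    # returns a result aligned with the original row index
    out[col] = df.groupby("date")[col].transform(normalized_share)

print(out)
```

Looping over the selected columns keeps each call on a single Series; the original columns in `df` are left untouched.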

We can do value_counts + normalize

df['New']=df.groupby(['date']).r1.value_counts(normalize=True).reindex(pd.MultiIndex.from_frame(df[['date','r1']])).values
df
   c1  c2  c3     r1      r2       date       New
0   1   8   2  Apple  Orange 2002-01-01  0.333333
1   8   1   7  Mango   lemon 2002-01-01  0.666667
2   0  14   8  Apple   lemon 2002-01-01  0.333333
3  11  13  10  Mango  Orange 2002-01-01  0.666667
4  15   4  15  Mango   lemon 2002-01-01  0.666667
5  13   7   7  Mango  Orange 2002-01-01  0.666667
6   7   0  14  Apple   lemon 2002-02-01  0.750000
7  13   5  11  Mango   lemon 2002-02-01  0.250000
8  19  17  11  Apple  Orange 2002-02-01  0.750000
9   8   1   9  Apple   lemon 2002-02-01  0.750000
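The one-liner above packs three steps into a single expression. Split apart, it might read as follows; the intermediate names `shares` and `keys` are made up for illustration, and the frame is rebuilt without the random c1..c3 columns.

```python
import pandas as pd

df = pd.DataFrame({
    "r1": ["Apple", "Mango", "Apple", "Mango", "Mango",
           "Mango", "Apple", "Mango", "Apple", "Apple"],
    "date": pd.to_datetime(["2002-01-01"] * 6 + ["2002-02-01"] * 4),
})

# Step 1: share of each r1 value within each date,
# returned as a Series indexed by (date, r1)
shares = df.groupby("date")["r1"].value_counts(normalize=True)

# Step 2: one (date, r1) key per original row, in row order
keys = pd.MultiIndex.from_frame(df[["date", "r1"]])

# Step 3: look up the share for every row; .values drops the
# MultiIndex so the assignment is positional
df["New"] = shares.reindex(keys).values

print(df)
```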

You can use the transform method to get the size of each group and assign that value to every row of the original dataframe.

In [11]: df.groupby(['date', 'r1'])['c1'].transform(len)/df.groupby(['date'])['c1'].transform(len)                                                    
Out[11]: 
0    0.333333
1    0.666667
2    0.333333
3    0.666667
4    0.666667
5    0.666667
6    0.750000
7    0.250000
8    0.750000
9    0.750000
Name: c1, dtype: float64
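A variation on the same ratio, offered as a sketch rather than as the answerer's code: passing the string "size" to transform instead of `len`, and selecting r1 itself so the result does not depend on c1. The names `pair_size`, `date_size`, and `share` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "r1": ["Apple", "Mango", "Apple", "Mango", "Mango",
           "Mango", "Apple", "Mango", "Apple", "Apple"],
    "date": pd.to_datetime(["2002-01-01"] * 6 + ["2002-02-01"] * 4),
})

# Rows sharing the same (date, r1) pair, repeated on every row of the pair
pair_size = df.groupby(["date", "r1"])["r1"].transform("size")

# Rows sharing the same date, repeated on every row of that date
date_size = df.groupby("date")["r1"].transform("size")

# Both Series share df's index, so the division lines up row by row
share = pair_size / date_size
print(share)
```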

If you need rounded values, just use the round method.
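For instance, rounding to two decimal places could look like this. It is a small sketch on a hand-built Series named `share`, which stands in for the ratios computed above.

```python
import pandas as pd

# Stand-in for the ratios shown in the output above
share = pd.Series([1 / 3, 2 / 3, 0.75, 0.25])

# Series.round returns a new Series; share itself is unchanged
rounded = share.round(2)
print(rounded)
```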
