Impute NaN for categorical data depending on its "Type" column
I have a dataframe with 2 columns, Name and Signal. I want to fill the NaN values in the Signal column, but the fill value should depend on the Name: each NaN should be imputed with the most frequent Signal value for its Name. For example:
Timestamp   Name  Signal
2021-01-01  A     On
2021-01-02  A     nan
2021-01-03  A     On
2021-01-01  B     Off
2021-01-02  B     Off
2021-01-03  B     nan
For Name A, the NaN in the Signal column should be imputed with "On", since that is the most frequent value for A. For Name B, it should be filled with "Off", because that is the most frequent value for B.
How can I achieve this?
df = df.groupby('Name', group_keys=False).apply(lambda x: x.fillna(x['Signal'].value_counts().index[0]))
Output:
>>> df
Timestamp Name Signal
0 2021-01-01 A On
1 2021-01-02 A On
2 2021-01-03 A On
3 2021-01-01 B Off
4 2021-01-02 B Off
5 2021-01-03 B Off
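A variant that touches only the Signal column can be sketched with `transform`. This is a minimal sketch rather than the answerer's method: it rebuilds the sample frame from the question, and `most_frequent` is a hypothetical helper name introduced here for illustration. It also assumes that a Name whose Signal values are all NaN should simply stay NaN.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Timestamp": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"] * 2),
    "Name": ["A", "A", "A", "B", "B", "B"],
    "Signal": ["On", np.nan, "On", "Off", "Off", np.nan],
})

def most_frequent(s):
    """Return the most frequent non-NaN value of s (hypothetical helper)."""
    modes = s.mode()  # mode() ignores NaN by default
    # Assumption: a group with no observed values is left as NaN
    return modes.iloc[0] if not modes.empty else np.nan

# Broadcast each Name's most frequent Signal back onto the original rows
fill_values = df.groupby("Name")["Signal"].transform(most_frequent)

# Fill only the missing entries; existing values and the index are unchanged
df["Signal"] = df["Signal"].fillna(fill_values)
print(df)
```

Because `transform` returns a Series aligned to the original index, the other columns are never passed through `fillna`, and the frame keeps its original row order.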