Selecting a non-None value from a dataframe column
I would like to use the fillna function to fill the None values of a column with that column's first most frequent value that is not None or NaN.
Input DF:
Col_A
a
None
None
c
c
d
d
The output DataFrame could be:
Col_A
a
c
c
c
c
d
d
Any suggestion would be much appreciated. Many thanks, best regards, Carlo
Prelude: If your None is actually a string, you can save yourself some headaches by getting rid of those strings first. Use replace:
df = df.replace('None', np.nan)
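Whether this prelude applies depends on what the column actually holds. The following is a minimal sketch, assuming the question's input can be rebuilt as below (the names real_none, string_none and cleaned are made up for illustration). It shows that pandas treats a real None as missing, while the string 'None' is an ordinary value until it is replaced.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's input, in two variants.
real_none = pd.DataFrame({"Col_A": ["a", None, None, "c", "c", "d", "d"]})
string_none = pd.DataFrame({"Col_A": ["a", "None", "None", "c", "c", "d", "d"]})

# A real None counts as missing; the string 'None' does not.
n_missing_real = int(real_none["Col_A"].isna().sum())
n_missing_string = int(string_none["Col_A"].isna().sum())

# After replace, the string variant has the same missing entries as the real one.
cleaned = string_none.replace("None", np.nan)
n_missing_cleaned = int(cleaned["Col_A"].isna().sum())

print(n_missing_real, n_missing_string, n_missing_cleaned)
```

If the second number is non-zero in your own data, fillna alone will not touch those rows, which is why the replace step comes first.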
I believe you could use fillna + value_counts:
df
Col_A
0 a
1 NaN
2 NaN
3 c
4 c
5 d
6 d
df.fillna(df.Col_A.value_counts(sort=False).index[0])
Col_A
0 a
1 c
2 c
3 c
4 c
5 d
6 d
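One caveat: with sort=False the counts are not ordered by frequency, so index[0] is not guaranteed to be a most frequent value on every pandas version. Here is a self-contained sketch of the same idea that relies on the default sort=True instead; the frame is rebuilt from the question and the names counts, fill_value and filled are assumptions for illustration. Because c and d are tied, either may be chosen.

```python
import pandas as pd

df = pd.DataFrame({"Col_A": ["a", None, None, "c", "c", "d", "d"]})

# value_counts() ignores missing values by default (dropna=True) and, with the
# default sort=True, lists the most frequent values first.
counts = df["Col_A"].value_counts()
fill_value = counts.index[0]

# Passing a dict restricts the fill to the named column.
filled = df.fillna({"Col_A": fill_value})
print(filled)
```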
Or, following Vaishali's suggestion, use idxmax to pick c:
df.fillna(df.Col_A.value_counts(sort=False).idxmax())
Col_A
0 a
1 c
2 c
3 c
4 c
5 d
6 d
The fill value could be either c or d, since both occur twice; which one is picked depends on whether you include sort=False or not.
Details
df.Col_A.value_counts(sort=False)
c 2
a 1
d 2
Name: Col_A, dtype: int64
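Since the ordering of tied counts is what decides between c and d, it may be safer to make the tie-break explicit. The sketch below is one possible rule, not the answer's method: it collects every value that reaches the maximum count and then picks the one that appears first in the column, which seems closest to the "first most frequent value" asked for. The names col, tied and first_seen are hypothetical.

```python
import pandas as pd

col = pd.Series(["a", None, None, "c", "c", "d", "d"], name="Col_A")

counts = col.value_counts()

# All values that share the highest count (here c and d).
tied = counts[counts == counts.max()].index.tolist()

# Tie-break rule assumed here: take the tied value seen first in the column.
first_seen = next(v for v in col.dropna() if v in tied)

filled = col.fillna(first_seen)
print(sorted(tied), first_seen)
```

With this rule the result no longer depends on how value_counts happens to order equal counts.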
fillna + mode:
df.Col_A.fillna(df.Col_A.mode()[0])
Out[963]:
0 a
1 c
2 c
3 c
4 c
5 d
6 d
Name: Col_A, dtype: object
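A note on why this returns c: mode() ignores missing values by default and returns all tied modes in sorted order, so [0] is the smallest of them. It also returns an empty Series when every entry is missing, in which case [0] would raise. A minimal sketch with a guard for that case follows; the helper name fill_with_mode is made up for illustration.

```python
import numpy as np
import pandas as pd

def fill_with_mode(col: pd.Series) -> pd.Series:
    """Fill missing entries with the first (sorted) mode, if there is one."""
    modes = col.mode()  # dropna=True by default
    if modes.empty:
        # Nothing to fill with: the column holds only missing values.
        return col
    return col.fillna(modes.iloc[0])

col = pd.Series(["a", None, None, "c", "c", "d", "d"], name="Col_A")
modes = col.mode().tolist()
filled = fill_with_mode(col)

all_missing = pd.Series([np.nan, np.nan])
unchanged = fill_with_mode(all_missing)
print(modes, filled.tolist())
```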
To handle the string 'None', you need to use replace and then fillna, much like @COLDSPEED suggests:
dr = df.Col_A.replace('None',np.nan)
dr.fillna(dr.dropna().value_counts().index[0])
Output:
0 a
1 d
2 d
3 c
4 c
5 d
6 d
Name: Col_A, dtype: object
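As a side note, the dropna() call above should be optional, because value_counts() already skips missing values by default. The sketch below checks that on the example data, assuming the column holds the string 'None'; the names raw, with_dropna, without_dropna and result are hypothetical. It uses idxmax rather than index[0], and the chosen value may be c or d because the two are tied.

```python
import numpy as np
import pandas as pd

raw = pd.Series(["a", "None", "None", "c", "c", "d", "d"], name="Col_A")

# Turn the placeholder strings into real missing values first.
dr = raw.replace("None", np.nan)

# value_counts() drops NaN by default, so both calls count the same values.
with_dropna = dr.dropna().value_counts()
without_dropna = dr.value_counts()
same_counts = with_dropna.to_dict() == without_dropna.to_dict()

# idxmax returns the label of a highest count.
result = dr.fillna(without_dropna.idxmax())
print(same_counts)
print(result)
```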