How do I remove NaN values from my category type columns? I used .dropna but it doesn't work.
According to the pandas.DataFrame.dropna documentation, the dropna method removes entire rows or columns that contain missing values. Do you want to drop every column that contains NaN values? Or what do you mean by "I want to remove NaN values"?
Try this to drop the columns that contain NaN values:
df.dropna(axis=1)
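One possible reason dropna appears to do nothing is that it returns a new DataFrame instead of changing the original, so the result has to be assigned. A minimal sketch, assuming a hypothetical frame with a category column named color (the column names and data are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data: "color" is a category column with one missing value.
df = pd.DataFrame({
    "color": pd.Categorical(["red", np.nan, "blue", "red"]),
    "size": [1, 2, 3, 4],
})

# dropna returns a new DataFrame; without assignment, df stays unchanged.
df.dropna()

# Keep the result, dropping only rows where the category column is missing.
cleaned = df.dropna(subset=["color"])

print(len(df), len(cleaned))
print(cleaned["color"].isna().sum())
```

The subset argument limits the check to the listed columns, so missing values elsewhere do not cause rows to be dropped.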
You can find more information on how to call the dropna() method in the pandas.DataFrame.dropna documentation. The example below shows rows being dropped when they contain missing values, and columns being dropped when any of them contains NaN. Dropping columns gives an empty DataFrame here, because every column has missing values:
In [4]: import pandas as pd
In [5]: import numpy as np
In [6]: df = pd.DataFrame(np.random.randn(5, 3), index=['a', 'c', 'e', 'f', 'h'],columns=['one', 'two', 'three'])
In [7]: df
Out[7]:
one two three
a -1.040103 1.964200 1.519638
c -0.796710 1.654887 -0.614065
e 1.899870 0.810478 1.294714
f -0.913869 1.052014 -0.114583
h 0.186190 -0.156173 -2.323759
In [8]: df2 = df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
In [9]: df2
Out[9]:
one two three
a -1.040103 1.964200 1.519638
b NaN NaN NaN
c -0.796710 1.654887 -0.614065
d NaN NaN NaN
e 1.899870 0.810478 1.294714
f -0.913869 1.052014 -0.114583
g NaN NaN NaN
h 0.186190 -0.156173 -2.323759
In [10]: df2.dropna()
Out[10]:
one two three
a -1.040103 1.964200 1.519638
c -0.796710 1.654887 -0.614065
e 1.899870 0.810478 1.294714
f -0.913869 1.052014 -0.114583
h 0.186190 -0.156173 -2.323759
In [11]: df2.dropna(axis="columns")
Out[11]:
Empty DataFrame
Columns: []
Index: [a, b, c, d, e, f, g, h]
In [12]: df2
Out[12]:
one two three
a -1.040103 1.964200 1.519638
b NaN NaN NaN
c -0.796710 1.654887 -0.614065
d NaN NaN NaN
e 1.899870 0.810478 1.294714
f -0.913869 1.052014 -0.114583
g NaN NaN NaN
h 0.186190 -0.156173 -2.323759
In [13]: df2.dropna(inplace=True)
In [14]: df2
Out[14]:
one two three
a -1.040103 1.964200 1.519638
c -0.796710 1.654887 -0.614065
e 1.899870 0.810478 1.294714
f -0.913869 1.052014 -0.114583
h 0.186190 -0.156173 -2.323759
In [26]: df2[['one','three']].dropna(axis='columns')
Out[26]:
Empty DataFrame
Columns: []
Index: [a, b, c, d, e, f, g, h]
In [27]: df2[['one','three']].dropna(axis=0)
Out[27]:
one three
a -1.040103 1.519638
c -0.796710 -0.614065
e 1.899870 1.294714
f -0.913869 -0.114583
h 0.186190 -2.323759
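The empty result from dropna(axis="columns") comes from the default how="any", which drops a column as soon as it contains a single NaN. As a sketch of the alternative, how="all" drops only rows or columns that are NaN everywhere. The small frame here is made up for illustration:

```python
import numpy as np
import pandas as pd

# Made-up frame: row "b" is entirely NaN, column "empty" is entirely NaN.
df = pd.DataFrame(
    {
        "one": [1.0, np.nan, 3.0],
        "two": [4.0, np.nan, np.nan],
        "empty": [np.nan, np.nan, np.nan],
    },
    index=["a", "b", "c"],
)

# Default how="any": every column has at least one NaN, so all are dropped.
any_cols = df.dropna(axis="columns")

# how="all": only the column that is NaN in every row is dropped.
all_cols = df.dropna(axis="columns", how="all")

# how="all" on rows: only row "b" (NaN in every column) is dropped.
all_rows = df.dropna(how="all")

print(list(any_cols.columns))
print(list(all_cols.columns))
print(list(all_rows.index))
```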
You can use the replace() function:
df.replace('NaN', 'Word XY')
Table with NaN:
import pandas as pd
df = pd.DataFrame([
['1', 'Fares', 32, True],
['2', 'Elena', 23, 'NaN'],
['NaN', 'Steven', 40, True],
['4', 'Max', 24, 'NaN'],
['5', 'Mike', 20, False],
['NaN', 'John', 40, True]])
df.columns = ['id', 'name', 'age', 'decision']
df
Output:
Now use the replace() function:
df.replace('NaN', ' ')
Desired output:
All this does is replace the string 'NaN' with a single space. You can use any word you like as the replacement.
df.replace('NaN', '######') would replace every 'NaN' string with number signs.
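Note that replace('NaN', ...) matches only the literal string 'NaN', which dropna does not treat as missing. If the goal is to drop those rows instead of relabeling them, one option is to turn the strings into real missing values first and then call dropna. A sketch using the same table, with np.nan as the replacement (an assumption about what the asker wants, not part of the answer above):

```python
import numpy as np
import pandas as pd

# Same table as in the answer: missing entries are stored as the string 'NaN'.
df = pd.DataFrame(
    [
        ['1', 'Fares', 32, True],
        ['2', 'Elena', 23, 'NaN'],
        ['NaN', 'Steven', 40, True],
        ['4', 'Max', 24, 'NaN'],
        ['5', 'Mike', 20, False],
        ['NaN', 'John', 40, True],
    ],
    columns=['id', 'name', 'age', 'decision'],
)

# Turn the placeholder strings into real missing values.
with_missing = df.replace('NaN', np.nan)

# Now dropna can see them and removes every row that has a missing value.
cleaned = with_missing.dropna()

print(cleaned['name'].tolist())
```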
I hope this helps a bit.