How can I remove NaN columns if values are string/Integer dtypes at once?
I have data like:
In [1]: d = {'ID': [14, 14, 14, 14, 14, 14, 14, 15, 15],
'NAME': ['KWI', 'NED', 'RICK', 'NICH', 'DIONIC', 'RICHARD', 'ROCKY', 'CARLOS', 'SIDARTH'],
'ID_COUNTRY':[1, 2, 3,4,5,6,7,8,9],
'COUNTRY':['MEXICO', 'ITALY', 'CANADA', 'ENGLAND', 'GERMANY', 'UNITED STATES', 'JAPAN', 'SPAIN', 'BRAZIL'],
'ID_CITY':[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
'CITY':[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
'STATUS': ['OK', 'OK', 'OK', 'OK', 'OK', 'NOT', 'OK', 'NOT', 'OK']}
df = pd.DataFrame(data=d)
Out[2]:
   ID     NAME  ID_COUNTRY        COUNTRY  ID_CITY  CITY  STATUS
0  14      KWI           1         MEXICO      NaN   NaN      OK
1  14      NED           2          ITALY      NaN   NaN      OK
2  14     RICK           3         CANADA      NaN   NaN      OK
3  14     NICH           4        ENGLAND      NaN   NaN      OK
4  14   DIONIC           5        GERMANY      NaN   NaN      OK
5  14  RICHARD           6  UNITED STATES      NaN   NaN     NOT
6  14    ROCKY           7          JAPAN      NaN   NaN      OK
7  15   CARLOS           8          SPAIN      NaN   NaN     NOT
8  15  SIDARTH           9         BRAZIL      NaN   NaN      OK
Then I need to set the dtype of each column for future use, using:
df.iloc[:, [0, 2, 4]] = df.iloc[:, [0, 2, 4]].astype("Int64")
df.iloc[:, [1, 3, 5, 6]] = df.iloc[:, [1, 3, 5, 6]].astype("string")
After doing this, I want to drop the columns that are entirely NaN and get the names of the dropped columns, so that they can also be removed from another dataframe with the same column names, like this:
In [3]: d1 = {'ID': [14, 14, 14],
'NAME': ['KWI', 'NED', 'RICK'],
'ID_COUNTRY':[1, 2, 3],
'COUNTRY':['MEXICO', 'ITALY', 'CANADA'],
'ID_CITY':[20, 22, 24],
'CITY':['MX', 'AT', 'CA'],
'STATUS': ['OK', 'OK', 'OK']}
df1 = pd.DataFrame(data=d1)
Out [4]:
   ID  NAME  ID_COUNTRY  COUNTRY  ID_CITY  CITY  STATUS
0  14   KWI           1   MEXICO       20    MX      OK
1  14   NED           2    ITALY       22    AT      OK
2  14  RICK           3   CANADA       24    CA      OK
The issue here is that when I try df['CITY'].isna(), it gives me False for all the values in the column. I do not know why, and when I try df['ID_CITY'].isna() it gives me True. I guess it is because one is Int64 and the other is object. Examples:
In [5]: df['ID_CITY'].isna()
Out[6]:
0 True
1 True
2 True
3 True
4 True
5 True
6 True
7 True
8 True
Name: ID_CITY, dtype: bool
In [7]: df['CITY'].isna()
Out[8]:
0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 False
Name: CITY, dtype: bool
After correcting what I mentioned above, the desired output for df and df1 will be:
Out[9]:
   ID     NAME  ID_COUNTRY        COUNTRY  STATUS
0  14      KWI           1         MEXICO      OK
1  14      NED           2          ITALY      OK
2  14     RICK           3         CANADA      OK
3  14     NICH           4        ENGLAND      OK
4  14   DIONIC           5        GERMANY      OK
5  14  RICHARD           6  UNITED STATES     NOT
6  14    ROCKY           7          JAPAN      OK
7  15   CARLOS           8          SPAIN     NOT
8  15  SIDARTH           9         BRAZIL      OK
Out [10]:
   ID  NAME  ID_COUNTRY  COUNTRY  STATUS
0  14   KWI           1   MEXICO      OK
1  14   NED           2    ITALY      OK
2  14  RICK           3   CANADA      OK
Thanks for reading.
Assuming that your input is the following (instead of column indexes, I have used column names for clarity):
import numpy as np
import pandas as pd

d = {'ID': [14, 14, 14, 14, 14, 14, 14, 15, 15],
'NAME': ['KWI', 'NED', 'RICK', 'NICH', 'DIONIC', 'RICHARD', 'ROCKY', 'CARLOS', 'SIDARTH'],
'ID_COUNTRY':[1, 2, 3,4,5,6,7,8,9],
'COUNTRY':['MEXICO', 'ITALY', 'CANADA', 'ENGLAND', 'GERMANY', 'UNITED STATES', 'JAPAN', 'SPAIN', 'BRAZIL'],
'ID_CITY':[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
'CITY':[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
'STATUS': ['OK', 'OK', 'OK', 'OK', 'OK', 'NOT', 'OK', 'NOT', 'OK']}
df = pd.DataFrame(data=d)
You can cast a pandas object to a specified dtype. For that, you can use Int64 and str (instead of string as in your code) [see the link].
df[['ID', 'ID_COUNTRY', 'ID_CITY']] = df[['ID', 'ID_COUNTRY', 'ID_CITY']].astype("Int64")
df[['NAME', 'COUNTRY', 'CITY', 'STATUS']] = df[['NAME', 'COUNTRY', 'CITY', 'STATUS']].astype("str")
With a temporary typecast, you can determine the NaN values. For this, take into account that float accepts the string nan, with an optional prefix + or -, for Not a Number (NaN).
df['CITY'].astype("float").isna()
The output:输出:
0 True
1 True
2 True
3 True
4 True
5 True
6 True
7 True
8 True
Name: CITY, dtype: bool
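To make the reasoning behind the temporary cast concrete, here is a minimal sketch. It assumes a hypothetical column, text_column, that holds the text "nan" rather than real missing values; that name and its contents are illustrative and are not taken from the question's data.

```python
import math

import pandas as pd

# Python's float() parses the text "nan" (any case, optional sign) as NaN.
parsed = [float(text) for text in ("nan", "+nan", "-nan", "NaN")]
all_parsed_nan = all(math.isnan(value) for value in parsed)

# Hypothetical column holding the text "nan" instead of a real missing value.
text_column = pd.Series(["nan", "nan", "nan"], dtype=object)

# isna() only looks for real missing values, so plain text is not flagged.
seen_as_missing = text_column.isna()

# The temporary cast turns the text into real NaN, which isna() does flag.
after_cast = text_column.astype("float").isna()

print(all_parsed_nan)
print(seen_as_missing.tolist())
print(after_cast.tolist())
```

The cast is only used to build the boolean mask; the column itself keeps its original values.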
Either
df['ID_CITY'].isna()
or
df['ID_CITY'].astype("float").isna()
will result in:
0 True
1 True
2 True
3 True
4 True
5 True
6 True
7 True
8 True
Name: ID_CITY, dtype: bool
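Putting this together for the original goal (dropping the all-NaN columns and reusing their names on a second dataframe), one possible sketch follows. The helper all_nan_columns is a hypothetical name introduced here, not a pandas function, and the frames are shortened versions of the ones in the question. It assumes that a column whose float cast fails holds real text and therefore is not empty.

```python
import numpy as np
import pandas as pd


def all_nan_columns(frame):
    """Return the names of columns whose values are all NaN after a float cast."""
    names = []
    for name in frame.columns:
        try:
            as_float = frame[name].astype("float")
        except (ValueError, TypeError):
            # Real text such as 'KWI' cannot be cast, so the column has data.
            continue
        if as_float.isna().all():
            names.append(name)
    return names


# Shortened version of the question's df, cast the same way as above.
df = pd.DataFrame({
    "ID": [14, 14, 15],
    "NAME": ["KWI", "NED", "CARLOS"],
    "ID_CITY": [np.nan, np.nan, np.nan],
    "CITY": [np.nan, np.nan, np.nan],
    "STATUS": ["OK", "OK", "NOT"],
})
df[["ID", "ID_CITY"]] = df[["ID", "ID_CITY"]].astype("Int64")
df[["NAME", "CITY", "STATUS"]] = df[["NAME", "CITY", "STATUS"]].astype("str")

# Shortened version of the question's df1, which has values in every column.
df1 = pd.DataFrame({
    "ID": [14, 14],
    "NAME": ["KWI", "NED"],
    "ID_CITY": [20, 22],
    "CITY": ["MX", "AT"],
    "STATUS": ["OK", "OK"],
})

dropped = all_nan_columns(df)          # names to remove from both frames
df_clean = df.drop(columns=dropped)
df1_clean = df1.drop(columns=dropped)

print(dropped)
print(list(df_clean.columns))
print(list(df1_clean.columns))
```

With this data, dropped should come out as ['ID_CITY', 'CITY'], and both cleaned frames keep ID, NAME and STATUS. Note that a column containing nothing but the literal text "nan" would be flagged as empty too, which is the behaviour the temporary cast relies on.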