How can I remove NaN columns when the values have string/integer dtypes?

I have data like this:

In [1]: d = {'ID': [14, 14, 14, 14, 14, 14, 14, 15, 15], 
         'NAME': ['KWI', 'NED', 'RICK', 'NICH', 'DIONIC', 'RICHARD', 'ROCKY', 'CARLOS', 'SIDARTH'], 
         'ID_COUNTRY':[1, 2, 3,4,5,6,7,8,9], 
         'COUNTRY':['MEXICO', 'ITALY', 'CANADA', 'ENGLAND', 'GERMANY', 'UNITED STATES', 'JAPAN', 'SPAIN', 'BRAZIL'], 
         'ID_CITY':[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan], 
         'CITY':[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan], 
         'STATUS': ['OK', 'OK', 'OK', 'OK', 'OK', 'NOT', 'OK', 'NOT', 'OK']}
    df = pd.DataFrame(data=d)

Out[2]:
      ID       NAME      ID_COUNTRY     COUNTRY        ID_CITY     CITY     STATUS
0     14       KWI           1           MEXICO          NaN        NaN        OK
1     14       NED           2           ITALY           NaN        NaN        OK
2     14       RICK          3           CANADA          NaN        NaN        OK
3     14       NICH          4           ENGLAND         NaN        NaN        OK
4     14       DIONIC        5           GERMANY         NaN        NaN        OK 
5     14       RICHARD       6           UNITED STATES   NaN        NaN        NOT
6     14       ROCKY         7           JAPAN           NaN        NaN        OK
7     15       CARLOS        8           SPAIN           NaN        NaN        NOT
8     15       SIDARTH       9           BRAZIL          NaN        NaN        OK

Then I need to set the dtype of each column for future use:

df.iloc[:, [0, 2, 4]] = df.iloc[:, [0, 2, 4]].astype("Int64")
df.iloc[:, [1, 3, 5, 6]] = df.iloc[:, [1, 3, 5, 6]].astype("string")

After doing this, I want to drop the columns that are entirely NaN and get the names of the dropped columns, so that they can also be removed from another DataFrame with the same column names, like this one:

 In [3]: d1 = {'ID': [14, 14, 14], 
         'NAME': ['KWI', 'NED', 'RICK'], 
         'ID_COUNTRY':[1, 2, 3], 
         'COUNTRY':['MEXICO', 'ITALY', 'CANADA'], 
         'ID_CITY':[20, 22, 24], 
         'CITY':['MX', 'AT', 'CA'], 
         'STATUS': ['OK', 'OK', 'OK']}
    df1 = pd.DataFrame(data=d1)
 Out [4]: 
      ID       NAME      ID_COUNTRY     COUNTRY        ID_CITY     CITY     STATUS
0     14       KWI           1           MEXICO          20        MX        OK
1     14       NED           2           ITALY           22        AT        OK
2     14       RICK          3           CANADA          24        CA        OK

The issue is that df['CITY'].isna() gives me False for every value in the column. I do not know why, because df['ID_CITY'].isna() gives me True. I guess it is because one column is Int64 and the other is object. Examples:

In [5]: df4['ID_CITY'].isna()                       
Out[6]:                         
0    True                   
1    True
2    True                          
3    True
4    True
5    True
6    True
7    True
8    True
Name: ID_CITY, dtype: bool

In [7]: df4['CITY'].isna()
Out[8]:
0    False
1    False
2    False
3    False
4    False
5    False
6    False
7    False
8    False
Name: CITY, dtype: bool

Once the issue above is fixed, the desired output for df and df1 is:

Out[9]:
      ID       NAME      ID_COUNTRY     COUNTRY          STATUS
0     14       KWI           1           MEXICO            OK
1     14       NED           2           ITALY             OK
2     14       RICK          3           CANADA            OK
3     14       NICH          4           ENGLAND           OK
4     14       DIONIC        5           GERMANY           OK 
5     14       RICHARD       6           UNITED STATES     NOT
6     14       ROCKY         7           JAPAN             OK
7     15       CARLOS        8           SPAIN             NOT
8     15       SIDARTH       9           BRAZIL            OK

 Out [10]: 
      ID       NAME      ID_COUNTRY     COUNTRY     STATUS
0     14       KWI           1           MEXICO       OK
1     14       NED           2           ITALY        OK
2     14       RICK          3           CANADA       OK

Thanks for reading.

Assuming your input is the following (I use column names instead of column indices for clarity):

import numpy as np
import pandas as pd

d = {'ID': [14, 14, 14, 14, 14, 14, 14, 15, 15],
     'NAME': ['KWI', 'NED', 'RICK', 'NICH', 'DIONIC', 'RICHARD', 'ROCKY', 'CARLOS', 'SIDARTH'],
     'ID_COUNTRY': [1, 2, 3, 4, 5, 6, 7, 8, 9],
     'COUNTRY': ['MEXICO', 'ITALY', 'CANADA', 'ENGLAND', 'GERMANY', 'UNITED STATES', 'JAPAN', 'SPAIN', 'BRAZIL'],
     'ID_CITY': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
     'CITY': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
     'STATUS': ['OK', 'OK', 'OK', 'OK', 'OK', 'NOT', 'OK', 'NOT', 'OK']}
df = pd.DataFrame(data=d)

You can cast a pandas object to a specified dtype with astype. Here you can use Int64 and str (instead of string, as in your code) [see the link].

df[['ID', 'ID_COUNTRY', 'ID_CITY']] = df[['ID', 'ID_COUNTRY', 'ID_CITY']].astype("Int64")
df[['NAME', 'COUNTRY', 'CITY', 'STATUS']] = df[['NAME', 'COUNTRY', 'CITY', 'STATUS']].astype("str")

With a temporary cast to float, you can detect the NaN values. The cast to str can leave the missing values stored as the literal text nan, which isna() does not treat as missing. Casting back to float recovers them, because float accepts the string nan, with an optional + or - prefix, as Not a Number (NaN).

df['CITY'].astype("float").isna()

The output:

0    True
1    True
2    True
3    True
4    True
5    True
6    True
7    True
8    True
Name: CITY, dtype: bool
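
The float parsing rule mentioned above can be checked in plain Python, without pandas. This is only a small sketch; the list name candidates is made up for the illustration.

```python
import math

# float() parses the text "nan" (case-insensitive), with an optional sign,
# into a floating-point Not a Number value.
candidates = ["nan", "+nan", "-nan", "NaN"]
parsed = [float(text) for text in candidates]

# math.isnan() is needed here because NaN never compares equal to itself.
all_nan = all(math.isnan(value) for value in parsed)
print(all_nan)
```

This is why the temporary cast to float turns the text nan back into a value that isna() recognizes.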

Either

df['ID_CITY'].isna()

or

df['ID_CITY'].astype("float").isna()

gives:

0    True
1    True
2    True
3    True
4    True
5    True
6    True
7    True
8    True
Name: ID_CITY, dtype: bool
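
To get from these checks to the output asked for in the question, one possible approach is to collect the names of the all-NaN columns once and reuse that list on the second DataFrame. The sketch below is not part of the original answer. It assumes the check runs before any cast to str, so the missing values are still real NaN, and it uses shortened versions of df and df1. The name empty_cols is made up for the illustration.

```python
import numpy as np
import pandas as pd

# Shortened versions of the question's frames (fewer rows and columns).
df = pd.DataFrame({
    "ID": [14, 14, 15],
    "NAME": ["KWI", "NED", "CARLOS"],
    "ID_CITY": [np.nan, np.nan, np.nan],
    "CITY": [np.nan, np.nan, np.nan],
    "STATUS": ["OK", "OK", "NOT"],
})
df1 = pd.DataFrame({
    "ID": [14, 14],
    "NAME": ["KWI", "NED"],
    "ID_CITY": [20, 22],
    "CITY": ["MX", "AT"],
    "STATUS": ["OK", "OK"],
})

# Names of the columns in which every value is missing.
empty_cols = [col for col in df.columns if df[col].isna().all()]

# Drop those columns from both frames; errors="ignore" skips any name
# that happens to be absent from df1.
df = df.drop(columns=empty_cols)
df1 = df1.drop(columns=empty_cols, errors="ignore")

print(empty_cols)
print(list(df.columns))
print(list(df1.columns))
```

If the columns have already been cast to str, the same list could be built from the temporary float cast shown above, applied only to the columns that are expected to be numeric or empty.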
