

How can I drop the rows with null values and categorical variables from the dataframe?

I am trying to drop the rows with null values and categorical variables from a dataframe that I imported from Excel. I've tried many functions and approaches, but I can't drop all of those rows; some of them always remain.

There are around 185,000 rows and 6 columns. What I was trying to do is use a for loop to go through all the rows and drop a row whenever its "Order ID" column contains a null value or a categorical variable.

This is one of the versions I've tried:

f = 0

value = merged_file.at[f, 'Order ID']
for value in merged_file:
    if value is None:
        merged_file.drop(merged_file.index[f])
        merged_file.reset_index(inplace=True, drop=True)
        f+=1
        continue
    elif value == 'Order ID':
        merged_file.drop(merged_file.index[f])
        merged_file.reset_index(drop=True, inplace=True)
        f+=1
        continue
    elif f==186845:
        break
    else:
        f+=1
        continue
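Part of what goes wrong in the loop above can be seen on a small frame. The minimal sketch below uses made-up data (the values and the "Product" column are hypothetical) to show three things: iterating over a DataFrame yields its column labels rather than its rows (so the value read with .at before the loop is immediately overwritten), pandas generally represents empty cells as NaN rather than None, and drop without reassignment or inplace=True leaves the frame unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for merged_file: a normal order, an empty row,
# a repeated header row and another order (values made up for illustration)
merged_file = pd.DataFrame({
    "Order ID": [176558, np.nan, "Order ID", 176559],
    "Product": ["A", np.nan, "Product", "B"],
})

# Iterating over a DataFrame yields its column labels, not its rows,
# so `value` in the original loop is never a cell of "Order ID"
labels = [value for value in merged_file]
print(labels)  # ['Order ID', 'Product']

# Empty cells show up as NaN, which an `is None` test does not catch;
# pd.isna() recognises both NaN and None
cell = merged_file.at[1, "Order ID"]
print(cell is None, pd.isna(cell))  # False True

# drop() returns a new DataFrame; the original is unchanged unless the
# result is reassigned (or inplace=True is passed)
merged_file.drop(merged_file.index[1])
print(len(merged_file))  # 4
```

Because the loop only ever sees the column labels, it runs once per column and no row is actually removed.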

I would be grateful if you could point out what I am doing wrong, and please let me know if there is a better way to identify and drop the rows or columns with null values and categorical variables.

Thank you.

So it seems you're using pandas, even though the code doesn't look very Pythonic. Anyway, I would suggest not iterating through each row of the dataframe. In pandas, rows containing NaN can be dropped using dropna:

merged_file.dropna(subset=['Order ID'], inplace=True)
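To illustrate what subset does, here is a minimal sketch on a made-up three-row frame (the values and the "Product" column are hypothetical): only rows whose "Order ID" is missing are dropped, while NaN in other columns is ignored. It reassigns the result instead of passing inplace=True; either form works.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 lacks an Order ID, row 2 only lacks a Product
merged_file = pd.DataFrame({
    "Order ID": [176558, np.nan, 176559],
    "Product": ["A", "B", np.nan],
})

# subset= limits the null check to "Order ID"; NaN in "Product" is ignored
cleaned = merged_file.dropna(subset=["Order ID"])
print(cleaned.index.tolist())  # [0, 2]

# Without subset=, a NaN in any column drops the row
print(merged_file.dropna().index.tolist())  # [0]
```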

To remove the rows containing categorical (non-numeric) values instead, you can use NumPy's isreal. Simply apply isreal to every value in the "Order ID" column: it labels as False each value that is not numeric, and indexing the dataframe with the result keeps only the rows labelled True.

import numpy as np
# keep only the rows whose "Order ID" value np.isreal reports as numeric
merged_file = merged_file[merged_file['Order ID'].apply(np.isreal)]
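NumPy's documentation notes that isreal may behave unexpectedly for string or object input, so if the IDs happen to be stored as text the filter above could drop valid rows as well. An alternative worth considering (swapped in here, not part of the answer above) is pandas' to_numeric with errors='coerce': anything that cannot be parsed as a number, such as a repeated "Order ID" header, becomes NaN, so nulls and non-numeric rows are removed in one step. A minimal sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical merged data: a numeric ID, an ID stored as text,
# a repeated header row and an empty row
merged_file = pd.DataFrame({
    "Order ID": [176558, "176559", "Order ID", np.nan],
    "Product": ["A", "B", "Product", np.nan],
})

# Convert "Order ID" to numbers; unparseable values (e.g. "Order ID") become NaN
order_ids = pd.to_numeric(merged_file["Order ID"], errors="coerce")

# Keep rows with a numeric Order ID: removes nulls and header rows together
cleaned = merged_file[order_ids.notna()].copy()

# Store the parsed IDs back as integers
cleaned["Order ID"] = order_ids[order_ids.notna()].astype("int64")
print(cleaned.index.tolist())  # [0, 1]
```

Coercing once and filtering on notna() also leaves "Order ID" with a numeric dtype, which the isreal filter does not do by itself.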
