
Removing empty rows from dataframe

I have a dataframe with empty values in rows

[screenshot: dataframe with empty rows]

How can I remove these empty values? I have already tried data.replace('', np.nan, inplace=True) and data.dropna() but that didn't change anything. What other ways are there to drop empty rows from a dataframe?

Try with

data = data.replace('', np.nan).dropna()
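A likely reason the original attempt "didn't change anything" is that `replace` (without `inplace=True`) and `dropna()` both return new dataframes, so the result must be assigned back. A minimal sketch, using column names borrowed from the example below:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'lattitude': ['', '38.895118', ''],
                   'longitude': ['', '-77.0363658', '']})

# replace('') marks empty strings as NaN; dropna() then removes those rows.
# Both return new objects, so the result is assigned back to df.
df = df.replace('', np.nan).dropna()
```

Calling `df.dropna()` on its own line, without assignment, leaves `df` unchanged.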

Update

data = data.apply(pd.to_numeric,errors='coerce').dropna()

As you have empty strings in a numeric variable, I'm assuming it got read in as strings. A robust way to solve this is the following:

data = {'lattitude': ['', '38.895118', '', '', '', '45.5234515', '', '40.764462'],
        'longitude': ['', '-77.0363658', '', '', '', '-122.6762071', '', '-11.904565']}
df = pd.DataFrame(data)

[screenshot: the example dataframe with empty strings]

Change the fields to a numeric dtype. errors='coerce' will convert any value it cannot parse into NaN.

df = df.apply(lambda x: pd.to_numeric(x, errors='coerce'))

[screenshot: the dataframe after to_numeric, with empty fields now NaN]

The only thing left to do is drop the NaN rows:

df.dropna(inplace=True)

[screenshot: the dataframe after dropna]

Another possible solution is to use regular expressions. `^\S` requires a non-whitespace character at the start of the string, so an empty field produces no match and is filtered out by the mask. Of course, multiple regexes are possible here.

mask = (df['lattitude'].str.contains(r'^\S') & df['longitude'].str.contains(r'^\S'))
df = df[mask]
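Putting the mask together with the sample data above (dropping the capture group, i.e. `r'^\S'` rather than `r'(^\S)'`, also avoids pandas' warning about match groups in `str.contains`) — a sketch assuming the columns are still strings:

```python
import pandas as pd

df = pd.DataFrame({'lattitude': ['', '38.895118', '', '45.5234515'],
                   'longitude': ['', '-77.0363658', '', '-122.6762071']})

# Keep only rows whose fields start with a non-whitespace character;
# empty strings never match r'^\S', so those rows are dropped.
mask = (df['lattitude'].str.contains(r'^\S')
        & df['longitude'].str.contains(r'^\S'))
df = df[mask]
```

Note that this only filters the rows; the columns are still strings and would still need `pd.to_numeric` for numeric work.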

Suppose latitude is between -90 and 90:

data = data[data['latitude'] <= 90]

This should work whether the values are NaN or ''.
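A sketch of why this filter drops the bad rows: any comparison with NaN evaluates to False, so NaN rows fall out of the mask. This assumes the column is already numeric (a string column would need `pd.to_numeric` first); the column name `latitude` follows this answer's spelling:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'latitude': [38.895118, np.nan, 45.5234515, np.nan]})

# NaN <= 90 is False, so NaN rows are excluded from the mask.
data = data[data['latitude'] <= 90]
```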
