
How can I calculate the percentage of empty values in a pandas dataframe?

I have a dataframe df in which I know there are empty values, i.e. '' (empty strings or blank spaces). I want to calculate the percentage of those observations per column and replace them with NaN.

To get the percentage I've tried:

for col in df:
    empty = round((df[df[col]] == '').sum()/df.shape[0]*100, 1)

I have similar code that calculates the percentage of zeros, and it does work:

zeros = round((df[col] == 0).sum()/df.shape[0]*100, 1)
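The zeros check works because it compares the column itself, df[col], with 0. The first attempt instead writes df[df[col]], which tries to use the column's values as column labels. A minimal sketch applying the zeros pattern to exact empty strings, assuming a small made-up df, could look like this; it stores each result in a dict so the value for every column is kept instead of being overwritten on each iteration.

```python
import pandas as pd

# Hypothetical sample data; replace with your own df.
df = pd.DataFrame({
    "a": ["x", "", "y", ""],
    "b": [1, 0, 2, 3],
    "c": ["", "z", "w", "v"],
})

empty = {}
for col in df:
    # Compare the column itself (df[col]) against '', exactly like the zeros check.
    empty[col] = round((df[col] == '').sum() / df.shape[0] * 100, 1)

print(empty)
```

For the numeric column b, comparing with '' simply yields False everywhere, so its percentage is 0.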

I think you need Series.isna to test for missing values (note that it does not detect empty strings or spaces):

nans = round(df[col].isna().sum()/df.shape[0]*100, 1)
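A quick way to see that caveat: on a hypothetical column, isna flags the real NaN but leaves '' and '   ' alone, because they are ordinary, non-missing strings.

```python
import numpy as np
import pandas as pd

# Hypothetical column: a normal value, an empty string, a blank string and a real NaN.
s = pd.Series(["a", "", "   ", np.nan])

print(s.isna().tolist())  # [False, False, False, True]
```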

The solution can be simplified with mean, since the mean of a boolean Series is the fraction of True values:

nans = round(df[col].isna().mean()*100, 1)
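Taking the mean of the boolean mask divides the number of True values by the length of the column, so both formulas give the same result. A small check on made-up data:

```python
import numpy as np
import pandas as pd

# Hypothetical column with 3 missing values out of 5.
s = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan])

via_sum = round(s.isna().sum() / s.shape[0] * 100, 1)  # count / length
via_mean = round(s.isna().mean() * 100, 1)             # mean of the boolean mask

print(via_sum, via_mean)  # 60.0 60.0
```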

To replace empty strings or whitespace-only strings with NaN, use:

import numpy as np
df = df.replace(r'^\s*$', np.nan, regex=True)

nans = round(df[col].isna().mean()*100, 1)
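To check what the pattern ^\s*$ actually matches, here is a small sketch on hypothetical data: empty and whitespace-only strings become NaN, text that merely contains a space is kept, and the numeric column is untouched because the regex is only applied to string values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "name": ["", "   ", "a b", "c"],
    "score": [1, 2, 3, 4],
})

# ^\s*$ matches strings that are empty or consist only of whitespace.
df = df.replace(r'^\s*$', np.nan, regex=True)

print(df["name"].isna().tolist())  # [True, True, False, False]
```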

If you need to test all columns at once:

nans = df.isna().mean().mul(100).round(1)
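Putting both steps together for every column, a minimal sketch might look like the following. blank_report is a hypothetical helper name (not a pandas function) and the sample data is made up.

```python
import numpy as np
import pandas as pd


def blank_report(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    """Hypothetical helper: replace blank strings with NaN and return the
    cleaned frame plus the percentage of missing values per column."""
    cleaned = df.replace(r'^\s*$', np.nan, regex=True)
    pct = cleaned.isna().mean().mul(100).round(1)
    return cleaned, pct


df = pd.DataFrame({
    "a": ["x", "", " ", "y"],
    "b": [1.0, np.nan, 3.0, 4.0],
    "c": ["p", "q", "r", "s"],
})
cleaned, pct = blank_report(df)
print(pct)
```

The percentages count every NaN after the replacement, including values that were already missing (column b here). If you need the share of former blank strings alone, run df.isna().mean() on the original frame as well and subtract it.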

The full answer to your problem would be:

df = df[df != '']  # Replace all exact empty strings '' with NaN first.

for col in df:
    empty_avg = round(df[col].isna().mean()*100, 1)  # Percentage of NaN values in this column after the replacement.
    print(col, empty_avg)
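Indexing a DataFrame with a boolean DataFrame, as in df[df != ''], is equivalent to DataFrame.where, so cells where the condition is False become NaN. The sketch below, on hypothetical data, checks that it matches the more explicit df.mask(df.eq('')) and shows that, unlike the ^\s*$ regex, it only catches exact empty strings.

```python
import pandas as pd

# Hypothetical sample data: one exact empty string and one whitespace-only string.
df = pd.DataFrame({"a": ["x", "", "   "], "b": [1, 2, 3]})

masked = df[df != '']          # cells equal to '' become NaN
via_mask = df.mask(df.eq(''))  # explicit equivalent using DataFrame.mask

print(masked["a"].isna().tolist())  # [False, True, False]: '   ' is not replaced
```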
