
How to count missing & non-numeric data and then plot it with pandas

I have a dataset with more than 6k entries. I want to count missing data and non-numeric data (errors) at the same time, and then plot the occurrences with a histogram.

I use the code below to find the missing and erroneous data, but I can only filter one column at a time, and I don't know how to sum the results. Columns a, b, and c have dtype object; Id and d are int and float.

How can this be done programmatically, with a histogram showing the occurrences?

 df[pd.to_numeric(df['a'], errors='coerce').isnull()]

 df = pd.DataFrame({'Id': [1, 2, 3, 4, 5], 'a': [1, 2, 'good', 'bad', np.nan], 'b': [0.1, 'worse', np.nan, 'better', 0.5], 'c': ['2.5', 'best', '6.5', 'NaN', '10.5'], 'd': ['10', '20', '30', '40', '50']})

Setup

import numpy as np
import pandas as pd

df = pd.DataFrame({'A' : ['', np.nan, 3], 'B' : ['amount', 5, 3]})

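# Stack all columns into one Series (keeping NaN), coerce it to numeric,
# and flag every value that is missing or fails the conversion.
df_error = (pd.to_numeric(df.stack(dropna=False), errors='coerce')
              .isna()
              .map({True : 'error', False : 'not error'})
              .groupby(level=1)   # level 1 of the stacked index is the column name
              .value_counts()
              .unstack())
# One group of bars per column: 'error' vs. 'not error' counts.
df_error.plot(kind='bar')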
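
If you also want missing values and non-numeric values counted separately rather than lumped together as 'error', something along these lines might work. This is only a sketch on the sample data from the question (with the string values quoted); the names cols, counts and plot_counts are made up for illustration, and I am assuming Id should be left out of the check.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with the string values quoted.
df = pd.DataFrame({
    'Id': [1, 2, 3, 4, 5],
    'a': [1, 2, 'good', 'bad', np.nan],
    'b': [0.1, 'worse', np.nan, 'better', 0.5],
    'c': ['2.5', 'best', '6.5', 'NaN', '10.5'],
    'd': ['10', '20', '30', '40', '50'],
})

# Columns to check (assumption: Id is an identifier and is skipped).
cols = ['a', 'b', 'c', 'd']

# Cells that are already missing (NaN/None) in the raw data.
missing = df[cols].isna()

# Cells that are NaN after numeric coercion; this includes the missing ones.
not_numeric = df[cols].apply(pd.to_numeric, errors='coerce').isna()

# Non-numeric = NaN after coercion, but not missing to begin with.
counts = pd.DataFrame({
    'missing': missing.sum(),
    'non-numeric': (not_numeric & ~missing).sum(),
})
counts['total'] = counts['missing'] + counts['non-numeric']
print(counts)


def plot_counts(counts):
    """Stacked bar chart of the per-column counts (requires matplotlib)."""
    return counts[['missing', 'non-numeric']].plot(kind='bar', stacked=True)
```

Calling plot_counts(counts) draws one stacked bar per column. Note that the string 'NaN' in column c is counted as non-numeric here, not as missing, because it is an ordinary string in the raw data.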

