How to count missing & non-numeric data and then plot it with pandas

Question

I have a dataset with more than 6k rows. I want to count missing data and non-numeric data (errors) at the same time, and then plot the number of occurrences with a histogram.

I use the code below to find the missing and erroneous data, but it only filters one subset (one column) at a time, and I don't know how to sum the results up. Columns a, b, and c have dtype object; Id and d are int and float.

How can this be done programmatically, and how can the occurrences then be shown in a histogram?

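 import numpy as np
 import pandas as pd
 df = pd.DataFrame({'Id': [1, 2, 3, 4, 5], 'a': [1, 2, 'good', 'bad', np.nan], 'b': [0.1, 'worse', np.nan, 'better', 0.5], 'c': ['2.5', 'best', '6.5', 'NaN', '10.5'], 'd': ['10', '20', '30', '40', '50']})

 # Rows where column 'a' is missing or non-numeric (only one column at a time)
 df[pd.to_numeric(df['a'], errors='coerce').isnull()]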
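The single-column filter can be applied to every column at once, which turns "one subset each time" into counts that can be summed. A minimal sketch, assuming the sample frame from the question (with the unquoted words quoted and `NaN` written as `np.nan`); the names `bad`, `per_column`, `rows_with_problem`, and `total` are illustrative only. Note that the string `'NaN'` in column c is text, so `pd.to_numeric` cannot parse it and it is counted as a problem cell too.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "a": [1, 2, "good", "bad", np.nan],
    "b": [0.1, "worse", np.nan, "better", 0.5],
    "c": ["2.5", "best", "6.5", "NaN", "10.5"],
    "d": ["10", "20", "30", "40", "50"],
})

# True where a cell is missing OR cannot be parsed as a number:
# the same test as the single-column filter, applied to every column.
bad = df.apply(lambda col: pd.to_numeric(col, errors="coerce").isna())

per_column = bad.sum()                    # number of problem cells in each column
rows_with_problem = df[bad.any(axis=1)]   # rows with at least one problem cell
total = int(bad.to_numpy().sum())         # problem cells in the whole frame

print(per_column)
print(rows_with_problem)
print(total)
```

`per_column` is a Series indexed by column name, so `per_column.plot(kind='bar')` (with matplotlib installed) draws one bar per column showing how often problems occur.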

Answer 1

Setup

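import numpy as np
import pandas as pd

df = pd.DataFrame({'A' : ['', np.nan, 3], 'B' : ['amount', 5, 3]})

# Stack every cell into one long Series (keeping NaN), coerce it to numbers,
# and label cells that fail the conversion (missing or non-numeric) as 'error'.
df_error = (pd.to_numeric(df.stack(dropna=False), errors='coerce')
              .isna()
              .map({True : 'error', False : 'not error'})
              .groupby(level=1)   # level 1 of the stacked index = original column name
              .value_counts()     # count 'error' / 'not error' per column
              .unstack())         # one column each for 'error' and 'not error'
df_error.plot(kind='bar')         # one group of bars per original column (needs matplotlib)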
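The answer above labels missing and non-numeric cells together as 'error'. If the two kinds should be counted separately, one possible sketch compares the original frame's `isna()` with the coerced frame: a cell that is NaN only after coercion was present but non-numeric. The helper name `classify_values` and the category labels are hypothetical. This version also avoids `DataFrame.stack`, whose `dropna` handling has changed in recent pandas releases. As written, the string `'NaN'` in column c counts as non-numeric; run `df.replace('NaN', np.nan)` first if it should count as missing.

```python
import numpy as np
import pandas as pd


def classify_values(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: per column, count missing, non-numeric and numeric cells.

    'missing'     : the cell is NaN/None in the original frame
    'non-numeric' : the cell is present but pd.to_numeric cannot parse it
    'numeric'     : the cell parses as a number
    """
    missing = df.isna()
    # Convert column by column; values that cannot be parsed become NaN.
    coerced = df.apply(pd.to_numeric, errors="coerce")
    non_numeric = coerced.isna() & ~missing
    numeric = coerced.notna()
    return pd.DataFrame({
        "missing": missing.sum(),
        "non-numeric": non_numeric.sum(),
        "numeric": numeric.sum(),
    })


df = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "a": [1, 2, "good", "bad", np.nan],
    "b": [0.1, "worse", np.nan, "better", 0.5],
    "c": ["2.5", "best", "6.5", "NaN", "10.5"],
    "d": ["10", "20", "30", "40", "50"],
})
counts = classify_values(df)
print(counts)
```

Each row of `counts` is one original column, so `counts.plot(kind='bar', stacked=True)` (with matplotlib installed) shows, per column, how many cells fall into each category.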

Accepted answer, posted 2020-10-29 18:58:24