How to write plt.scatter(x, y) in one line where y is a function of x

Question

I am drawing a scatter plot to show how null values are spread across the columns of a DataFrame. As you can see in the code below, the plt.scatter() call is not very expressive: the relation between list(range(0,1200)) and a is unclear unless you read the preceding lines. Can plt.scatter(x, y) be written more explicitly, so that someone who sees only that call understands how x and y are related?

a = []
for i in range(0,1200):
  feature_with_na = [feature for feature in df.columns if df[feature].isnull().sum()>i]
  a.append(len(feature_with_na))
plt.scatter(list(range(0,1200)), a)

Answer 1

On the x-axis you have a threshold n, and on the y-axis you want the number of columns in your DataFrame that have more than n null values.

Instead of looping, count the null values in each column once and use NumPy broadcasting (the [:, None] indexing) to compare those counts against an array of thresholds. This lets you define a single array, xarr, and use that same array both as the x values and in the comparison.

Sample Data

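import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # aliased as plt to match the plt.scatter call below

# 100 rows x 10 columns, with NaN drawn as one of six equally likely values
df = pd.DataFrame(np.random.choice([1, 2, 3, 4, 5, np.nan], (100, 10)))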

Code

# Range of 'x' values to consider
xarr = np.arange(0, 100)

plt.scatter(xarr, (df.isnull().sum().to_numpy()>xarr[:, None]).sum(axis=1))
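To see what the [:, None] comparison produces, here is a minimal sketch on a tiny frame. The frame df_small and the names null_counts, thresholds, exceeds and counts are made up for illustration and are not part of the answer's code; the last line cross-checks the result against an explicit loop like the one in the question.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame (made-up data): columns a, b, c hold 0, 1 and 3 nulls.
df_small = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [1.0, np.nan, 3.0],
    "c": [np.nan, np.nan, np.nan],
})

null_counts = df_small.isnull().sum().to_numpy()  # shape (3,)
thresholds = np.arange(0, 4)                      # shape (4,)

# thresholds[:, None] has shape (4, 1). Comparing it with a shape (3,) array
# broadcasts to a (4, 3) boolean grid: one row per threshold, one column per feature.
exceeds = null_counts > thresholds[:, None]

# Summing along axis=1 counts, for each threshold, the columns with more nulls than it.
counts = exceeds.sum(axis=1)

# Cross-check against an explicit loop over thresholds and columns.
loop_counts = [
    int(sum(df_small[col].isnull().sum() > n for col in df_small.columns))
    for n in thresholds
]
```

Each row of exceeds answers "which columns have more than this many nulls?", so the row sums are exactly the y values passed to plt.scatter.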

Answer 2

ALollz's answer is good, but here is a less NumPy-heavy alternative, if that's your thing:

feature_null_counts = df.isnull().sum()
n_nulls = list(range(100))
features_with_n_nulls = [sum(feature_null_counts > n) for n in n_nulls]
plt.scatter(n_nulls, features_with_n_nulls)
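If the goal is for the plotting call itself to say how y depends on x, one option is to wrap the computation in a named function. This is only a sketch: the function name columns_with_more_nulls_than and the demo frame are hypothetical and appear in neither answer.

```python
import numpy as np
import pandas as pd


def columns_with_more_nulls_than(df, thresholds):
    """For each threshold n, return how many columns of df hold more than n nulls."""
    null_counts = df.isnull().sum()  # one null count per column
    return [int((null_counts > n).sum()) for n in thresholds]


# Made-up demo data: 50 rows x 4 columns with NaN scattered at random.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.choice([1.0, 2.0, np.nan], size=(50, 4)))

x = range(50)
y = columns_with_more_nulls_than(demo, x)
```

With that helper defined, the call reads plt.scatter(x, columns_with_more_nulls_than(df, x)), which states the relation between the two arguments without the reader needing the preceding lines.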

Answer 1: score 0, accepted, 2021-05-13 16:29:40
Answer 2: score 0, 2021-05-13 16:42:46
