
How to write a plt.scatter(x, y) call in one line where y is a function of x

I am plotting a scatter plot to show the null values in a DataFrame. As you can see, the plt.scatter() call is not expressive enough: the relation between list(range(0,1200)) and 'a' is not clear unless you read the previous lines. Can plt.scatter(x, y) be written more explicitly, so that it is easy to see how x and y are related? Ideally, somebody who sees only the plt.scatter(x, y) line would understand what it plots.

a = []
for i in range(0,1200):
  feature_with_na = [feature for feature in df.columns if df[feature].isnull().sum()>i]
  a.append(len(feature_with_na))
plt.scatter(list(range(0,1200)), a)

On the x-axis you have a threshold number, and on the y-axis you want the number of columns in your DataFrame that have more than that number of null values.

Instead of your loop, you can count the null values in each column and use NumPy broadcasting (the [:, None] indexing) to compare those counts with an array of your numbers. This lets you define a single array, xarr, and use that same array both as the x values and in the comparison.

Sample Data

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

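# 100 rows x 10 columns, with NaN as one of six equally likely values
# (np.nan rather than np.NaN, since the np.NaN alias was removed in NumPy 2.0)
df = pd.DataFrame(np.random.choice([1, 2, 3, 4, 5, np.nan], (100, 10)))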

Code

# Range of 'x' values to consider
xarr = np.arange(0, 100)

plt.scatter(xarr, (df.isnull().sum().to_numpy()>xarr[:, None]).sum(axis=1))
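
To see what the [:, None] comparison does, here is a minimal sketch on a tiny hand-made array. The null_counts values are made up for illustration and stand in for df.isnull().sum().to_numpy(); they are not taken from the sample data above.

```python
import numpy as np

# Hypothetical per-column null counts for a 4-column frame (illustrative values)
null_counts = np.array([0, 2, 2, 5])

# Thresholds to test; these play the role of the x values
xarr = np.arange(0, 6)

# xarr[:, None] has shape (6, 1); comparing it with shape (4,) broadcasts to (6, 4),
# so exceeds[i, j] is True when column j has more than xarr[i] nulls
exceeds = null_counts > xarr[:, None]

# Summing each row counts the columns that exceed that row's threshold
columns_above = exceeds.sum(axis=1)

print(exceeds.shape)    # (6, 4)
print(columns_above)    # [3 3 1 1 1 0]
```

Each row of the boolean matrix corresponds to one x value, which is why the row sums can be passed straight to plt.scatter as the y values.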


ALollz's answer is good, but here's a less NumPy-heavy alternative if that's your thing:

feature_null_counts = df.isnull().sum()
n_nulls = list(range(100))
features_with_n_nulls = [sum(feature_null_counts > n) for n in n_nulls]
plt.scatter(n_nulls, features_with_n_nulls)
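
If the goal is for the plotting line itself to say how y depends on x, one option is to move the computation into a named function. This is only a sketch: the function name n_columns_with_more_nulls_than and the small example frame are assumptions made up for illustration, not part of either answer.

```python
import numpy as np
import pandas as pd


def n_columns_with_more_nulls_than(df, thresholds):
    """For each threshold, count the columns of df holding more nulls than it."""
    null_counts = df.isnull().sum().to_numpy()
    thresholds = np.asarray(thresholds)
    # Broadcast (n_thresholds, 1) against (n_columns,) and count per threshold
    return (null_counts > thresholds[:, None]).sum(axis=1)


# Small illustrative frame: column "a" has 2 nulls, "b" has 1, "c" has none
example_df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [np.nan, 2.0, 3.0],
    "c": [1.0, 2.0, 3.0],
})
thresholds = np.arange(0, 3)
counts = n_columns_with_more_nulls_than(example_df, thresholds)
print(counts)  # [2 1 0]
```

With such a helper, the plot becomes plt.scatter(thresholds, n_columns_with_more_nulls_than(df, thresholds)), so the relation between x and y is visible in the call itself.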

