Print the proportion of null values
I am working with the Titanic dataset. I wonder how to show the proportion of null values in the train set.
Here is my code:
train_count_of_missval_by_col = (train.isnull().sum())
print('----- all columns along with count of missing value')
print(train_count_of_missval_by_col)
print('----only columns which has missing values----')
print(train_count_of_missval_by_col[train_count_of_missval_by_col>0])
print('----only columns which has missing data to total observations----')
print(train_count_of_missval_by_col[train_count_of_missval_by_col>0]/train.shape[])
Unfortunately, the last line of the code generates an error. What should I add or edit on the last line so the code works?
I am not sure if there is a specific operation for this. info() shows you the raw count and tells you the total number of rows, but it has no parameter for the percentage. Also, .info() returns None, so you can't access any data from its return value.
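As a quick check of the point about .info(), the sketch below uses a small made-up frame (hypothetical, not the Titanic data) to show that the call prints its summary and hands back None.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, used only for illustration.
toy = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': ['x', None, 'z']})

# info() writes its summary to stdout; the return value carries no data.
result = toy.info()
print(result is None)
```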
I would suggest looping through the columns, computing the number of nulls divided by the total number of rows with df[col].isnull().sum() / df.shape[0] * 100, and printing the result in a formatted string, like this:
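import numpy as np
import pandas as pd

# Small example frame with missing values in every column
d = {'Col1': [np.nan, 6, np.nan, 2, np.nan],
     'Col2': [np.nan, 3, 5, np.nan, 9],
     'Col3': [2, 1, 8, np.nan, 9]}
df = pd.DataFrame(d)

# Print the percentage of null values per column
for col in df.columns:
    print(col, f'{df[col].isnull().sum() / df.shape[0] * 100} % NULL')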
Col1 60.0 % NULL
Col2 40.0 % NULL
Col3 20.0 % NULL
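The same numbers can also be computed without an explicit loop. The sketch below is one possible alternative, not part of the answer above. Since isnull() yields booleans, the column-wise mean is the fraction of missing values. The name `train` follows the question, but here it is an assumption filled with the toy data from this answer rather than the real Titanic file.

```python
import numpy as np
import pandas as pd

# Stand-in for the question's `train` frame (assumption: toy data, not Titanic).
train = pd.DataFrame({'Col1': [np.nan, 6, np.nan, 2, np.nan],
                      'Col2': [np.nan, 3, 5, np.nan, 9],
                      'Col3': [2, 1, 8, np.nan, 9]})

# Mean of a boolean mask = fraction of True values, i.e. fraction of nulls.
missing_fraction = train.isnull().mean()

# Keep only the columns that actually have missing values, as in the question.
missing_fraction = missing_fraction[missing_fraction > 0]

# Same result via count / number of rows, with the row count from shape[0].
by_count = train.isnull().sum() / train.shape[0]
by_count = by_count[by_count > 0]

print(missing_fraction * 100)
```

For the question's last line specifically, the empty subscript in train.shape[] is a syntax error. Writing train.shape[0] (the number of rows) should make it run.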