
Print the proportion of null values

I am working with the Titanic dataset. I wonder how to show the proportion of null values in the train set.

Here is my code:

train_count_of_missval_by_col = (train.isnull().sum())
print('----- all columns along with count of missing value')
print(train_count_of_missval_by_col)
print('----only columns which has missing values----')
print(train_count_of_missval_by_col[train_count_of_missval_by_col>0])
print('----only columns which has missing data to total observations----')
print(train_count_of_missval_by_col[train_count_of_missval_by_col>0]/train.shape[])

Unfortunately, the last line of the code generates an error. What should I add or edit on the last line so the code works?

I am not sure if there is a specific operation for this. info() shows you the raw counts and tells you the total number of rows, but it has no parameter for percentages. Also, .info() returns None, so you can't access any data from its return value.
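To illustrate the point about info(), here is a small sketch. The frame and its values are made up for the example, and the buf argument is only used to capture the summary text instead of printing it straight to the console.

```python
import io

import numpy as np
import pandas as pd

# Made-up frame for illustration only.
df = pd.DataFrame({'Col1': [np.nan, 6, np.nan], 'Col2': [1, 2, 3]})

buffer = io.StringIO()
result = df.info(buf=buffer)  # the summary text is written into the buffer

print(result)             # None: info() has no return value
print(buffer.getvalue())  # human-readable summary, including non-null counts
```

So the summary is only available as text; the numbers themselves have to be computed separately.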

I would suggest looping through the columns, dividing the number of nulls in each by the total number of rows with df[col].isnull().sum() / df.shape[0] * 100, and printing the result in a formatted string, like so:

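import numpy as np
import pandas as pd

# Small example frame with some missing values in every column
d = {'Col1': [np.nan, 6, np.nan, 2, np.nan],
     'Col2': [np.nan, 3, 5, np.nan, 9],
     'Col3': [2, 1, 8, np.nan, 9]}
df = pd.DataFrame(d)
# Percentage of nulls per column: null count / row count * 100
for col in df.columns:
    print(col, f'{df[col].isnull().sum() / df.shape[0] * 100} % NULL')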

Col1 60.0 % NULL
Col2 40.0 % NULL
Col3 20.0 % NULL
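As for the failing line in the question, train.shape is a (rows, columns) tuple, and an empty index like train.shape[] is a syntax error; train.shape[0] gives the row count. Below is a minimal sketch of that fix, plus a loop-free alternative based on isnull().mean(). The train frame here is a hypothetical stand-in with made-up values, not the real Titanic data, and the names missing_counts, missing_share and missing_pct are only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Titanic training frame; values are made up.
train = pd.DataFrame({
    'Age': [22.0, np.nan, 26.0, np.nan],
    'Cabin': [np.nan, 'C85', np.nan, np.nan],
    'Fare': [7.25, 71.28, 7.92, 53.1],
})

missing_counts = train.isnull().sum()

# Fix for the failing line: index shape with 0 to get the number of rows.
missing_share = missing_counts[missing_counts > 0] / train.shape[0]
print(missing_share)

# Alternative without a loop: the mean of a boolean mask is the
# fraction of True values, so this gives the percentage of nulls per column.
missing_pct = train.isnull().mean() * 100
print(missing_pct[missing_pct > 0])
```

Both approaches should give the same proportions; the mean-based version just avoids dividing by the row count by hand.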
