How to write a function to find the zero-value count and percentage of zeros for all columns and export the result to Excel in Python
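I have a dataset with 780 columns and 87529 rows, and it contains many zero values. I'm using the code below, but it prints 780*2 lines, which is really hard to read and understand, so I'd like to export the result to Excel. Can anyone help me build the code?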
for column_name in df.columns:
    column = df[column_name]
    count = (column == 0).sum()
    percent_zero = (column == 0).sum() / 87529 * 100
    print('Count of zeros in column ', column_name, ' is : ', count)
Try this. (You will need to use your own df.)
import pandas as pd

# Use your own dataframe.
df = pd.DataFrame([
    {'col1': 0, 'col2': 0},
    {'col1': 1, 'col2': 0},
    {'col1': 1, 'col2': 1},
])

temp = 'Count of zeros in column "{col}" is : {n_zeros} (Percentage: {percent_zero:.1f}%)'
n_rows = len(df)
seeds = []  # one summary record per column
# DataFrame.iteritems() was removed in pandas 2.0; items() is the equivalent.
for col, ser in df.items():
    n_zeros = (ser == 0).sum()
    percent_zero = n_zeros / n_rows * 100
    print(temp.format(col=col, n_zeros=n_zeros, percent_zero=percent_zero))
    seeds.append({'column_name': col, 'number_of_zero': n_zeros, 'percent_of_zero': percent_zero})

# Build the summary table and write it to an Excel file.
df_out = pd.DataFrame(seeds)
df_out.to_excel('out.xlsx', index=False)
If you get an error related to the export, try the following command:
pip install openpyxl
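With 780 columns, the same table can probably also be built without an explicit loop. The following is only a sketch of that idea: the helper name `zero_summary` is made up for illustration, and it assumes the percentage should be taken over all rows (NaN cells count as non-zero).

```python
import pandas as pd


def zero_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its zero count and zero percentage.

    Hypothetical helper, not part of pandas.
    """
    is_zero = df.eq(0)  # boolean frame: True where a cell equals 0
    summary = pd.DataFrame({
        'number_of_zero': is_zero.sum(),          # count of True per column
        'percent_of_zero': is_zero.mean() * 100,  # share of True per column, in percent
    })
    # Move the column names out of the index into a regular column.
    return summary.rename_axis('column_name').reset_index()


# Small example frame; replace it with your own dataframe.
df = pd.DataFrame([
    {'col1': 0, 'col2': 0},
    {'col1': 1, 'col2': 0},
    {'col1': 1, 'col2': 1},
])

summary = zero_summary(df)
print(summary)
```

To export, calling `summary.to_excel('out.xlsx', index=False)` should work the same way as above, and it needs `openpyxl` as well.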