
How to check constraint of non-null column in Python?

df1:

   ColumnName   Nullable
0  name         True
1  Desgn        True
2  Emp_number   False
3  Salary       True

df2:

   name     Desgn     Emp_number  Salary
0  krul                125796    45000
1  arnold   lawyer     789632    25000
2  daisy    engg       256498    
3  alex                456985    65884
4  mandy    arch       456258    36958
5  krul     painter    
6  perry               789632 
7  timu     lawyer     
8  timy     lawyer     789632    69822
9  daisy    engg       
10 daisy    engg       256498    54869

How can I count the missing values in df2 for the nullable columns (Nullable == True)? If a non-nullable column has a missing value, an error should be raised; otherwise the missing values should be replaced with the median or mode.

for _, row in df1.iterrows():
    if not row["Nullable"]:
        # Rows of df2 where this non-nullable column is null
        nulls = df2[df2[row["ColumnName"]].isnull()]

        # Number of rows in which the column is null
        print(len(nulls))
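
The loop above only prints the counts; it does not raise. A minimal sketch of the "raise an error" part of the question could look like the following. The helper name `check_non_nullable` and the small `schema`, `good` and `bad` frames are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

def check_non_nullable(schema, data):
    """Raise ValueError if a column flagged Nullable == False has missing values."""
    required = schema.loc[~schema["Nullable"], "ColumnName"].tolist()
    null_counts = data[required].isnull().sum()
    offending = null_counts[null_counts > 0]
    if not offending.empty:
        raise ValueError(
            f"non-nullable columns with missing values: {offending.to_dict()}"
        )

# Hypothetical miniature versions of df1 (schema) and df2 (data)
schema = pd.DataFrame({"ColumnName": ["name", "Emp_number"], "Nullable": [True, False]})
good = pd.DataFrame({"name": ["krul", None], "Emp_number": [125796, 789632]})
bad = pd.DataFrame({"name": ["krul", "perry"], "Emp_number": [125796, np.nan]})

check_non_nullable(schema, good)  # passes: only a nullable column has a gap

try:
    check_non_nullable(schema, bad)
except ValueError as exc:
    message = str(exc)  # names the offending column
```

Reporting the offending column names in the message is a design choice, not something the question requires; a bare `raise ValueError` would satisfy it as well.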

Without for loops:

import pandas as pd
from io import StringIO

df2 = pd.read_table(StringIO("""   name     Desgn     Emp_number  Salary
0  krul     nan           125796    45000
1  arnold   lawyer     789632    25000
2  daisy    engg       256498    nan
3  alex      nan          456985    65884
4  mandy    arch       456258    36958
5  krul     painter    nan       nan
6  perry      nan         789632    nan
7  timu     lawyer     nan     nan
8  timy     lawyer     789632    69822
9  daisy    engg       nan       nan
10 daisy    engg       256498    54869"""), sep=r'\s+')

df1 = pd.read_table(StringIO("""   ColumnName   Nullable
0  name         True
1  Desgn        True
2  Emp_number   False
3  Salary       True"""), sep=r'\s+')


# Column names flagged as nullable / non-nullable in df1
nullable_cols = df1.loc[df1.Nullable, 'ColumnName'].tolist()
non_nullable_cols = df1.loc[~df1.Nullable, 'ColumnName'].tolist()

# Replace missing values in the numeric nullable columns with the column median
# (numeric_only=True skips string columns such as name and Desgn)
a = df2[nullable_cols]
df2[nullable_cols] = a.fillna(a.median(numeric_only=True))

# If any null in non nullable, raise ValueError
non_nullable_has_null = df2[non_nullable_cols].isnull().any().any()
if non_nullable_has_null:
    raise ValueError('non nullable has a null')
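
The answer above fills with the median only, which applies to numeric columns. The question also mentions the mode, so here is a rough sketch that uses the median for numeric columns and the most frequent value for everything else. The function name `fill_nullable` and the sample frames are assumptions made for this example.

```python
import pandas as pd

def fill_nullable(schema, data):
    """Return a copy with nullable columns filled: median if numeric, mode otherwise."""
    out = data.copy()
    for col in schema.loc[schema["Nullable"], "ColumnName"]:
        if out[col].isnull().sum() == 0:
            continue  # nothing to fill in this column
        if pd.api.types.is_numeric_dtype(out[col]):
            fill_value = out[col].median()
        else:
            modes = out[col].mode()  # mode() ignores missing values by default
            if modes.empty:
                continue  # column is entirely missing; no value to fill with
            fill_value = modes.iloc[0]  # first mode if there are ties
        out[col] = out[col].fillna(fill_value)
    return out

# Hypothetical miniature versions of df1 (schema) and df2 (data)
schema = pd.DataFrame({"ColumnName": ["Desgn", "Salary"], "Nullable": [True, True]})
data = pd.DataFrame({
    "Desgn": ["lawyer", None, "engg", "lawyer"],
    "Salary": [45000.0, 25000.0, None, 65884.0],
})

filled = fill_nullable(schema, data)
```

Taking the first mode when several values tie is an arbitrary choice; depending on the data, leaving such columns unfilled may be preferable.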

You can create a new object and count the null values:

import numpy as np

# pd.np was removed in pandas 2.0, so use numpy directly
new_df = df2.replace(to_replace=[None, ''], value=np.nan)
new_df.isnull().sum()

In [424]: new_df.isnull().sum()
Out[424]: 
name          0
Desgn         3
Emp_number    3
Salary        5
dtype: int64
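
The counts above cover every column. To restrict them to the nullable columns, as the question asks, one possible sketch is to select those columns by name first. The `schema` and `data` frames below are shortened, hypothetical stand-ins for df1 and df2, with a blank string standing in for an empty cell.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for df1 (schema) and the first rows of df2 (data)
schema = pd.DataFrame({
    "ColumnName": ["name", "Desgn", "Emp_number", "Salary"],
    "Nullable": [True, True, False, True],
})
data = pd.DataFrame({
    "name": ["krul", "arnold", "daisy"],
    "Desgn": ["", "lawyer", "engg"],
    "Emp_number": [125796, 789632, 256498],
    "Salary": [45000, 25000, None],
})

# Treat blank strings as missing, then count per nullable column only
cleaned = data.replace("", np.nan)
nullable_cols = schema.loc[schema["Nullable"], "ColumnName"].tolist()
missing_per_nullable = cleaned[nullable_cols].isnull().sum()
```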
