简体   繁体   中英

Python: How to check if there is no value in a csv column using a list

I have a CSV file and I want to check for each row if it has one or more values in different columns which I specified in a list. If there is no value in any column it should add up to a counter so I know how many rows are empty. But if it has one value in one column from the list it shouldn't do anything.

The CSV file is like this: .csv 文件

I made the code below but it is returning 0 which is not correct.

import pandas as pd

testfile = 'test1.csv'

df = pd.read_csv(testfile)

column_names = ['Uniprot_acc',
'Uniprot_id',
'Interpro_domain',
'Ensembl_geneid',
'Ensembl_transcriptid',
'SIFT_score',
'SIFT_pred']

counter = 0

for row in df:
    for column_name in column_names:
        if column_name in row:
            if column_name == None:
                counter =+ 1

print(counter)

What I want to know is how many rows don't contain anything. It should check per row for every column in the list if there is no value. And if indeed there is nothing in the row it should count. So in this example it should be 3.

Use:

counter = df[column_names].isnull().all(axis=1).sum()
print (counter)

Sample :

df = pd.DataFrame({
         'A':list('abcdef'),
         'Uniprot_acc':[np.nan,5,4,5,np.nan,4],
         'Uniprot_id':[np.nan,8,9,4,np.nan,np.nan],
         'Interpro_domain':[np.nan,3,np.nan,7,np.nan,0],
         'E':[5,3,np.nan,9,np.nan,4],

})

column_names = ['Uniprot_acc',
                'Uniprot_id',
                'Interpro_domain']

print (df)
   A  Uniprot_acc  Uniprot_id  Interpro_domain    E
0  a          NaN         NaN              NaN  5.0
1  b          5.0         8.0              3.0  3.0
2  c          4.0         9.0              NaN  NaN
3  d          5.0         4.0              7.0  9.0
4  e          NaN         NaN              NaN  NaN
5  f          4.0         NaN              0.0  4.0

counter = df[column_names].isnull().all(axis=1).sum()
print (counter)
2

Explanation :

First filter columns by list:

print (df[column_names])
   Uniprot_acc  Uniprot_id  Interpro_domain
0          NaN         NaN              NaN
1          5.0         8.0              3.0
2          4.0         9.0              NaN
3          5.0         4.0              7.0
4          NaN         NaN              NaN
5          4.0         NaN              0.0

Then check missing values None and NaN s:

print (df[column_names].isnull())
   Uniprot_acc  Uniprot_id  Interpro_domain
0         True        True             True
1        False       False            False
2        False       False             True
3        False       False            False
4         True        True             True
5        False        True            False

Check if all Trues per rows by DataFrame.all :

print (df[column_names].isnull().all(axis=1))
0     True
1    False
2    False
3    False
4     True
5    False
dtype: bool

And last count only True s by sum .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM