Python: How to check if there is no value in a csv column using a list

I have a CSV file, and for each row I want to check whether it has one or more values in the columns I specified in a list. If a row has no value in any of those columns, a counter should be incremented so that I know how many rows are empty. If a row has a value in at least one of the listed columns, nothing should happen.

The CSV file looks like this: (image of the .csv file)

I wrote the code below, but it returns 0, which is not correct.

import pandas as pd

testfile = 'test1.csv'

df = pd.read_csv(testfile)

column_names = ['Uniprot_acc',
'Uniprot_id',
'Interpro_domain',
'Ensembl_geneid',
'Ensembl_transcriptid',
'SIFT_score',
'SIFT_pred']

counter = 0

for row in df:
    for column_name in column_names:
        if column_name in row:
            if column_name == None:
                counter =+ 1

print(counter)

What I want to know is how many rows contain nothing in the listed columns. For every row, the code should check each column in the list for a missing value, and count the row only if all of those columns are empty. In this example the result should be 3.

Use:

counter = df[column_names].isnull().all(axis=1).sum()
print (counter)

Sample:

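import numpy as np
import pandas as pd

# Sample data: rows 0 and 4 have no value in any of the checked columns
df = pd.DataFrame({
    'A': list('abcdef'),
    'Uniprot_acc': [np.nan, 5, 4, 5, np.nan, 4],
    'Uniprot_id': [np.nan, 8, 9, 4, np.nan, np.nan],
    'Interpro_domain': [np.nan, 3, np.nan, 7, np.nan, 0],
    'E': [5, 3, np.nan, 9, np.nan, 4],
})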

column_names = ['Uniprot_acc',
                'Uniprot_id',
                'Interpro_domain']

print (df)
   A  Uniprot_acc  Uniprot_id  Interpro_domain    E
0  a          NaN         NaN              NaN  5.0
1  b          5.0         8.0              3.0  3.0
2  c          4.0         9.0              NaN  NaN
3  d          5.0         4.0              7.0  9.0
4  e          NaN         NaN              NaN  NaN
5  f          4.0         NaN              0.0  4.0

counter = df[column_names].isnull().all(axis=1).sum()
print (counter)
2
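
The same expression can be applied to data loaded with read_csv, because empty fields are parsed as NaN by default. The following is a minimal sketch, not the asker's actual file: the CSV text, its values, and the reduced column list are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for test1.csv (values are invented)
csv_text = """A,Uniprot_acc,Uniprot_id,SIFT_score
a,,,
b,P12345,ABC_HUMAN,0.5
c,,,0.1
d,,,
"""

# Empty fields become NaN when the file is parsed
df = pd.read_csv(io.StringIO(csv_text))

column_names = ['Uniprot_acc', 'Uniprot_id', 'SIFT_score']

# Rows 'a' and 'd' have no value in any of the listed columns
counter = df[column_names].isnull().all(axis=1).sum()
print(counter)  # 2
```

Row 'c' is not counted because it still has a SIFT_score, even though the other two listed columns are empty.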

Explanation:

First, select the columns named in the list:

print (df[column_names])
   Uniprot_acc  Uniprot_id  Interpro_domain
0          NaN         NaN              NaN
1          5.0         8.0              3.0
2          4.0         9.0              NaN
3          5.0         4.0              7.0
4          NaN         NaN              NaN
5          4.0         NaN              0.0

Then check for missing values (None and NaN) with isnull:

print (df[column_names].isnull())
   Uniprot_acc  Uniprot_id  Interpro_domain
0         True        True             True
1        False       False            False
2        False       False             True
3        False       False            False
4         True        True             True
5        False        True            False
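
One caveat worth illustrating: isnull flags None and NaN, but not empty or whitespace-only strings. If a column can contain such strings (an assumption, since the original file is not shown), they may need to be handled separately. A small sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Made-up values: two real missing markers, two blank strings, one real value
values = pd.Series([None, np.nan, '', ' ', 'P12345'], dtype=object)

# isnull only recognises None and NaN
mask = values.isnull()
print(mask.tolist())  # [True, True, False, False, False]

# Optionally treat blank strings as missing too
blank = values.str.strip().eq('')
missing = mask | blank
print(missing.tolist())  # [True, True, True, True, False]
```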

Check whether all values in each row are True with DataFrame.all:

print (df[column_names].isnull().all(axis=1))
0     True
1    False
2    False
3    False
4     True
5    False
dtype: bool
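
To make the role of all and axis=1 more concrete, here is a short sketch that contrasts it with any and with axis=0, using the same three sample columns as above. The variable names are chosen for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Uniprot_acc': [np.nan, 5, 4, 5, np.nan, 4],
    'Uniprot_id': [np.nan, 8, 9, 4, np.nan, np.nan],
    'Interpro_domain': [np.nan, 3, np.nan, 7, np.nan, 0],
})

missing = df.isnull()

# axis=1 reduces across the columns, giving one result per row
all_missing = missing.all(axis=1)   # every column is empty
any_missing = missing.any(axis=1)   # at least one column is empty

# axis=0 reduces down the rows, giving one result per column
per_column = missing.all(axis=0)

print(all_missing.sum())    # 2
print(any_missing.sum())    # 4
print(per_column.tolist())  # [False, False, False]
```

Using any instead of all would count rows that are only partly empty, which is not what the question asks for.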

Finally, count only the True values with sum.
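
For comparison, the explicit loop from the question could be repaired roughly as follows. In the original, `for row in df` iterates over the column labels rather than the rows, and `counter =+ 1` assigns +1 instead of incrementing. This sketch uses iterrows and pd.isna on the sample data; it is one possible rewrite, and the vectorized expression above is usually preferable for large files.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': list('abcdef'),
    'Uniprot_acc': [np.nan, 5, 4, 5, np.nan, 4],
    'Uniprot_id': [np.nan, 8, 9, 4, np.nan, np.nan],
    'Interpro_domain': [np.nan, 3, np.nan, 7, np.nan, 0],
})

column_names = ['Uniprot_acc', 'Uniprot_id', 'Interpro_domain']

counter = 0
# iterrows yields (index, row) pairs, where row is a Series
for _, row in df.iterrows():
    # Count the row only if every listed column is missing
    if all(pd.isna(row[name]) for name in column_names):
        counter += 1

print(counter)  # 2

# The vectorized form gives the same result
vectorized = df[column_names].isnull().all(axis=1).sum()
```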
