Python: How to check if there is no value in a CSV column using a list
I have a CSV file, and for each row I want to check whether it has a value in at least one of several columns that I specified in a list. If none of those columns has a value, a counter should be incremented, so I know how many rows are empty. If at least one of the listed columns has a value, nothing should happen.
The CSV file looks like this:
I wrote the code below, but it returns 0, which is not correct.
import pandas as pd
testfile = 'test1.csv'
df = pd.read_csv(testfile)
column_names = ['Uniprot_acc',
'Uniprot_id',
'Interpro_domain',
'Ensembl_geneid',
'Ensembl_transcriptid',
'SIFT_score',
'SIFT_pred']
counter = 0
for row in df:
    for column_name in column_names:
        if column_name in row:
            if column_name == None:
                counter =+ 1
print(counter)
What I want to know is how many rows contain nothing. For every row, the code should check each column in the list for a value, and if none of those columns has one, the row should be counted. So in this example the result should be 3.
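Why the loop above returns 0: `for row in df` iterates over the column labels, not the rows, so `row` is a string such as 'Uniprot_acc' and `column_name in row` is only a substring test; `column_name == None` is never True because `column_name` is always a string; and `counter =+ 1` assigns +1 instead of incrementing. A minimal sketch of the same row-by-row idea with those three points fixed, using a small hypothetical DataFrame in place of test1.csv:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for test1.csv, using three of the question's columns.
df = pd.DataFrame({
    "Uniprot_acc": [np.nan, "P1", np.nan],
    "Uniprot_id": [np.nan, np.nan, np.nan],
    "SIFT_score": [np.nan, 0.5, np.nan],
})
column_names = ["Uniprot_acc", "Uniprot_id", "SIFT_score"]

counter = 0
# iterrows() yields (index, row) pairs; a plain `for row in df` yields column labels.
for _, row in df.iterrows():
    # pd.isna() is True for both None and NaN; `== None` never matches NaN.
    if all(pd.isna(row[name]) for name in column_names):
        counter += 1  # `+=` increments; `=+ 1` would just assign +1

print(counter)  # rows 0 and 2 have no value in any listed column
```

Iterating row by row in Python is slow on large files, which is why the vectorized one-liner in the answer is usually preferred.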
Use:
counter = df[column_names].isnull().all(axis=1).sum()
print (counter)
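This works on a DataFrame loaded with read_csv because, by default, empty CSV fields are parsed as NaN. A self-contained sketch, with made-up CSV text read through io.StringIO standing in for test1.csv:

```python
import io

import pandas as pd

# Hypothetical CSV text; empty fields between commas become NaN by default.
csv_text = """Uniprot_acc,Uniprot_id,Interpro_domain,other
,,,x
P12345,ABC_HUMAN,IPR000001,y
,,,w
,ABC_HUMAN,,z
"""
df = pd.read_csv(io.StringIO(csv_text))
column_names = ["Uniprot_acc", "Uniprot_id", "Interpro_domain"]

# True for each row where every listed column is missing, then count the Trues.
counter = df[column_names].isnull().all(axis=1).sum()
print(counter)  # rows 0 and 2 are empty in the listed columns
```

If the file marks missing values with something other than an empty field (for example '-'), the `na_values` parameter of read_csv can be used so those are read as NaN too.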
Sample:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': list('abcdef'),
    'Uniprot_acc': [np.nan, 5, 4, 5, np.nan, 4],
    'Uniprot_id': [np.nan, 8, 9, 4, np.nan, np.nan],
    'Interpro_domain': [np.nan, 3, np.nan, 7, np.nan, 0],
    'E': [5, 3, np.nan, 9, np.nan, 4],
})
column_names = ['Uniprot_acc',
'Uniprot_id',
'Interpro_domain']
print (df)
   A  Uniprot_acc  Uniprot_id  Interpro_domain    E
0  a          NaN         NaN              NaN  5.0
1  b          5.0         8.0              3.0  3.0
2  c          4.0         9.0              NaN  NaN
3  d          5.0         4.0              7.0  9.0
4  e          NaN         NaN              NaN  NaN
5  f          4.0         NaN              0.0  4.0
counter = df[column_names].isnull().all(axis=1).sum()
print (counter)
2
Explanation:
First, select the columns in the list:
print (df[column_names])
   Uniprot_acc  Uniprot_id  Interpro_domain
0          NaN         NaN              NaN
1          5.0         8.0              3.0
2          4.0         9.0              NaN
3          5.0         4.0              7.0
4          NaN         NaN              NaN
5          4.0         NaN              0.0
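Selecting with a list raises KeyError if any listed name is not a column of the file. If an absent column should simply count as empty (an assumption about the desired behaviour, not something the question states), DataFrame.reindex can add it filled with NaN. A sketch with a small hypothetical frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Uniprot_acc": [np.nan, 5], "Uniprot_id": [np.nan, np.nan]})
column_names = ["Uniprot_acc", "Uniprot_id", "SIFT_score"]  # SIFT_score is absent

try:
    df[column_names]  # plain list selection fails on the missing label
except KeyError as exc:
    print("KeyError:", exc)

# reindex() keeps the listed columns and adds any absent one, filled with NaN.
subset = df.reindex(columns=column_names)
print(subset.isnull().all(axis=1).sum())  # only row 0 is empty
```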
Then check for missing values, both None and NaNs:
print (df[column_names].isnull())
   Uniprot_acc  Uniprot_id  Interpro_domain
0         True        True             True
1        False       False            False
2        False       False             True
3        False       False            False
4         True        True             True
5        False        True            False
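To see that isnull treats None and NaN the same way, while an empty string is still a value, a tiny sketch on a hypothetical object Series:

```python
import numpy as np
import pandas as pd

# An object column can hold both None and NaN; isnull() flags both.
s = pd.Series(["P12345", None, np.nan, ""])
print(s.isnull().tolist())  # [False, True, True, False]
```

An empty string is not treated as missing, so data that stores blanks as "" rather than NaN would need converting first.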
Check whether all values in each row are True with DataFrame.all:
print (df[column_names].isnull().all(axis=1))
0     True
1    False
2    False
3    False
4     True
5    False
dtype: bool
Finally, count the Trues with sum; each True counts as 1 and each False as 0.
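The same boolean mask can also report which rows are empty, not just how many. A sketch reusing the sample data from the answer (the name `empty_rows` is only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Uniprot_acc": [np.nan, 5, 4, 5, np.nan, 4],
    "Uniprot_id": [np.nan, 8, 9, 4, np.nan, np.nan],
    "Interpro_domain": [np.nan, 3, np.nan, 7, np.nan, 0],
})
column_names = ["Uniprot_acc", "Uniprot_id", "Interpro_domain"]

# Boolean mask: True where every listed column is missing in that row.
mask = df[column_names].isnull().all(axis=1)

# sum() treats True as 1 and False as 0, so it counts the empty rows.
counter = int(mask.sum())
# Boolean indexing with the same mask returns the empty rows themselves.
empty_rows = df[mask].index.tolist()

print(counter, empty_rows)  # 2 [0, 4]
```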