How to use a comparison statement on a for loop iterator in a conditional statement?

I'm iterating over a large (300+ columns and 1,000,000+ rows) tab-delimited .txt file. File format:

species 1    ...    sample1(11th col)    sample2    ....    sampleN(353rd col)
species 2    ...    6046                 5364               ....
species 3    ...    15422                0                  ....

Each row is a species, and from column 11 onward each column is a sample. For each sample I want to know how many species have a value greater than 0. So I iterate over each line, check for which samples the value is greater than 0, and if so add 1. The total for each sample is then the number of rows that have a value greater than 0.

For that I use the following code:

samples = []
OTUnumber = []

with open('all.16S.uniq.txt','r') as file:
     for i,line in enumerate(file): 
        columns = line.strip().split('\t')[11:353] 
        if i == 0: #headers are sample names so first row
            samples = columns #save sample names 
            OTUnumbers = [0 for s in samples] #set starting value as zero
        else:
            for n,v in enumerate(columns):
                if v > 0:
                    OTUnumber[n] = OTUnumber[n] + 1
                else:
                    continue

result = dict(zip(samples,OTUnumbers))

When I run this code I get the following error: TypeError: '>' not supported between instances of 'str' and 'int'. This error is raised by if v > 0. Why can't I write this statement?

So if the value v at index n of columns is greater than 0, I want to add 1 to OTUnumber at that index. If v is not greater than 0, I want to skip that value and not add 1 (or add 0).

How can I make this code work?

As the error says, you are trying to use the comparison operator > on a string and an int, which is not allowed. v is a string, not an integer. Presumably you want to use int(v) > 0 rather than v > 0, or do the following to begin with (for the data rows only, since int() raises a ValueError on the sample names in the header row):

columns = [int(v) for v in line.strip().split('\t')[11:353]] 
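
To see why the comparison fails, here is a minimal sketch using a made-up data line (the species name and values are hypothetical, not taken from the real file). Every field returned by split() is a str until it is converted explicitly:

```python
# A made-up tab-delimited data line; split() always returns str fields.
line = "species 2\t6046\t0\t15\n"
fields = line.strip().split("\t")[1:]  # drop the species name

try:
    fields[0] > 0  # str compared with int
    comparison_failed = False
except TypeError:
    comparison_failed = True

# Convert first, then compare.
counts = [int(v) for v in fields]
positives = [c > 0 for c in counts]
```

After conversion the comparison works on integers, so positives holds one boolean per sample column.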

Try this:

samples = []
OTUnumbers = []

with open('all.16S.uniq.txt', 'r') as file:
    for i, line in enumerate(file):
        columns = line.strip().split('\t')[11:353]
        if i == 0:  # headers are sample names so first row
            samples = columns  # save sample names
            OTUnumbers = [0 for s in samples]  # set starting value as zero
        else:
            for n, v in enumerate(columns):
                if int(v) > 0:
                    OTUnumbers[n] = OTUnumbers[n] + 1

result = dict(zip(samples, OTUnumbers))

That's basically two fixes:

  • casting v to int
  • renaming OTUnumber to OTUnumbers in all the code
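
Both fixes can be checked without the real file. The following is only a sketch: the function name count_positive_per_sample, its start/stop parameters, and the three-line demo input are assumptions made for illustration, and the logic otherwise mirrors the loop in this answer.

```python
import io


def count_positive_per_sample(lines, start=11, stop=353):
    """Return {sample name: number of rows whose value is > 0}.

    `lines` is any iterable of tab-delimited text lines; the first
    line is treated as the header holding the sample names.
    """
    samples = []
    counts = []
    for i, line in enumerate(lines):
        columns = line.strip().split("\t")[start:stop]
        if i == 0:
            samples = columns             # header row: sample names
            counts = [0] * len(samples)   # one counter per sample
        else:
            for n, v in enumerate(columns):
                if int(v) > 0:            # v is a str, convert first
                    counts[n] += 1
    return dict(zip(samples, counts))


# Hypothetical miniature input: one label column, two sample columns.
demo = io.StringIO(
    "species\tsampleA\tsampleB\n"
    "sp1\t6046\t5364\n"
    "sp2\t15422\t0\n"
)
result = count_positive_per_sample(demo, start=1, stop=None)
```

For the real file, the same function could be called as count_positive_per_sample(file) inside the with open(...) block, keeping the default slice from the question.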

The values read from your text file are strings, and your code compares a string to an integer, which raises a TypeError exception.

To make the code work, convert each value to int before comparing, i.e. int(v) > 0.
