I'm iterating over a large tab-delimited .txt file (300+ columns and 1,000,000+ rows). File format:
species 1 ... sample1(11th col) sample2 .... sampleN(353rd col)
species 2 ... 6046 5364 ....
species 3 ... 15422 0 ....
Each row is a species, and from column 11 onward each column is a sample. For each sample I want to know how many species have a value greater than 0. So I iterate over each line, check which samples have a value greater than 0, and add 1 for each of those. The sum of 1s for a sample is then the number of rows with a value greater than 0 in that sample.
For that I use the following code:
samples = []
OTUnumber = []
with open('all.16S.uniq.txt','r') as file:
    for i,line in enumerate(file):
        columns = line.strip().split('\t')[11:353]
        if i == 0: #headers are sample names so first row
            samples = columns #save sample names
            OTUnumbers = [0 for s in samples] #set starting value as zero
        else:
            for n,v in enumerate(columns):
                if v > 0:
                    OTUnumber[n] = OTUnumber[n] + 1
                else:
                    continue
result = dict(zip(samples,OTUnumbers))
When I run this code I get the following error:
TypeError: '>' not supported between instances of 'str' and 'int'
This error is raised by if v > 0. Why can't I write this statement?
So if the value v at index n of columns is greater than 0, I want to add 1 to OTUnumber at that index. If v is not greater than 0, I want to skip that value and not add 1 (or add 0).
How can I make this code work?
As the error says, you are trying to use the comparison operator > on a string and an int, which is not allowed. v is a string, not an integer. Presumably you want to use int(v) > 0 rather than v > 0, or do the following to begin with:
columns = [int(v) for v in line.strip().split('\t')[11:353]]
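One caveat with converting in the list comprehension: the first line holds sample names, which int() cannot parse, so the conversion should only be applied to data rows. A minimal sketch of that idea follows; the function name count_positive and its start/stop parameters are made up for illustration, and it assumes every data value in the sliced columns is a plain integer string.

```python
def count_positive(lines, start=11, stop=353):
    """Count, per sample column, the rows whose value is greater than 0.

    `lines` is any iterable of tab-delimited text lines (for example an
    open file). `start` and `stop` are the slice bounds of the sample
    columns; the defaults mirror the slice used in the question.
    """
    samples = []
    counts = []
    for i, line in enumerate(lines):
        fields = line.rstrip('\n').split('\t')[start:stop]
        if i == 0:
            # Header row: keep the sample names as strings.
            samples = fields
            counts = [0] * len(samples)
        else:
            # Data rows only: convert each field to int before comparing.
            values = [int(v) for v in fields]
            for n, v in enumerate(values):
                if v > 0:
                    counts[n] += 1
    return dict(zip(samples, counts))
```

With the file from the question this could be called as count_positive(open('all.16S.uniq.txt')), ideally inside a with block so the file is closed afterwards.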
Try this:
samples = []
OTUnumbers = []
with open('all.16S.uniq.txt','r') as file:
    for i,line in enumerate(file):
        columns = line.strip().split('\t')[11:353]
        if i == 0: #headers are sample names so first row
            samples = columns #save sample names
            OTUnumbers = [0 for s in samples] #set starting value as zero
        else:
            for n,v in enumerate(columns):
                if int(v) > 0:
                    OTUnumbers[n] = OTUnumbers[n] + 1
                else:
                    continue
result = dict(zip(samples,OTUnumbers))
That's basically 2 fixes: converting v to int, and renaming OTUnumber to OTUnumbers throughout the code.
The underlying issue is that every field read from your text file is a string, and your code compares that string to an integer, which raises a TypeError exception.
To make the code work, convert each value to int before comparing, i.e. int(v) > 0.
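To see the difference in isolation, here is a small sketch that reproduces the error on a single field and then applies the conversion. The helper name is_positive is hypothetical, and it assumes the fields are integer strings such as "6046"; a value like "0.5" would make int() raise ValueError, in which case float() might be the better choice.

```python
field = "6046"  # what split('\t') produces: a str, not an int

# Comparing the raw string to an int raises TypeError in Python 3.
try:
    field > 0
except TypeError as exc:
    message = str(exc)

def is_positive(field):
    """Return True if the integer string `field` is greater than 0."""
    return int(field) > 0
```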