I'm creating a script to read a CSV file into a set of namedtuples whose field names come from the column headers. I will then use these namedtuples to pull out rows of data which meet certain criteria.
I've worked out the input (shown below), but am having issues with filtering the data before outputting it to another file.
import csv
from collections import namedtuple

with open('test_data.csv') as f:
    f_csv = csv.reader(f)  # read using csv.reader()
    Base = namedtuple('Base', next(f_csv))  # create namedtuple keys from header row
    for r in f_csv:  # for each row in the file
        row = Base(*r)
        # Process row
        print(row)  # print data
The contents of my input file are as follows:
Locus Total_Depth Average_Depth_sample Depth_for_17
chr1:6484996 1030 1030 1030
chr1:6484997 14 14 14
chr1:6484998 0 0 0
And they are printed from my code as follows:
Base(Locus='chr1:6484996', Total_Depth='1030', Average_Depth_sample='1030', Depth_for_17='1030')
Base(Locus='chr1:6484997', Total_Depth='14', Average_Depth_sample='14', Depth_for_17='14')
Base(Locus='chr1:6484998', Total_Depth='0', Average_Depth_sample='0', Depth_for_17='0')
I want to be able to pull out only the records with a Total_Depth greater than 15.
Intuitively I tried the following function:
if Base.Total_Depth >= 15:
    print(row)
However this only prints the final row of data (from the above output table). I think the problem is twofold. As far as I can tell I'm not storing my named tuples anywhere for them to be referenced later. And secondly the numbers are being read in string format rather than as integers.
Firstly, can someone tell me whether I need to store my namedtuples somewhere?
And secondly, how do I convert the string values to integers? Or is this not possible because namedtuples are immutable?
Thanks!
I previously asked a similar question with respect to dictionaries, but now would like to use namedtuples instead. :)
Map your values to int when creating the named tuple instances:
row = Base(r[0], *map(int, r[1:]))
This keeps the r[0] value as a string, and maps the remaining values to int().
This does require knowledge of the CSV columns, because the choice of which columns are converted to integers is hardcoded here.
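If you would rather not hardcode which columns are numeric, one possible alternative is to attempt the conversion on every value and fall back to the original string when it fails. This is only a sketch: the helper name to_int_if_possible is made up for illustration, and it assumes that any value that parses as an integer should be treated as one.

```python
from collections import namedtuple


def to_int_if_possible(value):
    """Return int(value) if value parses as an integer, else value unchanged.

    Hypothetical helper, not part of the csv or collections modules.
    """
    try:
        return int(value)
    except ValueError:
        return value


Base = namedtuple('Base', ['Locus', 'Total_Depth', 'Average_Depth_sample', 'Depth_for_17'])
r = ['chr1:6484996', '1030', '1030', '1030']

# Locus stays a string because int('chr1:6484996') raises ValueError.
row = Base(*map(to_int_if_possible, r))
```

The trade-off is that a column you expected to be numeric can silently stay a string if it contains a malformed value, so the hardcoded version fails earlier and more loudly.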
Demo:
>>> from collections import namedtuple
>>> Base = namedtuple('Base', ['Locus', 'Total_Depth', 'Average_Depth_sample', 'Depth_for_17'])
>>> r = ['chr1:6484996', '1030', '1030', '1030']
>>> Base(r[0], *map(int, r[1:]))
Base(Locus='chr1:6484996', Total_Depth=1030, Average_Depth_sample=1030, Depth_for_17=1030)
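On the immutability question: a namedtuple instance cannot be changed in place, but that does not prevent conversion. Converting before construction (as above) is the simplest route. If an instance already holds strings, the _replace() method returns a new instance with the given fields swapped out. A small sketch, using the same field names as the demo:

```python
from collections import namedtuple

Base = namedtuple('Base', ['Locus', 'Total_Depth', 'Average_Depth_sample', 'Depth_for_17'])

# An instance built straight from the CSV row, so every field is a string.
raw = Base('chr1:6484996', '1030', '1030', '1030')

# _replace() leaves raw untouched and returns a new Base with Total_Depth as an int.
converted = raw._replace(Total_Depth=int(raw.Total_Depth))
```

Here only Total_Depth is converted; the other depth fields in converted are still strings.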
Note that you should test against the rows, not the Base class:
if row.Total_Depth >= 15:
within the loop, or in a new loop of collected rows.
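Putting the pieces together, here is a rough end-to-end sketch that also covers the "do I need to store them" part: the rows are collected in a list, filtered, and written back out. To keep it runnable on its own, it uses io.StringIO objects as stand-ins for the input and output files, and it assumes the data is comma-delimited; with real files you would use open('test_data.csv') and an output file of your choosing instead.

```python
import csv
import io
from collections import namedtuple

# Stand-in for open('test_data.csv'); assumes comma-delimited data.
sample = io.StringIO(
    "Locus,Total_Depth,Average_Depth_sample,Depth_for_17\n"
    "chr1:6484996,1030,1030,1030\n"
    "chr1:6484997,14,14,14\n"
    "chr1:6484998,0,0,0\n"
)

f_csv = csv.reader(sample)
Base = namedtuple('Base', next(f_csv))  # field names from the header row

# Store every converted row so it can still be referenced after the loop.
rows = [Base(r[0], *map(int, r[1:])) for r in f_csv]

# Test the instances, not the Base class.
deep = [row for row in rows if row.Total_Depth >= 15]

# Stand-in for an output file opened for writing.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(Base._fields)  # header row
writer.writerows(deep)         # each namedtuple is written as one CSV row
```

Storing the rows in a list is only needed if you want to use them after the loop; if you just need the filtered output, the test can sit inside the reading loop and write each matching row straight away. Use > instead of >= if rows with a depth of exactly 15 should be excluded.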