
Converting values of named tuples from strings to integers

I'm creating a script that reads a CSV file into a set of named tuples, with the field names taken from the column headers. I will then use these namedtuples to pull out rows of data that meet certain criteria.

I've worked out the input (shown below), but am having issues with filtering the data before outputting it to another file.

import csv
from collections import namedtuple

with open('test_data.csv') as f:
    f_csv = csv.reader(f) #read using csv.reader()
    Base = namedtuple('Base', next(f_csv)) #create namedtuple keys from header row
    for r in f_csv: #for each row in the file
        row = Base(*r) 
        # Process row
        print(row) #print data

The contents of my input file are as follows:

Locus           Total_Depth     Average_Depth_sample    Depth_for_17
chr1:6484996    1030            1030                    1030
chr1:6484997    14              14                      14
chr1:6484998    0               0                       0

And they are printed by my code as follows:

Base(Locus='chr1:6484996', Total_Depth='1030', Average_Depth_sample='1030', Depth_for_17='1030')
Base(Locus='chr1:6484997', Total_Depth='14', Average_Depth_sample='14', Depth_for_17='14')
Base(Locus='chr1:6484998', Total_Depth='0', Average_Depth_sample='0', Depth_for_17='0')

I want to be able to pull out only the records with a Total_Depth greater than 15.

Intuitively I tried the following check:

if Base.Total_Depth >= 15:
    print(row)

However, this only prints the final row of data (from the above output table). I think the problem is twofold. First, as far as I can tell, I'm not storing my named tuples anywhere for them to be referenced later. Second, the numbers are being read as strings rather than as integers.

Firstly, can someone correct me if I need to store my namedtuples somewhere?

And secondly, how do I convert the string values to integers? Or is this not possible because namedtuples are immutable?

Thanks!

I previously asked a similar question with respect to dictionaries, but now would like to use namedtuples instead. :)

Map your values to int when creating the named tuple instances:

row = Base(r[0], *map(int, r[1:])) 

This keeps the r[0] value as a string, and maps the remaining values to int().

This does require knowing the CSV columns in advance, because the choice of which columns are converted to integers is hardcoded here.
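If hardcoding the column positions is undesirable, one possible alternative is to attempt the conversion on every field and fall back to the original string when it fails. This is only a sketch, not part of the approach above: the helper name to_int_if_possible is made up for illustration, and it assumes that any field which parses as an integer should become one.

```python
from collections import namedtuple


def to_int_if_possible(value):
    """Return int(value) if the string parses as an integer, else the string unchanged."""
    try:
        return int(value)
    except ValueError:
        return value


Base = namedtuple('Base', ['Locus', 'Total_Depth', 'Average_Depth_sample', 'Depth_for_17'])
r = ['chr1:6484996', '1030', '1030', '1030']

# Every field goes through the helper; 'chr1:6484996' is not a valid integer, so it stays a string
row = Base(*map(to_int_if_possible, r))
print(row)
```

The trade-off is that a column that merely looks numeric (an ID, for example) would also be converted, so the hardcoded version is safer when the layout is known.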

Demo:

>>> from collections import namedtuple
>>> Base = namedtuple('Base', ['Locus', 'Total_Depth', 'Average_Depth_sample', 'Depth_for_17'])
>>> r = ['chr1:6484996', '1030', '1030', '1030']
>>> Base(r[0], *map(int, r[1:]))
Base(Locus='chr1:6484996', Total_Depth=1030, Average_Depth_sample=1030, Depth_for_17=1030)
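On the immutability concern: a namedtuple instance cannot be modified in place, but that does not prevent conversion, because a new instance can always be built from an old one. As a small illustration (the variable names raw and converted are arbitrary), the _replace() method returns a fresh tuple with the given fields swapped out:

```python
from collections import namedtuple

Base = namedtuple('Base', ['Locus', 'Total_Depth', 'Average_Depth_sample', 'Depth_for_17'])
raw = Base('chr1:6484997', '14', '14', '14')  # all fields are still strings here

# _replace() leaves `raw` untouched and returns a new Base instance
converted = raw._replace(
    Total_Depth=int(raw.Total_Depth),
    Average_Depth_sample=int(raw.Average_Depth_sample),
    Depth_for_17=int(raw.Depth_for_17),
)
print(raw.Total_Depth, converted.Total_Depth)
```

Converting while constructing the tuple, as in the demo, is usually simpler; _replace() is mainly useful when the tuples already exist.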

Note that you should test against the rows, not the Base class:

if row.Total_Depth >= 15:

within the loop, or in a new loop over collected rows.
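Putting the pieces together, the following sketch collects the converted rows in a list and then filters them in a second pass. To keep it self-contained it reads from an in-memory io.StringIO object instead of test_data.csv and assumes the file is comma-separated; the names sample, rows and deep_rows are illustrative only.

```python
import csv
import io
from collections import namedtuple

# Stand-in for open('test_data.csv'); assumes comma-separated columns
sample = io.StringIO(
    "Locus,Total_Depth,Average_Depth_sample,Depth_for_17\n"
    "chr1:6484996,1030,1030,1030\n"
    "chr1:6484997,14,14,14\n"
    "chr1:6484998,0,0,0\n"
)

f_csv = csv.reader(sample)
Base = namedtuple('Base', next(f_csv))  # field names come from the header row

# Store every row so it can still be referenced after the reading loop
rows = [Base(r[0], *map(int, r[1:])) for r in f_csv]

# Filter on each instance (row.Total_Depth), not on the Base class
deep_rows = [row for row in rows if row.Total_Depth >= 15]
for row in deep_rows:
    print(row)
```

Storing the rows is only needed if they are used again later. For a single pass, the same check can sit directly inside the reading loop.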
