
Parse a text file into an array in Python

    A   C   G   T
A   2   -1  -1  -1  
C   -1  2   -1  -1
G   -1  -1  2   -1
T   -1  -1  -1  2

This is a tab-separated text file, and I want to map it into a format similar to this in Python:

{'A': {'A': 91, 'C': -114, 'G': -31, 'T': -123},
    'C': {'A': -114, 'C':  100, 'G': -125, 'T': -31},
    'G': {'A': -31, 'C': -125, 'G': 100, 'T':  -114},
    'T': {'A': -123, 'C': -31, 'G':  -114, 'T':  91}}

I have tried very hard, but I cannot figure out how to do this as I am new to Python.

Please help.

My code so far:

seq = flines[0]
newseq = []
j = 0
while(l < 4):
    i = 2
    while(o < 4):
        newseq[i][j] = seqLine[i]
        i = i + 1
        o = o + 1
    j = j + 1
    l = l + 1
print(seq)
print(seqLine)

I think this is what you want:

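import csv

data = {}

# Python 3: open in text mode with newline='' (on Python 2, use mode 'rb' instead)
with open('myfile.csv', newline='') as csvfile:
    ntreader = csv.reader(csvfile, delimiter="\t", quotechar='"')
    for rowI, rowData in enumerate(ntreader):
        if rowI == 0:
            # First row holds the column labels; its first cell is empty
            headers = rowData[1:]
        else:
            # Label each value in the row with its column header
            data[rowData[0]] = {k: int(v) for k, v in zip(headers, rowData[1:])}


print(data)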

To keep things simple, I use the csv module with tab as the delimiter, take the column headers from the first row, and use them to label the values in every other row.

This produces:

{'A': {'A': 2, 'C': -1, 'G': -1, 'T': -1},
 'C': {'A': -1, 'C': 2, 'G': -1, 'T': -1},
 'G': {'A': -1, 'C': -1, 'G': 2, 'T': -1},
 'T': {'A': -1, 'C': -1, 'G': -1, 'T': 2}}
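
To try the approach without a file on disk, the same logic can be wrapped in a function that accepts any iterable of lines. This is only a sketch: the name read_matrix and the in-memory SAMPLE text are hypothetical stand-ins for the real file, and the sample assumes clean tab separation with no trailing whitespace.

```python
import csv
import io

# Hypothetical in-memory stand-in for the tab-separated file from the question.
SAMPLE = (
    "\tA\tC\tG\tT\n"
    "A\t2\t-1\t-1\t-1\n"
    "C\t-1\t2\t-1\t-1\n"
    "G\t-1\t-1\t2\t-1\n"
    "T\t-1\t-1\t-1\t2\n"
)


def read_matrix(lines):
    """Build {row_label: {column_label: int}} from tab-separated lines."""
    reader = csv.reader(lines, delimiter="\t")
    # The first cell of the header row is empty, so skip it.
    headers = [h.strip() for h in next(reader)[1:]]
    matrix = {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        matrix[row[0].strip()] = {k: int(v) for k, v in zip(headers, row[1:])}
    return matrix


matrix = read_matrix(io.StringIO(SAMPLE))
print(matrix["A"]["A"])  # 2
print(matrix["G"]["T"])  # -1
```

Values are then looked up by row label first and column label second, as in matrix["G"]["T"].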

Edit:

For Python < 2.7, it should work if you replace the dictionary comprehension line ( data[rowData[0]] = ... ) above with a simple loop in the same place:

    rowDict = dict()
    for k, v in zip(headers, rowData[1:]):
        rowDict[k] = int(v)
    data[rowData[0]] = rowDict

Using csv.DictReader gets you most of the way there on its own:

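from csv import DictReader

# DictReader needs an open file object, not a filename
with open('file.csv') as f:
    reader = DictReader(f, delimiter='\t')
    #dictdata = {row['']: row for row in reader}       # <-- python 2.7+ only
    dictdata = dict((row[''], row) for row in reader)  # <-- python 2.6 safe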

Output:

{'A': {None: [''], '': 'A', 'A': '2', 'C': '-1', 'G': '-1', 'T': '-1'},
 'C': {'': 'C', 'A': '-1', 'C': '2', 'G': '-1', 'T': '-1'},
 'G': {'': 'G', 'A': '-1', 'C': '-1', 'G': '2', 'T': '-1'},
 'T': {'': 'T', 'A': '-1', 'C': '-1', 'G': '-1', 'T': '2'}}

Cleaning up the extraneous keys got messy, and I needed to rebuild the inner dict, but it works if you replace the last line with this:

dictdata = {row['']: {key: value for key, value in row.items() if key} for row in reader}

Output:

{'A': {'A': '2', 'C': '-1', 'G': '-1', 'T': '-1'},
 'C': {'A': '-1', 'C': '2', 'G': '-1', 'T': '-1'},
 'G': {'A': '-1', 'C': '-1', 'G': '2', 'T': '-1'},
 'T': {'A': '-1', 'C': '-1', 'G': '-1', 'T': '2'}}
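
The values above are still strings, while the format in the question uses integers. One possible way to convert them is to call int() inside the inner comprehension. This is a minimal sketch that assumes Python 3 and uses a hypothetical in-memory SAMPLE in place of file.csv:

```python
import csv
import io

# Hypothetical in-memory stand-in for the tab-separated file.csv.
SAMPLE = (
    "\tA\tC\tG\tT\n"
    "A\t2\t-1\t-1\t-1\n"
    "C\t-1\t2\t-1\t-1\n"
    "G\t-1\t-1\t2\t-1\n"
    "T\t-1\t-1\t-1\t2\n"
)

reader = csv.DictReader(io.StringIO(SAMPLE), delimiter="\t")

# The row label sits under the empty-string header; "if key" drops that
# entry (and any None key created by extra trailing fields) from the inner dict.
dictdata = {
    row[""]: {key: int(value) for key, value in row.items() if key}
    for row in reader
}

print(dictdata["C"])  # {'A': -1, 'C': 2, 'G': -1, 'T': -1}
```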

Edit: for Python < 2.7

Dictionary comprehensions were added in 2.7. For 2.6 and lower, use the dict constructor:

dictdata = dict((row[''], dict((key, value) for key, value in row.iteritems() if key)) for row in reader)
