
Accumulating values to the corresponding key in a dictionary?

I am trying to locate the positions of bases (A, C, G, T) and store them in a dictionary keyed by position.

I am working from a text file that contains lines of bases like the ones below:

----T
C
-C
-----G
C
-----C
---T
----A
----C
-----G

From the information above, I know that:

  • C is at the 1st position

  • C is at the 2nd position

  • The base at the 3rd position is unknown

  • T is at the 4th position

  • C, A, T are at the 5th position

  • C, G are at the 6th position

So far, I have written the code below:

def chunks(chunks_file):
    set_bases = {}
    with open(chunks_file) as file:
        for line in file:
            for character in line:
                if character.isalpha():
                    letter = character
                    position = line.find(letter) + 1
                    set_bases[position] = {letter}

    return set_bases

My current output is:

{5: {'C'}, 1: {'C'}, 2: {'C'}, 6: {'G'}, 4: {'T'}}

whereas the desired output would be:

{1: {'C'}, 2: {'C'}, 4: {'T'}, 5: {'C', 'A', 'T'}, 6: {'C', 'G'}}

It seems to me that values are not being added to already existing keys; instead, the new values are replacing the old ones.

How can I solve this problem?

Assuming you have a .txt file in which each line is a run of dashes followed by a single base, you can do it the following way (note that the keys come out as strings):

outDict = {}

with open('data.txt', 'r') as inFile:
    lines = [line.strip() for line in inFile if not line == '\n']
    outDict = dict((str(line.count('-')+1),set()) for line in lines)
    for line in lines:
        outDict[str(line.count('-')+1)].update(line[-1])
    print(outDict)

Result:

{'5': {'C', 'A', 'T'}, '1': {'C'}, '2': {'C'}, '6': {'C', 'G'}, '4': {'T'}}
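The result above uses string keys, while the desired output in the question uses integers. As a minimal sketch of the same counting idea with integer keys, `dict.setdefault` can create the empty set the first time a position is seen. The function name `positions_from_lines` and the in-memory `sample` list are hypothetical, introduced only for illustration, and the code assumes each non-empty line is a run of dashes followed by exactly one base.

```python
def positions_from_lines(lines):
    """Map 1-based position (int) -> set of bases seen at that position.

    Assumption: each non-empty line is a run of '-' followed by one base.
    """
    result = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        position = line.count('-') + 1
        # setdefault returns the existing set, or stores and returns a new one
        result.setdefault(position, set()).add(line[-1])
    return result


# Hypothetical in-memory copy of the sample file from the question
sample = ["----T", "C", "-C", "-----G", "C",
          "-----C", "---T", "----A", "----C", "-----G"]
print(positions_from_lines(sample))
```

Passing an open file object instead of the list should work the same way, since both are iterated line by line.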

I can suggest the following improvements:

import collections

def chunks(filename):
    bases = collections.defaultdict(set)

    with open(filename) as f:
        for line in f:
            line = line.strip()
            if len(line) > 0:
                for i, char in enumerate(line):
                    if char.isalpha():
                        position = i + 1
                        bases[position].add(char)

    return bases

  • This code uses collections.defaultdict, so you don't have to check whether the position is already present in the dict.
  • I also use enumerate() when iterating over the characters of each line, so you already have the position and don't need to call line.find().

This code can be used as follows:

>>> d = chunks('your-file-name.txt')
>>> d
defaultdict(<class 'set'>, {5: {'T', 'C', 'A'}, 1: {'C'}, 2: {'C'}, 6: {'G', 'C'}, 4: {'T'}})

>>> dict(d)
{5: {'C', 'A', 'T'}, 1: {'C'}, 2: {'C'}, 6: {'G', 'C'}, 4: {'T'}}

>>> for k, v in sorted(d.items()):
...     print(k, v)
1 {'C'}
2 {'C'}
4 {'T'}
5 {'C', 'A', 'T'}
6 {'G', 'C'}
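If a plain dict ordered by position is wanted, as in the desired output from the question, one possible sketch is to rebuild the mapping in sorted key order. This relies on dicts preserving insertion order (guaranteed since Python 3.7). The helper name `to_sorted_dict` and the small `demo` mapping are hypothetical and only stand in for the value returned by chunks().

```python
import collections


def to_sorted_dict(bases):
    """Return a plain dict whose keys are inserted in ascending order."""
    return {position: bases[position] for position in sorted(bases)}


# Hypothetical stand-in for the defaultdict returned by chunks()
demo = collections.defaultdict(set)
for position, base in [(5, 'T'), (1, 'C'), (2, 'C'), (6, 'G'), (5, 'A')]:
    demo[position].add(base)

ordered = to_sorted_dict(demo)
print(list(ordered))  # keys now come out in ascending order
```

The sets themselves remain unordered, so the order of the bases inside each value can still vary between runs.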

Try something like this:

def chunks(chunks_file):
    set_bases = {}
    with open(chunks_file) as file:
        for line in file:
            for character in line:
                if character.isalpha():
                    letter = character
                    position = line.find(letter) + 1
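                    if position in set_bases:
                        # Add to the existing set instead of replacing it;
                        # a set also avoids duplicate bases at one position.
                        set_bases[position].add(letter)
                    else:
                        set_bases[position] = {letter}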

    return set_bases
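One caveat about line.find(): it returns the index of the first match only. The sample file has a single base per line, so that happens to be enough there. The sketch below assumes a hypothetical line with a repeated base (not present in the question's data) to show how the positions diverge from what enumerate() reports.

```python
# Hypothetical line with the same base at two positions
line = "-A-A"

# str.find returns the index of the first match, so both 'A's map to position 2
with_find = [line.find(ch) + 1 for ch in line if ch.isalpha()]

# enumerate yields each character's own index
with_enumerate = [i + 1 for i, ch in enumerate(line) if ch.isalpha()]

print(with_find)       # [2, 2]
print(with_enumerate)  # [2, 4]
```

If lines with several bases can occur in the real data, tracking the index with enumerate() is the safer choice.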
