
How to append characters to a string being used as a Python dictionary key (when there are multiple entries related to that string)?

I am pulling out sequence coordinates from the output file produced by HMMER (which finds DNA sequences matching a query in a genome assembly file).

I create a Python dictionary where the key is the source sequence name (a string), and the value is a list comprising the start and end coordinates of the target sequence. However, HMMER often finds multiple matches on a single source sequence (contig/chromosome).

This means that as I add to the dictionary, if I come across multiple matches on a contig, each is overwritten by the following match.

For example, HMMER finds the following matches:

Name  Start  End
4415  16723  17556
127   1290   1145
1263  34900  37834
4415  2073   3899
4415  4580   6004

But this results in the following dictionary (I want separate entries for each match):

{'127': ['1290', '1145'], '1263': ['34900', '37834'], '4415': ['4580', '6004']}

How can I append a letter to the key so that subsequent matches are unique and do not overwrite the previous ones, i.e. 4415, 4415a, 4415b, and so on?

matches = {}

for each line of HMMER file:
    split the line
    make a list of fields 4 & 5 (the coordinates)
    # at this stage I need a way of checking whether the key (sequenceName)
    # is already in the dictionary (easy), and if it is, appending a letter
    # to sequenceName to make it unique
    matches[sequenceName] = list

Creating different keys for entries that really share the same key is not a good approach. Instead, use a list as the value and keep the coordinates of every match with that key in it. You can use collections.defaultdict() for this:

>>> coords = [['4415', '16723', '17556'], ['127', '1290', '1145'], ['1263', '34900', '37834'], ['4415', '2073', '3899'], ['4415', '4580', '6004']]
>>> from collections import defaultdict
>>> 
>>> d = defaultdict(list)
>>> 
>>> for i, j, k in coords:
...     d[i].append((j, k))
... 
>>> d
defaultdict(<type 'list'>, {'1263': [('34900', '37834')], '4415': [('16723', '17556'), ('2073', '3899'), ('4580', '6004')], '127': [('1290', '1145')]})
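The same grouping can be sketched as a small script working on lines of text rather than a prepared list. This is only an illustration: it assumes each line holds whitespace-separated "name start end" fields, as in the example table from the question, which is not the real HMMER output layout. The function name `group_matches` and the `lines` sample are hypothetical.

```python
from collections import defaultdict

# Hypothetical input: one match per line as "name start end",
# mirroring the example table in the question (not real HMMER output).
lines = [
    "4415 16723 17556",
    "127 1290 1145",
    "1263 34900 37834",
    "4415 2073 3899",
    "4415 4580 6004",
]


def group_matches(lines):
    """Collect every (start, end) pair under its sequence name."""
    matches = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, start, end = fields[:3]
        matches[name].append((int(start), int(end)))
    return matches


matches = group_matches(lines)

# Each contig keeps all of its matches, in the order they were read.
for name, coords in matches.items():
    for start, end in coords:
        print(name, start, end)
```

For a real HMMER file the field positions would need adjusting (the question mentions fields 4 and 5 for the coordinates), and comment or header lines would have to be skipped.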

Besides, appending a character to the key is not ideal: to generate the next suffix you have to keep track of how many times each key has already been seen, and you do not know that number in advance.

As an alternative, if a numeric suffix is acceptable, you can keep the per-key counts in a Counter() object and append the current count to the end of the key:

>>> from collections import Counter
>>> d = {}
>>> c = Counter()
>>> for i, j, k in coords:
...     c.update((i,))
...     d["{}_{}".format(i, c[i])] = (j, k)
... 
>>> d
{'4415_1': ('16723', '17556'), '4415_3': ('4580', '6004'), '4415_2': ('2073', '3899'), '127_1': ('1290', '1145'), '1263_1': ('34900', '37834')}
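If the letter scheme from the question (4415, 4415a, 4415b, ...) is really wanted, the same counting idea can be adapted. The following is a minimal sketch, not a definitive implementation: the function name `unique_keys` is hypothetical, it supports at most 26 repeats per name, and it assumes no real sequence name already looks like another name plus a letter (for example a contig genuinely called '4415a'), since that would collide.

```python
from collections import Counter
import string

coords = [
    ["4415", "16723", "17556"],
    ["127", "1290", "1145"],
    ["1263", "34900", "37834"],
    ["4415", "2073", "3899"],
    ["4415", "4580", "6004"],
]


def unique_keys(coords):
    """Map each match to a unique key: name, then name+'a', name+'b', ..."""
    seen = Counter()  # how many times each name has been encountered so far
    result = {}
    for name, start, end in coords:
        n = seen[name]
        if n > len(string.ascii_lowercase):
            raise ValueError("more than 26 repeats of %r" % name)
        # First occurrence keeps the bare name; later ones get a, b, c, ...
        suffix = "" if n == 0 else string.ascii_lowercase[n - 1]
        result[name + suffix] = (start, end)
        seen[name] += 1
    return result


d = unique_keys(coords)
print(d)
```

Note that looking up all matches for one contig then requires scanning for keys that start with its name, which is the main reason the list-valued dictionary is usually the more convenient structure.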

You can do something like this:

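matches = {'127': ['1290', '1145'], '1263': ['34900', '37834'], '4415': ['4580', '6004']}

# sample key_name and value (placeholders; use the values parsed from your line)
key_name = '4415'
value = ['2073', '3899']

if key_name not in matches:
    matches[key_name] = value
else:
    # try the suffixes 'a'..'z' in order and use the first one that is free
    for i in range(26):
        candidate = key_name + chr(ord('a') + i)
        if candidate not in matches:
            matches[candidate] = value
            break  # stop here, otherwise every free suffix would be filled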

This will give you keys such as 4415a, 4415b, and so on.
