How do you create a Python dictionary using lists of different size?

I have been learning to manipulate strings with regex, but have run into a problem formatting a dictionary with some data I am working with. Here is a simplified version of the code I am struggling with:

import re

line=">sp|A|PE=3 SV=1 IDMANTTI >sp|B|PE=3 SV=1 EVPFYPKA >sp|C| PE=3 SV=2 QRWLFNYSGNISN"

NGly_Sites=[]
protein_list=[]

p_and_a=re.findall(r'sp\|(\w+)\|.+?SV=\d\s([A-Z]+)', line) 
for protein, amino in p_and_a:
    print(protein, amino)
    protein_list.append(protein)
    NGly_Sites=re.findall(r'N[^P][ST][^P]', amino)
    print(NGly_Sites)
Sites={k:v for k,v in zip(protein_list, NGly_Sites)}
print(Sites)

And it prints:

A IDMANTTI
['NTTI']
B EVPFYPKA
[]
C QRWLFNYSGNISN
['NYSG', 'NISN']
{'A': 'NYSG', 'B': 'NISN'}

I am trying to match up the items I have named "protein" with the resulting sequences I have found using the .findall() function in Python. Essentially, I want to end up with the following:

{'A':['NTTI'],'C':['NYSG','NISN']}

I do not understand why the objects found using .findall() are being placed into the dictionary across all the keys ('A', 'B', 'C') rather than under their specific key, or why I can't seem to attach a list of the objects found using .findall() under one key. I'm sure this is just something to do with syntax, but I've experimented with {k:v for k,v in zip(list1,list2)}, which is how I was told to make a dictionary from two lists, and I can't seem to figure out how to get it to insert a list within a list. How can I go about doing this?

You can use a list comprehension to build a complete list of (protein, sites) tuple pairs, then a dict comprehension to filter out empty list values. This could be done in a single dict comprehension, but breaking it into two steps is a little bit clearer and saves an awkward extra call to findall for extracting the protein sequences.

import re

line = ">sp|A|PE=3 SV=1 IDMANTTI >sp|B|PE=3 SV=1 EVPFYPKA >sp|C| PE=3 SV=2 QRWLFNYSGNISN"
protein_pattern = r"sp\|(\w+)\|.+?SV=\d\s([A-Z]+)"
sites_pattern = r"N[^P][ST][^P]"

all_proteins = [
    (k, re.findall(sites_pattern, v)) 
    for k, v in re.findall(protein_pattern, line)
]
sites = {k: v for k, v in all_proteins if v}

print(sites) # => {'A': ['NTTI'], 'C': ['NYSG', 'NISN']}
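
For comparison, the single-comprehension variant mentioned above might look like the following sketch. It assumes Python 3.8 or newer, because it uses an assignment expression (:=) to reuse the findall result instead of calling findall twice; the names found and sites_one_pass are made up for this illustration.

```python
import re

line = ">sp|A|PE=3 SV=1 IDMANTTI >sp|B|PE=3 SV=1 EVPFYPKA >sp|C| PE=3 SV=2 QRWLFNYSGNISN"
protein_pattern = r"sp\|(\w+)\|.+?SV=\d\s([A-Z]+)"
sites_pattern = r"N[^P][ST][^P]"

# The `if` clause runs before the key/value expression, so `found` is
# already bound when it is used as the value. An empty list is falsy,
# which drops proteins without any matching site.
sites_one_pass = {
    protein: found
    for protein, amino in re.findall(protein_pattern, line)
    if (found := re.findall(sites_pattern, amino))
}

print(sites_one_pass)  # => {'A': ['NTTI'], 'C': ['NYSG', 'NISN']}
```

Whether this reads better than the two-step version is a matter of taste; the two-step version also works on Python versions older than 3.8.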
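
Another option is to map every protein to its list of sites in one dict comprehension, which keeps proteins with no match (such as 'B') with an empty list:

import re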

line=">sp|A|PE=3 SV=1 IDMANTTI >sp|B|PE=3 SV=1 EVPFYPKA >sp|C| PE=3 SV=2 QRWLFNYSGNISN"

p_and_a=re.findall(r'sp\|(\w+)\|.+?SV=\d\s([A-Z]+)', line) 

sites =  { protein : re.findall(r'N[^P][ST][^P]', amino)  for protein, amino in p_and_a }

print(sites)

# {'A': ['NTTI'], 'B': [], 'C': ['NYSG', 'NISN']}
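
As for why the original loop misbehaves: NGly_Sites is reassigned on every iteration, so after the loop it only holds the matches for the last protein, ['NYSG', 'NISN']. zip then pairs the three protein names with those two strings and stops at the shorter input. A minimal sketch of a loop-based fix, assuming the same input string, is to store each list in the dictionary inside the loop; sites_by_protein and matches are names chosen here for illustration.

```python
import re

line = ">sp|A|PE=3 SV=1 IDMANTTI >sp|B|PE=3 SV=1 EVPFYPKA >sp|C| PE=3 SV=2 QRWLFNYSGNISN"

# What the original code effectively did after the loop finished:
# three keys zipped with the two strings left over from protein C.
truncated = dict(zip(["A", "B", "C"], ["NYSG", "NISN"]))
print(truncated)  # => {'A': 'NYSG', 'B': 'NISN'}

# Loop-based fix: assign each protein's list of sites while still
# inside the loop, so every key gets its own findall result.
sites_by_protein = {}
for protein, amino in re.findall(r"sp\|(\w+)\|.+?SV=\d\s([A-Z]+)", line):
    matches = re.findall(r"N[^P][ST][^P]", amino)
    if matches:  # skip proteins with no sites, such as 'B'
        sites_by_protein[protein] = matches

print(sites_by_protein)  # => {'A': ['NTTI'], 'C': ['NYSG', 'NISN']}
```

Dropping the `if matches:` check would keep 'B' in the result with an empty list.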


 