
Merging two dictionaries of lists with the same keys in Python

My problem:

I'm trying to merge two dictionaries of lists into a new dictionary, alternating the elements of the two original lists for each key to create the new list for that key.

So for example, if I have two dictionaries:

strings = {'S1' : ["string0", "string1", "string2"], 'S2' : ["string0", "string1"]}

Ns = {'S1' : ["N0", "N1"], 'S2' : ["N0"]}

I want to merge these two dictionaries so that the final dictionary will look like:

strings_and_Ns = {'S1': ["string0", "N0", "string1", "N1", "string2"], 'S2': ["string0", "N0", "string1"]}

or better yet, have the strings from the list joined together for every key, like:

strings_and_Ns = {'S1': ["string0N0string1N1string2"], 'S2': ["string0N0string1"]}

(I'm trying to connect together DNA sequence fragments.)

What I've tried so far:

Using zip

 for S in Ns:   
     newsequence = [zip(strings[S], Ns[S])]
     newsequence_joined = ''.join(str(newsequence))
     strings_and_Ns[species] = newsequence_joined

This does not join the sequences together into a single string, and the order of the strings is still incorrect.

Using a defaultdict

from collections import defaultdict
strings_and_Ns = defaultdict(list)

for S in (strings, Ns):
    for key, value in S.iteritems():
        strings_and_Ns[key].append(value)

The order of the strings for this is also incorrect...

Somehow moving along the lists for each key...

for S in strings: 
    list = strings[S]
    L = len(list)
    for i in range(L):
        strings_and_Ns[S] = strings_and_Ns[S] + strings[S][i] + strings[S][i]

strings_and_Ns = {}
for k,v in strings.items():
    pairs = zip(v, Ns[k] + ['']) # add empty to avoid need for zip_longest()
    flat = (item for sub in pairs for item in sub)
    strings_and_Ns[k] = ''.join(flat)

flat is built according to the accepted answer here: Making a flat list out of list of lists in Python
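For readers on Python 3, the same flattening step can also be written with itertools.chain.from_iterable. This is only a sketch of an equivalent approach, not the answer's original code. It keeps the answer's assumption that each Ns list is exactly one element shorter than the matching strings list, and the name merged is chosen here purely for illustration.

```python
from itertools import chain

strings = {'S1': ["string0", "string1", "string2"], 'S2': ["string0", "string1"]}
Ns = {'S1': ["N0", "N1"], 'S2': ["N0"]}

merged = {}  # hypothetical name for the result dictionary
for key, fragments in strings.items():
    # Pad Ns[key] with '' so zip() does not drop the last fragment
    # (assumes Ns[key] is exactly one element shorter than fragments).
    pairs = zip(fragments, Ns[key] + [''])
    # chain.from_iterable flattens the pairs into one stream of strings.
    merged[key] = ''.join(chain.from_iterable(pairs))

print(merged)
```

If the length assumption does not hold, zip() silently truncates to the shorter input, so some fragments would be dropped.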

You could do it with itertools or with list slicing, as stated here. The result looks pretty neat with itertools.

import itertools

# Python 2 code: relies on dict.iteritems() and iterator.next()
strings_and_Ns = {}
for skey, sval in strings.iteritems():
    iters = [iter(sval), iter(Ns[skey])]
    strings_and_Ns[skey] = ["".join(it.next() for it in itertools.cycle(iters))]

You have to take care with the relative lengths of your lists. If one iterator raises StopIteration, the merging ends for that key.
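The snippet above depends on Python 2 behaviour. In Python 3, it.next() is spelled next(it), and since Python 3.7 a StopIteration that escapes from inside a generator expression is converted into a RuntimeError (PEP 479). A minimal Python 3 sketch of the same round-robin idea, with the stop condition made explicit, could look like this; the helper name alternate_until_exhausted is made up for this example.

```python
import itertools

def alternate_until_exhausted(*lists):
    """Yield items round-robin and stop as soon as any one list runs out."""
    iterators = [iter(lst) for lst in lists]
    sentinel = object()  # unique marker meaning "this iterator is empty"
    for it in itertools.cycle(iterators):
        item = next(it, sentinel)
        if item is sentinel:
            return  # first exhausted iterator ends the merge for this key
        yield item

strings = {'S1': ["string0", "string1", "string2"], 'S2': ["string0", "string1"]}
Ns = {'S1': ["N0", "N1"], 'S2': ["N0"]}

strings_and_Ns = {
    key: ["".join(alternate_until_exhausted(strings[key], Ns[key]))]
    for key in strings
}
print(strings_and_Ns)
```

As in the original, merging stops at the first exhausted list, so this only gives the full sequence when the strings list is at most one element longer than the Ns list.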

To alternate the x and y iterables, inserting default for missing values:

from itertools import izip_longest

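def alternate(x, y, default):
    # fillvalue must be passed by keyword; a third positional argument
    # would be treated as another iterable to zip.
    return (item for pair in izip_longest(x, y, fillvalue=default) for item in pair)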

Example

a = {'S1' : ["string0", "string1", "string2"], 'S2' : ["string0", "string1"]}
b = {'S1' : ["N0", "N1"], 'S2' : ["N0"]}
assert a.keys() == b.keys()
merged = {k: ''.join(alternate(a[k], b[k], '')) for k in a}
print(merged)

Output

{'S2': 'string0N0string1', 'S1': 'string0N0string1N1string2'}

itertools.izip_longest will take care of the uneven-length lists; then just use str.join to join everything into a single string.

strings = {'S1' : ["string0", "string1", "string2"], 'S2' : ["string0", "string1"]}

Ns = {'S1' : ["N0", "N1"], 'S2' : ["N0"]}

from itertools import izip_longest as iz

strings_and_Ns = {k:["".join([a+b for a, b in iz(strings[k],v,fillvalue="")])] for k,v in Ns.items()}

print(strings_and_Ns)
{'S2': ['string0N0string1'], 'S1': ['string0N0string1N1string2']}

Which is the same as:

strings_and_Ns  = {}
for k, v in Ns.items():
     strings_and_Ns[k] = ["".join([a + b for a, b in iz(strings[k], v, fillvalue="")])]

Using izip_longest means the code will work no matter which dict's values contain more elements.
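In Python 3, izip_longest is named itertools.zip_longest. The following sketch ports the pairing step and checks both directions of length mismatch; the helper name join_alternating is an assumption introduced only for this illustration.

```python
from itertools import zip_longest

def join_alternating(first, second):
    """Join a0 + b0 + a1 + b1 + ..., padding the shorter list with ''."""
    return "".join(a + b for a, b in zip_longest(first, second, fillvalue=""))

# First list longer than the second
print(join_alternating(["string0", "string1", "string2"], ["N0", "N1"]))
# Second list longer than the first
print(join_alternating(["string0"], ["N0", "N1", "N2"]))
```

When the second list is longer, its extra elements simply end up next to each other, because the missing elements of the first list are filled with empty strings.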

Similar to the other solutions posted, but I would move some of it off into a function:

import itertools   

def alternate(*iters, **kwargs):
    return itertools.chain(*itertools.izip_longest(*iters, **kwargs))

result = {k: ''.join(alternate(strings[k], Ns[k] + [''])) for k in Ns}
print result

Gives:

{'S2': 'string0N0string1', 'S1': 'string0N0string1N1string2'}

The alternate function is from https://stackoverflow.com/a/2017923/66349 . It takes iterables as arguments and chains together items from each one successively (using izip_longest as Padraic Cunningham did).

You can either specify fillvalue='' to handle lists of different lengths, or just manually pad out the shorter list as I have done above (which assumes Ns will always be one shorter than strings).
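To make the difference between the two options concrete, here is a small Python 3 sketch (using zip_longest, the Python 3 name for izip_longest). The sample lists strings_k and Ns_k are invented for the example and deliberately break the "one shorter" assumption.

```python
from itertools import chain, zip_longest

def alternate(*iters, **kwargs):
    # Same idea as the function above, written with Python 3 names.
    return chain.from_iterable(zip_longest(*iters, **kwargs))

strings_k = ["string0", "string1"]
Ns_k = ["N0", "N1"]  # same length as strings_k, so manual padding overshoots

try:
    padded = ''.join(alternate(strings_k, Ns_k + ['']))
except TypeError:
    # zip_longest filled the missing strings_k slot with None,
    # and str.join() rejects None.
    padded = None

# Passing fillvalue='' works regardless of which list is longer.
filled = ''.join(alternate(strings_k, Ns_k, fillvalue=''))
print(padded, filled)
```

Manual padding is fine when the length relationship is guaranteed by the data; fillvalue='' is the more defensive choice when it is not.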

If you have an older Python version that doesn't support dict comprehensions, you could use this instead:

result = dict((k, ''.join(alternate(strings[k], Ns[k] + ['']))) for k in Ns)
