Matching adjacent list elements against a list of tuples in Python

Question

I have an ordered list of individual words from a document, like so:

words = ['apple', 'orange', 'boat', 'car', 'happy', 'day', 'cow', ...]

I have a second list of tuples of significant bigrams/collocations, like so:

bigrams = [('apple', 'orange'), ('happy', 'day'), ('big', 'house'), ...]

I would like to iterate through the list of individual words and replace each pair of adjacent words that forms a bigram with a single underscore-separated string, ending up with a list like this:

words_fixed = ['apple_orange', 'boat', 'car', 'happy_day', 'cow', ...]

I've considered flattening words and bigrams into strings (" ".join(words), etc.) and then using regex to find and replace the adjacent words, but that seems horribly inefficient and unpythonic.

What's the best way to quickly match and combine adjacent list elements from a list of tuples?

Answer 1

Not as flashy as @inspectorG4dget:

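words_fixed = []
last = None  # previous word, still waiting to be paired or emitted
for word in words:
    if (last, word) in bigrams:
        # The previous and current words form a bigram: emit them joined.
        words_fixed.append("%s_%s" % (last, word))
        last = None
    else:
        if last is not None:
            words_fixed.append(last)
        last = word
if last is not None:
    words_fixed.append(last)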
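
For reference, here is a minimal Python 3 sketch of the same single-pass idea wrapped in a function. The name merge_bigrams is hypothetical, and converting bigrams to a set is an added assumption so that each membership test takes constant time instead of scanning the list.

```python
def merge_bigrams(words, bigrams):
    """Join adjacent words that form a known bigram with an underscore."""
    bigram_set = set(bigrams)  # constant-time membership tests
    merged = []
    last = None  # previous word, still waiting to be paired or emitted
    for word in words:
        if last is not None and (last, word) in bigram_set:
            merged.append("%s_%s" % (last, word))
            last = None
        else:
            if last is not None:
                merged.append(last)
            last = word
    if last is not None:
        merged.append(last)
    return merged


words = ['apple', 'orange', 'boat', 'car', 'happy', 'day', 'cow']
bigrams = [('apple', 'orange'), ('happy', 'day'), ('big', 'house')]
words_fixed = merge_bigrams(words, bigrams)
print(words_fixed)
# ['apple_orange', 'boat', 'car', 'happy_day', 'cow']
```

Because a merged pair resets last to None, a word that has already been consumed by one bigram is never reused as the start of the next one.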

Answer 2

words = ['apple', 'orange', 'boat', 'car', 'happy', 'day', 'cow', ...]
bigrams = [('apple', 'orange'), ('happy', 'day'), ('big', 'house'), ...]

First, some optimization:

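import collections
bigram_pairs = bigrams  # keep a reference to the original list of tuples
bigrams = collections.defaultdict(set)  # first word -> set of second words
for w1, w2 in bigram_pairs:
    bigrams[w1].add(w2)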
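
To make the effect of this step concrete, here is a small Python 3 sketch of the lookup structure it builds. The names bigram_pairs, bigram_index and is_bigram are hypothetical, and the extra ('big', 'deal') pair is invented purely to show one first word mapping to several second words.

```python
import collections

# Illustrative data; ('big', 'deal') is made up for this example.
bigram_pairs = [('apple', 'orange'), ('happy', 'day'),
                ('big', 'house'), ('big', 'deal')]

# Map each first word to the set of second words it can pair with.
bigram_index = collections.defaultdict(set)
for w1, w2 in bigram_pairs:
    bigram_index[w1].add(w2)


def is_bigram(w1, w2):
    # Test for the key first so the defaultdict does not grow on misses.
    return w1 in bigram_index and w2 in bigram_index[w1]


print(is_bigram('big', 'deal'))      # True
print(is_bigram('orange', 'apple'))  # False: order matters
```

Each check is then two hash lookups, regardless of how many bigrams there are.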

Now, onto the fun stuff:

words_fixed = []
# Pair every word with its successor; zip works in Python 2 and 3.
for w1, w2 in zip(words, words[1:]):
    if w1 in bigrams and w2 in bigrams[w1]:
        words_fixed.append("%s_%s" % (w1, w2))

If you also want to keep the words that are not part of any bigram, in addition to the merged bigrams, then this should do the trick:

words_fixed = []
skip = False  # True when the current word was already merged into a bigram
# Pad with None so the last word is also visited as w1.
for w1, w2 in zip(words, words[1:] + [None]):
    if skip:
        skip = False
    elif w1 in bigrams and w2 in bigrams[w1]:
        words_fixed.append("%s_%s" % (w1, w2))
        skip = True
    else:
        words_fixed.append(w1)

Answer 3

words = ['apple', 'orange', 'boat', 'car', 'happy', 'day', 'cow']
bigrams = [('apple', 'orange'), ('happy', 'day'), ('big', 'house')]

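bigrams_dict = dict(bigrams)
bigrams_dict.update(item[::-1] for item in bigrams)  # also match reversed order

words_fixed = []
i = 0
while i < len(words):
    if i + 1 < len(words) and bigrams_dict.get(words[i]) == words[i + 1]:
        words_fixed.append("{}_{}".format(words[i], words[i + 1]))
        i += 2  # the next word was merged, so skip it
    else:
        words_fixed.append(words[i])
        i += 1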

[edit] Another way to create the dictionary:

from itertools import chain
bigrams_rev = (reversed(x) for x in bigrams)
bigrams_dict = dict(chain(bigrams, bigrams_rev))

Answer 4

words = ['apple', 'orange', 'boat', 'car', 'happy', 'day', 'cow', 'big']
bigrams = [('apple', 'orange'), ('happy', 'day'), ('big', 'house')]
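print('words   :', words)
print('bigrams :', bigrams)
print()
def zwii(words, bigrams):
    # Map each bigram's first word to its second word.
    # dict() keeps only the last pair for a repeated first word.
    dict_bigrams = dict(bigrams)
    it = iter(words)
    x = next(it, None)
    while x is not None:
        y = next(it, None)
        if y is not None and dict_bigrams.get(x) == y:
            yield '-'.join((x, y))
            x = next(it, None)
        else:
            yield x
            x = y  # y may itself start a bigram, so examine it next

print(list(zwii(words, bigrams)))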

Result

words   : ['apple', 'orange', 'boat', 'car', 'happy', 'day', 'cow', 'big']
bigrams : [('apple', 'orange'), ('happy', 'day'), ('big', 'house')]

['apple-orange', 'boat', 'car', 'happy-day', 'cow', 'big']

4 answers

Answer 1
2 accepted 2014-03-13 00:30:22

Answer 2
1 2014-03-13 00:25:40

Answer 3
1 2014-03-13 00:31:38

Answer 4
1 2014-03-13 01:17:28
