Using regular expressions on a list of tuples

Question

Given a list of tuples of words and their part-of-speech tags from a sentence:

[('We', 'PRP'),
 ('took', 'VBD'),
 ('advantage', 'NN'),
 ('of', 'IN'),
 ('the', 'DT'),
 ('half', 'JJ'),
 ('price', 'NN'),
 ('sushi', 'NN'),
 ('deal', 'NN'),
 ('on', 'IN'),
 ('saturday', 'NN')]

I would like to extract terms that match certain PoS sequences using a regexp. The pattern would be something like ('JJ')*('NN')+, so that I end up with a list like [('advantage', 'half price sushi deal', 'saturday')]. What is the most appropriate way to carry out such a task, bearing in mind I will be doing this for hundreds of sentences?
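One way to apply a real regular expression here is to encode each tag as a single character, run the regex over that string, and map the match offsets back to the word list. This is a minimal sketch, assuming only the exact tags JJ and NN matter; `TAG_CODES`, `PATTERN` and `extract_terms` are hypothetical names, and tags such as NNS, NNP or JJR would need their own entries in `TAG_CODES` to be matched.

```python
import re

# Hypothetical mapping: one character per tag, so a regex can run over a plain string.
# Assumption: only JJ and NN are of interest; every other tag becomes "x".
TAG_CODES = {'JJ': 'J', 'NN': 'N'}

# ('JJ')*('NN')+ from the question, written over the single-character codes.
# Compiled once and reused for every sentence.
PATTERN = re.compile(r'J*N+')


def extract_terms(tagged):
    """Return the words of every run matching JJ* NN+ in a list of (word, tag) pairs."""
    # Each tag becomes exactly one character, so string offsets equal list indices.
    codes = ''.join(TAG_CODES.get(tag, 'x') for _, tag in tagged)
    return [' '.join(word for word, _ in tagged[m.start():m.end()])
            for m in PATTERN.finditer(codes)]


sentence = [('We', 'PRP'), ('took', 'VBD'), ('advantage', 'NN'), ('of', 'IN'),
            ('the', 'DT'), ('half', 'JJ'), ('price', 'NN'), ('sushi', 'NN'),
            ('deal', 'NN'), ('on', 'IN'), ('saturday', 'NN')]

print(extract_terms(sentence))
# ['advantage', 'half price sushi deal', 'saturday']
```

For hundreds of sentences, the compiled pattern is simply reused, e.g. `[extract_terms(s) for s in sentences]`. Unlike a plain "run of accepted tags" approach, a trailing adjective with no noun after it is not returned.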

Thank you!

Answer 1

I think something like this will do the trick:

a = [('We', 'PRP'),
 ('took', 'VBD'),
 ('advantage', 'NN'),
 ('of', 'IN'),
 ('the', 'DT'),
 ('half', 'JJ'),
 ('price', 'NN'),
 ('sushi', 'NN'),
 ('deal', 'NN'),
 ('on', 'IN'),
 ('saturday', 'NN')]

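# Iterator that runs one position ahead of the loop, so each (word, tag)
# pair can be compared with the pair that follows it.
b = iter(a[1:])

my_list = []      # finished chunks, each a list of words
inner_list = []   # the chunk currently being built
accepted = ['JJ', 'NN']

# Note: this collects any run of JJ/NN tags; it does not enforce the
# JJ* NN+ order (a lone JJ would also be returned).
for word, check in a:
    try:
        against = next(b)  # the (word, tag) pair after the current one
    except StopIteration:
        against = None     # the current pair is the last one in the sentence
    if check in accepted:
        inner_list.append(word)
        # Close the chunk when the next tag is not accepted or the sentence ends.
        if against is None or against[1] not in accepted:
            my_list.append(inner_list)
            inner_list = []

final = [' '.join(item) for item in my_list]
print(final)  # ['advantage', 'half price sushi deal', 'saturday']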
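The look-ahead loop above can be written more compactly with `itertools.groupby`, which groups consecutive items sharing the same key. This is a sketch of the same idea, not the answerer's code; `accepted_runs` is a hypothetical name. Like the loop, it joins every run of consecutive JJ/NN words and does not check the JJ* NN+ order.

```python
from itertools import groupby

ACCEPTED = {'JJ', 'NN'}


def accepted_runs(tagged, accepted=ACCEPTED):
    """Join each maximal run of consecutive words whose tag is in `accepted`."""
    return [' '.join(word for word, _ in group)
            # key is True for accepted tags, so runs of accepted words form one group
            for is_accepted, group in groupby(tagged, key=lambda pair: pair[1] in accepted)
            if is_accepted]


a = [('We', 'PRP'), ('took', 'VBD'), ('advantage', 'NN'), ('of', 'IN'),
     ('the', 'DT'), ('half', 'JJ'), ('price', 'NN'), ('sushi', 'NN'),
     ('deal', 'NN'), ('on', 'IN'), ('saturday', 'NN')]

print(accepted_runs(a))
# ['advantage', 'half price sushi deal', 'saturday']
```

Using a set for the accepted tags keeps the membership test cheap when the function is called for many sentences.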

Answer 1
1 2017-04-13 14:56:54