
Using regex to constrain list of tuples

Given a list of tuples of words and their part-of-speech tags from a sentence:

[('We', 'PRP'),
 ('took', 'VBD'),
 ('advantage', 'NN'),
 ('of', 'IN'),
 ('the', 'DT'),
 ('half', 'JJ'),
 ('price', 'NN'),
 ('sushi', 'NN'),
 ('deal', 'NN'),
 ('on', 'IN'),
 ('saturday', 'NN')]

I would like to extract terms that have certain PoS sequences using a regexp. This would be something like ('JJ')*('NN')+ so that I end up with a list like [('advantage', 'half price sushi deal', 'saturday')]. What is the most appropriate way of carrying out such a task, bearing in mind that I will be doing this for hundreds of sentences?

Thank you!

I think this might do the trick:

a = [('We', 'PRP'),
 ('took', 'VBD'),
 ('advantage', 'NN'),
 ('of', 'IN'),
 ('the', 'DT'),
 ('half', 'JJ'),
 ('price', 'NN'),
 ('sushi', 'NN'),
 ('deal', 'NN'),
 ('on', 'IN'),
 ('saturday', 'NN')]

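# Look-ahead iterator: b always yields the token after the current one.
b = iter(a[1:])

my_list = []     # finished groups, each a list of words
inner_list = []  # the group currently being built
accepted = ['JJ', 'NN']

for item in a:
    word = item[0]
    check = item[1]
    try:
        against = next(b)
        if check in accepted:
            inner_list.append(word)
            # The next tag is not accepted, so the current group ends here.
            if against[1] not in accepted:
                my_list.append(inner_list)
                inner_list = []
    except StopIteration:
        # Last token: nothing left to look ahead to, so close the group.
        if check in accepted:
            inner_list.append(word)
            my_list.append(inner_list)

# Note: this groups any run of JJ/NN tags; it does not require JJ before NN.
final = [' '.join(item) for item in my_list]
# final == ['advantage', 'half price sushi deal', 'saturday']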
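
If you want the JJ*NN+ order to be enforced by an actual regular expression, one option is to encode the tag sequence as a string and run `re` over it. The following is only a sketch under some assumptions: the function name `extract_terms` and the `<TAG>` encoding are made up for illustration, and it assumes that no tag contains `<` or `>`.

```python
import re


def extract_terms(tagged, pattern=r"(?:<JJ>)*(?:<NN>)+"):
    """Return the phrases whose tag sequence matches `pattern`.

    `tagged` is a list of (word, tag) tuples. `extract_terms` and the
    "<TAG>" encoding are illustrative choices, not part of any library.
    """
    words = [word for word, _ in tagged]
    # Encode the tag sequence as one string, e.g. "<PRP><VBD><NN>...".
    tag_string = "".join(f"<{tag}>" for _, tag in tagged)
    terms = []
    for match in re.finditer(pattern, tag_string):
        # Every token contributes exactly one "<", so counting them turns
        # character offsets back into token indices.
        start = tag_string.count("<", 0, match.start())
        length = match.group().count("<")
        terms.append(" ".join(words[start:start + length]))
    return terms


sentence = [('We', 'PRP'), ('took', 'VBD'), ('advantage', 'NN'),
            ('of', 'IN'), ('the', 'DT'), ('half', 'JJ'),
            ('price', 'NN'), ('sushi', 'NN'), ('deal', 'NN'),
            ('on', 'IN'), ('saturday', 'NN')]

print(extract_terms(sentence))
# ['advantage', 'half price sushi deal', 'saturday']
```

Because the closing `>` is part of each encoded tag, `<NN>` does not accidentally match `<NNS>`. Unlike the loop above, a sequence such as NN JJ NN is split into two terms here, since JJ is only allowed before the nouns. For hundreds of sentences you can just call the function once per sentence; the `re` module caches recently compiled patterns.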
