Using regex to constrain list of tuples
Given a list of tuples of words and their part-of-speech from a sentence:
[('We', 'PRP'),
('took', 'VBD'),
('advantage', 'NN'),
('of', 'IN'),
('the', 'DT'),
('half', 'JJ'),
('price', 'NN'),
('sushi', 'NN'),
('deal', 'NN'),
('on', 'IN'),
('saturday', 'NN')]
I would like to extract terms that have certain PoS sequences using a regexp. This would be something like ('JJ')*('NN')+ so I have a list of [('advantage', 'half price sushi deal', 'saturday')].
What is the most appropriate way of carrying out such a task, bearing in mind I will be doing this for hundreds of sentences?
Thank you!
I think this might be something that will do the trick:
a = [('We', 'PRP'),
('took', 'VBD'),
('advantage', 'NN'),
('of', 'IN'),
('the', 'DT'),
('half', 'JJ'),
('price', 'NN'),
('sushi', 'NN'),
('deal', 'NN'),
('on', 'IN'),
('saturday', 'NN')]
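# Look-ahead iterator: always one item ahead of the main loop.
b = iter(a[1:])
my_list = []
inner_list = []
accepted = ['JJ', 'NN']
for item in a:
    word = item[0]
    check = item[1]
    try:
        against = next(b)
        if check in accepted:
            if against[1] not in accepted:
                # The next tag ends the run, so close the current term.
                inner_list.append(word)
                my_list.append(inner_list)
                inner_list = []
            else:
                inner_list.append(word)
    except StopIteration:
        # Last item in the sentence: there is no next tag to check.
        if check in accepted:
            inner_list.append(word)
            my_list.append(inner_list)
final = [' '.join(item) for item in my_list]
# final == ['advantage', 'half price sushi deal', 'saturday']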
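Note that the loop above groups any consecutive run of JJ/NN tags, so it does not enforce the order in ('JJ')*('NN')+ (a run such as NN JJ would be kept as one term). If the order matters, one possible sketch is to encode the tags as a string and run an actual regex over it, then map each match back to the words. The names `TERM_PATTERN` and `extract_terms` are hypothetical, and the `<TAG>` encoding assumes tags never contain `<` or `>`:

```python
import re

# Compiled once so it can be reused for every sentence.
# Each tag is encoded as "<TAG>": zero or more JJ followed by one or more NN.
TERM_PATTERN = re.compile(r"(?:<JJ>)*(?:<NN>)+")


def extract_terms(tagged, pattern=TERM_PATTERN):
    """Return one space-joined term per match of `pattern` over the tag sequence."""
    # Build the encoded tag string and record which token starts at each offset.
    tag_string = ""
    offset_to_index = {}
    for index, (_, tag) in enumerate(tagged):
        offset_to_index[len(tag_string)] = index
        tag_string += "<" + tag + ">"
    offset_to_index[len(tag_string)] = len(tagged)

    terms = []
    for match in pattern.finditer(tag_string):
        first = offset_to_index[match.start()]
        last = offset_to_index[match.end()]
        terms.append(" ".join(word for word, _ in tagged[first:last]))
    return terms


sentence = [('We', 'PRP'), ('took', 'VBD'), ('advantage', 'NN'), ('of', 'IN'),
            ('the', 'DT'), ('half', 'JJ'), ('price', 'NN'), ('sushi', 'NN'),
            ('deal', 'NN'), ('on', 'IN'), ('saturday', 'NN')]
print(extract_terms(sentence))
# ['advantage', 'half price sushi deal', 'saturday']
```

For hundreds of sentences, the same function can be called in a list comprehension, e.g. `[extract_terms(s) for s in sentences]`. Because `<NN>` matches only the exact tag NN, other noun tags such as NNS would need to be added to the pattern explicitly.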