Error in regex substring match in a list in Python
I have a list of lists as follows.
mylist = [["the", "and", "fresh milk", "a loaf of bread", "the butter"], ["an apple", "eggs", "oranges", "cup of tea"]]
Now I want to remove the stop words from mylist so that my new list looks like this.
mylist = [["fresh milk", "loaf bread", "butter"], ["apple", "eggs", "oranges", "cup tea"]]
My current code is as follows.
cleaned_mylist= []
stops = ['a', 'an', 'of', 'the']
pattern = re.compile(r'|'.join([r'(\s|\b){}\b'.format(x) for x in stops]))
for item in mylist:
    inner_list= []
    for words in item:
        inner_list.append(pattern.sub('', item).strip())
    cleaned_mylist.append(inner_list)
However, this code does not seem to work. Please help.
You don't need a regex for this example.
mylist = [["the", "and", "fresh milk", "a loaf of bread", "the butter"], ["an apple", "eggs", "oranges", "cup of tea"]]
expected = [["fresh milk", "loaf bread", "butter"], ["apple", "eggs", "oranges", "cup tea"]]
cleaned_mylist = []
stops = ['a', 'an', 'of', 'the', 'and']
for item in mylist:
    inner_list = []
    for sentence in item:
        # Keep only the words that are not stop words
        out_sentence = []
        for word in sentence.split():
            if word not in stops:
                out_sentence.append(word)
        # Skip entries that consisted only of stop words
        if len(out_sentence) > 0:
            inner_list += [' '.join(out_sentence)]
    cleaned_mylist.append(inner_list)
print(expected == cleaned_mylist)
# True
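The same split-and-filter idea can be written a little more compactly. The following is only a sketch: the function name remove_stop_words is made up for illustration and does not come from the answer above, and the stop words are held in a set on the assumption that the list could grow large enough for lookup speed to matter.

```python
def remove_stop_words(nested, stops):
    """Return a copy of `nested` with stop words dropped from every phrase.

    `remove_stop_words` is a hypothetical helper name used for illustration.
    """
    stop_set = set(stops)  # set membership tests are fast on average
    cleaned = []
    for phrases in nested:
        kept = []
        for phrase in phrases:
            words = [w for w in phrase.split() if w not in stop_set]
            if words:  # drop phrases made up only of stop words
                kept.append(" ".join(words))
        cleaned.append(kept)
    return cleaned


mylist = [["the", "and", "fresh milk", "a loaf of bread", "the butter"],
          ["an apple", "eggs", "oranges", "cup of tea"]]
result = remove_stop_words(mylist, ['a', 'an', 'of', 'the', 'and'])
print(result)
```

Because the phrase is split on whitespace and rejoined with single spaces, no leftover double spaces need to be cleaned up afterwards.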
Your pattern is being applied to the sub-list (item), not to the individual words.
import re

mylist = [["the", "and", "fresh milk", "a loaf of bread", "the butter"], ["an apple", "eggs", "oranges", "cup of tea"]]
cleaned_mylist = []
stops = ['a', 'an', 'of', 'the', 'and']
pattern = re.compile(r'|'.join([r'(\s|\b){}\b'.format(x) for x in stops]))
for item in mylist:
    inner_list = []
    for words in item:
        # Apply the pattern to each string, not to the sub-list
        cleaned = pattern.sub('', words).strip()
        if cleaned != '':
            inner_list.append(cleaned)
    cleaned_mylist.append(inner_list)
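To make the difference concrete, here is a small sketch of what happens when a list is passed to sub() instead of a string. The pattern is deliberately simplified and the variable names error_name and fixed are chosen only for this illustration.

```python
import re

# Simplified pattern for illustration; not the exact one from the question
pattern = re.compile(r'\b(?:a|an|of|the)\b')
item = ["an apple", "eggs"]  # a sub-list, like `item` in the question's loop

try:
    pattern.sub('', item)  # passing the list itself, as the question did
    error_name = None
except TypeError as exc:
    # sub() only accepts a string (or bytes-like object), so a list is rejected
    error_name = type(exc).__name__
    print("sub() rejected the list:", exc)

# Passing each string from the sub-list works as intended
fixed = [pattern.sub('', words).strip() for words in item]
print(fixed)
```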
Alternatively, use:
import re

mylist = [["the", "and", "fresh milk", "a loaf of bread", "the butter"], ["an apple", "eggs", "oranges", "cup of tea"]]
cleaned_mylist = []
stops = ['a', 'an', 'of', 'the', 'and']
# Each stop word is matched as a whole word, plus one optional trailing whitespace character
pattern = '|'.join([r'\b{}\b\s?'.format(x) for x in stops])
for item in mylist:
    inner_list = []
    for words in item:
        words = re.sub(pattern, '', words)
        if words != "":
            inner_list.append(words)
    cleaned_mylist.append(inner_list)
print(cleaned_mylist)
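One possible variation, offered as a sketch and not as part of the answer above: compile a single pattern once, escape each stop word with re.escape in case it ever contains regex metacharacters, and normalize whitespace after substitution so that a stop word at the end of a phrase leaves no trailing space. The helper name strip_stops is hypothetical.

```python
import re

stops = ['a', 'an', 'of', 'the', 'and']
# One non-capturing group containing every stop word, matched as whole words
pattern = re.compile(r'\b(?:{})\b'.format('|'.join(map(re.escape, stops))))


def strip_stops(phrase):
    """Remove stop words, then collapse the whitespace left behind.

    `strip_stops` is a hypothetical helper name used for illustration.
    """
    without_stops = pattern.sub(' ', phrase)
    return ' '.join(without_stops.split())


mylist = [["the", "and", "fresh milk", "a loaf of bread", "the butter"],
          ["an apple", "eggs", "oranges", "cup of tea"]]
cleaned_mylist = []
for item in mylist:
    cleaned = [strip_stops(phrase) for phrase in item]
    # Drop entries that became empty after removing stop words
    cleaned_mylist.append([phrase for phrase in cleaned if phrase])
print(cleaned_mylist)
```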