
Error in regex substring match in a list in Python

I have a list of lists as follows.

mylist = [["the", "and", "fresh milk", "a loaf of bread", "the butter"], ["an apple", "eggs", "oranges", "cup of tea"]]

Now I want to remove the stop words from mylist so that my new list looks like this.

mylist = [["fresh milk", "loaf bread", "butter"], ["apple", "eggs", "oranges", "cup tea"]]

My current code is as follows.

cleaned_mylist= []
stops = ['a', 'an', 'of', 'the']
pattern = re.compile(r'|'.join([r'(\s|\b){}\b'.format(x) for x in stops]))
for item in mylist:
    inner_list= []
    for words in item:
       inner_list.append(pattern.sub('', item).strip())
    cleaned_mylist.append(inner_list)

However, this code does not seem to work. Please help me.

You don't need a regular expression for this example.

mylist = [["the", "and", "fresh milk", "a loaf of bread", "the butter"], ["an apple", "eggs", "oranges", "cup of tea"]]
expected = [["fresh milk", "loaf bread", "butter"], ["apple", "eggs", "oranges", "cup tea"]]

cleaned_mylist= []
stops = ['a', 'an', 'of', 'the', 'and']
for item in mylist:
    inner_list= []
    for sentence in item:
        out_sentence = []
        for word in sentence.split():
            if word not in stops:
                out_sentence.append(word)
        if len(out_sentence) > 0:
            inner_list += [' '.join(out_sentence)]
    cleaned_mylist.append(inner_list)

print(expected == cleaned_mylist)
# True
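
The same idea can be wrapped in a small function, which makes it easier to reuse and test. This is only a sketch: the name remove_stop_words and the use of a set for the stop words are my own choices, not something from the question.

```python
# Hypothetical helper; the function name and the STOPS constant are illustrative.
STOPS = {"a", "an", "of", "the", "and"}  # same stop words as above, stored as a set

def remove_stop_words(nested, stops=STOPS):
    """Drop stop words from every phrase; discard phrases that become empty."""
    cleaned = []
    for phrases in nested:
        kept = []
        for phrase in phrases:
            # Keep only the words that are not stop words.
            words = [w for w in phrase.split() if w not in stops]
            if words:
                kept.append(" ".join(words))
        cleaned.append(kept)
    return cleaned

mylist = [["the", "and", "fresh milk", "a loaf of bread", "the butter"],
          ["an apple", "eggs", "oranges", "cup of tea"]]
result = remove_stop_words(mylist)
print(result)
```

A set gives constant-time membership checks, which only starts to matter once the stop word list gets long.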

Your pattern is being applied to the sublist (item), not to the individual strings (words).

import re

mylist = [["the", "and","fresh milk", "a loaf of bread", "the butter"], ["an apple", "eggs", "oranges", "cup of tea"]]
cleaned_mylist= []
stops = ['a', 'an', 'of', 'the','and']
pattern = re.compile(r'|'.join([r'(\s|\b){}\b'.format(x) for x in stops]))
for item in mylist:
    inner_list= []
    for words in item:
        if pattern.sub('', words).strip() != '':
            inner_list.append(pattern.sub('', words).strip())
    cleaned_mylist.append(inner_list)
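
To see the difference in isolation, here is a small sketch that passes first the whole sublist and then each string to the question's pattern. The variable names error and cleaned are just for illustration.

```python
import re

stops = ['a', 'an', 'of', 'the']
pattern = re.compile(r'|'.join([r'(\s|\b){}\b'.format(x) for x in stops]))

item = ["an apple", "eggs"]  # one sublist, as in the question

# Passing the list itself is what the original loop does; re rejects it.
error = None
try:
    pattern.sub('', item)
except TypeError as exc:
    error = exc
    print("TypeError:", exc)

# Passing each string in the sublist works as intended.
cleaned = [pattern.sub('', words).strip() for words in item]
print(cleaned)
```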

Use an if check to skip the strings that end up empty:

import re
mylist = [["the", "and", "fresh milk", "a loaf of bread", "the butter"], ["an apple", "eggs", "oranges", "cup of tea"]]
cleaned_mylist= []
stops = ['a', 'an', 'of', 'the','and']
pattern = '|'.join([r'\b{}\b\s?'.format(x) for x in stops])
for item in mylist:
    inner_list= []
    for words in item:
        words = re.sub(pattern, '', words)
        if words:  # skip strings that consisted only of stop words
            inner_list.append(words)
    cleaned_mylist.append(inner_list)

print(cleaned_mylist)
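
One thing to watch with this pattern: if a stop word is the last word of a string (for example "cup of"), the space before it is left behind. A possible variation, assuming you are fine with collapsing whitespace afterwards, is sketched below. The helper name strip_stops and the extra test strings are made up for the example.

```python
import re

stops = ['a', 'an', 'of', 'the', 'and']
# One alternation inside a single pair of word boundaries; re.escape is not
# needed for these words but protects against regex metacharacters.
pattern = re.compile(r'\b(?:{})\b'.format('|'.join(map(re.escape, stops))))

def strip_stops(phrase):
    """Replace stop words with a space, then collapse the leftover whitespace."""
    without = pattern.sub(' ', phrase)
    return ' '.join(without.split())

# Illustrative inputs, including a trailing stop word and a word that merely
# starts with a stop word ("theory").
examples = ["a loaf of bread", "cup of", "the", "theory of a cat"]
cleaned_examples = [strip_stops(p) for p in examples]
print(cleaned_examples)
```

Strings that contained only stop words come back as empty strings, so the same if check as above can be used to drop them.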

