Modifying groups within a regex match

Question

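As part of my Django (v 1.5) model I have a function that takes a body of text, finds all of my note tags, converts the ones that belong to the user, and then removes all the others.

The function below currently works, but it requires me to reassign note_tags to a modified pattern inside the loop, because group 0 of a match finds every tag regardless of whether the user's nickname is in it. I'm curious how I could use the groups so that I can remove all of the unneeded tags without having to modify the regex.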

import re

def format_for_user(self, user):
    body = self.body
    note_tags = '<note .*?>.*?</note>\r\n'
    user_msg = False
    if user is not None:
        user_tags = '(<note %s>).*?</note>' % user.nickname
        user_tags = re.compile(user_tags)
        for tag in user_tags.finditer(body):
            if tag.groups(1):
                # Group 1 is the opening tag for this user; swap it for <span>.
                replacement = str(tag.groups(1)[0])
                body = body.replace(replacement, '<span>')
                # The last 7 characters of the match are '</note>'; this
                # replaces every '</note>' in body, not only this user's.
                replacement = str(tag.group(0)[-7:])
                body = body.replace(replacement, '</span>')
                user_msg = True
                # Hence the removal pattern has to change as well.
                note_tags = '<note .*?>.*?</span>\r\n'
    note_tags = re.compile(note_tags)
    for tag in note_tags.finditer(body):
        body = body.replace(tag.group(0), '')
    return (body, user_msg)

Answer 1

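abarnert was right: I shouldn't be using regex to parse my HTML, and should instead use something along the lines of BeautifulSoup.

So I used BeautifulSoup, and this is the resulting code. It solved a lot of the problems the regex version was having.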

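# Assumes Beautiful Soup 4; the original snippet did not show its import.
from bs4 import BeautifulSoup

def format_for_user(self, user):
    body = self.body
    # Parser named explicitly (an assumption) so the output does not depend
    # on which parsers happen to be installed.
    soup = BeautifulSoup(body, 'html.parser')
    user_msg = False
    if user is not None:
        # This lookup expects notes written as <note class="nickname">.
        user_tags = soup.find_all('note', {'class': user.nickname})
        for tag in user_tags:
            # Renaming keeps the contents and excludes the tag from the
            # 'note' search below.
            tag.name = 'span'
            # The original never set this flag; it is set here on the
            # assumption that it should behave like the regex version.
            user_msg = True
    # Anything that is still a <note> belongs to someone else; remove it.
    for tag in soup.find_all('note'):
        tag.decompose()
    return (soup.prettify(), user_msg)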
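
For comparison, the original question (how to use groups so the pattern never has to change) can probably also be handled with a single pattern and a replacement callback passed to re.sub. The following is only a sketch under a few assumptions: Python 3, the `<note nickname>` tag format from the question, and a hypothetical standalone function `format_notes` in place of the model method. It is not the code from either post.

```python
import re

# Hypothetical pattern: group 1 is the nickname, group 2 the note text,
# group 3 the optional line break that follows the closing tag.
NOTE_RE = re.compile(r'<note (.*?)>(.*?)</note>(\r\n)?', re.DOTALL)


def format_notes(body, nickname):
    """Return (new_body, user_msg); nickname=None means there is no user."""
    user_msg = False

    def replace(match):
        nonlocal user_msg
        if nickname is not None and match.group(1) == nickname:
            user_msg = True
            # Keep this user's note, rewrapped in a span, with its line break.
            return '<span>' + match.group(2) + '</span>' + (match.group(3) or '')
        # Every other note is dropped, together with its line break.
        return ''

    return NOTE_RE.sub(replace, body), user_msg


sample = 'Hi<note bob>for bob</note>\r\n<note amy>for amy</note>\r\nBye'
print(format_notes(sample, 'bob'))  # ('Hi<span>for bob</span>\r\nBye', True)
```

Because the nickname is compared as a plain string inside the callback and is never interpolated into the pattern, a nickname containing regex metacharacters should not change what matches. The caveat from the answer still applies: this only copes with flat, well-formed note tags, which is why a real parser is the safer choice.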

Solution 1
0 Accepted 2014-09-29 03:32:07
