Stripping tags with a regular expression in Python

Question

How can I strip the tags off the string in this list:

['</span>A walk in the park<span class="html-tag"']

I managed to use (r'(?<=</span>)[^>]+') to remove the first tag but can't figure out how to remove the second. I know regular expressions aren't the right tool for dealing with tags, but I just want to figure this out.

Answer 1

You can use:

(?:>)(.*)(?:<)

In a regex, every pair of parentheses defines a group. Here we have three pairs of parentheses, but the first and the last one start with ?:. That makes them non-capturing groups: they are needed to match the pattern, but they are not returned by the parser. What you want is in group 1.
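To make the group numbering concrete, here is a minimal sketch that applies the pattern above with Python's re module. The sample string is taken from the question; the variable names are only illustrative.

```python
import re

# Sample string taken from the question.
s = '</span>A walk in the park<span class="html-tag"'

# (?:>) and (?:<) are non-capturing, so the middle (.*) is group 1.
match = re.search(r'(?:>)(.*)(?:<)', s)

if match:
    print(match.group(0))  # whole match, delimiters included: >A walk in the park<
    print(match.group(1))  # captured text only: A walk in the park
```

Note that .* is greedy, so on a string with several tags it would run up to the last < rather than the next one. It works here because only one < follows the first >.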

Answer 2

You were quite close with your regex. After the position found by the lookbehind, you just want to read up to the next <, so the character class should exclude < rather than >:

(?<=</span>)[^<]+

Check it out on regex101.

$ cat test.py
import re
s = '</span>A walk in the park<span class="html-tag"'
# The lookbehind anchors the match right after </span>; [^<]+ then reads up to the next <
print(re.findall(r'(?<=</span>)[^<]+', s))

$ python test.py
['A walk in the park']
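As a rough side-by-side check, the sketch below runs the pattern from the question and the corrected one against the same string. The names original and fixed are illustrative, not from the thread.

```python
import re

# Sample string taken from the question.
s = '</span>A walk in the park<span class="html-tag"'

# Pattern from the question: reads until the next '>', and there is none
# after </span>, so it runs to the end of the string.
original = re.findall(r'(?<=</span>)[^>]+', s)

# Corrected pattern: stops at the next '<', where the second tag begins.
fixed = re.findall(r'(?<=</span>)[^<]+', s)

print(original)
print(fixed)
```

The first pattern leaves the second tag in place because its character class only excludes >, and this string contains no > after the position where the lookbehind matches.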

Answer 1
0 2017-10-15 14:36:17

Answer 2
0 accepted 2017-10-15 14:36:30
