Using RegEx in Python to find words beginning with h, but not including HTML tags
I'm trying to find all words beginning with h, but I need to exclude HTML tags from this search. I have the code to find all the words starting with h:
\bh\w+
I just don't know how to exclude things from my search, specifically an HTML tag.
Use a negated character class, [^<], so that the character before the h cannot be <:
[^<]h\w+
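For illustration, here is a minimal Python sketch of how this pattern behaves. The sample text and variable names are made up for the example. Note that [^<] consumes one character in front of the h, so that character becomes part of each match, and an h at the very start of the text has nothing in front of it to match.

```python
import re

# Made-up sample text: a word at the start, a tag, and an h inside a word.
text = "hello <html> happy agharro"

# [^<] consumes exactly one character that is not "<",
# then a literal h, then one or more word characters.
matches = re.findall(r"[^<]h\w+", text)
print(matches)  # [' happy', 'gharro']
```

The tag name is skipped, but so is hello, the leading space ends up inside ' happy', and the h in the middle of agharro is picked up as well.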
But I think this way may work better for what you want. A negative lookbehind checks the character before the h without consuming it, so it matches every run starting with h that is not directly preceded by <:
(?<!<)h\w+
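To try this kind of pattern in Python, here is a minimal sketch. It assumes a negative lookbehind, (?<!<), since that is the construct that inspects the character in front of the h, and it adds \b so that only an h at the start of a word counts. The sample strings and variable names are made up for illustration.

```python
import re

# Made-up sample text with an opening tag whose name starts with h.
text = "<html> hello <p>happy house</p>"

# (?<!<) : the h must not be directly preceded by "<"
# \b     : the h must sit at the start of a word
pattern = re.compile(r"(?<!<)\bh\w+")
words = pattern.findall(text)
print(words)  # ['hello', 'happy', 'house']

# Closing tags are a known gap: in "</h1>" the h follows "/", not "<".
with_closing = pattern.findall("<h1>hi there</h1>")
print(with_closing)  # ['hi', 'h1']

# One possible extension (an assumption, not part of the answer above):
# also reject an h that directly follows "/".
stricter = re.compile(r"(?<![</])\bh\w+")
print(stricter.findall("<h1>hi there</h1>"))  # ['hi']
```

The stricter variant would also reject words that follow a slash in ordinary text, such as in a path, so whether it fits depends on the input.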
Even better, use the following pattern, which requires a space before the h and captures the word in a group:
 ((?<!<)h\w+)
(Pay close attention: there is a blank space just before the first parenthesis.)
If the text is:
html teste homem carro agharro hzete h
It will produce two full matches, " homem" and " hzete" (each including the leading space), and the first capture group of each match holds the word you want: "homem" and "hzete".
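The result above can be reproduced with a short Python sketch. It runs on the sample text from this answer; the variable names are made up, and the lookaround is left out on purpose, because once a literal space is required in front of the h, the preceding character can never be <.

```python
import re

# Sample text taken from the answer above.
text = "html teste homem carro agharro hzete h"

# A literal space, then a capture group holding the h-word.
pattern = re.compile(r" (h\w+)")

# group(0) is the full match (with the space), group(1) is the word itself.
full_matches = [m.group(0) for m in pattern.finditer(text)]
words = [m.group(1) for m in pattern.finditer(text)]

print(full_matches)  # [' homem', ' hzete']
print(words)         # ['homem', 'hzete']
```

The html at the very start is skipped only because no space precedes it, and the final h is skipped because \w+ needs at least one more character after it.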
I would recommend a graphical regex testing tool, so you can see live how the expressions you write behave. A good one is https://regex101.com/
Hope this helps.