

Using RegEx in Python finding words beginning with h, but not including html tags

I'm trying to find all words beginning with h, but I need to exclude HTML tags from this search. I have the code to find all the words starting with h:

\h\w+
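In Python's `re` module, `\h` is not a valid escape: since Python 3.6, an unknown escape made of a backslash and an ASCII letter raises `re.error`, so the pattern above will not compile there. A word boundary, `\b`, is likely what was intended. A minimal sketch on an illustrative sample string, with no HTML handling yet:

```python
import re

text = "html teste homem carro agharro hzete h"

# \h is not a recognized escape in Python's re module, so compiling it fails.
try:
    re.compile(r"\h\w+")
except re.error as err:
    print("re.error:", err)

# \b requires the h to start a word (so "agharro" is skipped);
# \w+ requires at least one more word character (so a lone "h" is skipped).
words = re.findall(r"\bh\w+", text)
print(words)  # ['html', 'homem', 'hzete']
```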

I just don't know how to exclude things from my search, specifically an HTML tag.

Use the negated (exclusion) character class [^...]:

[^<]h\w+ 
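A quick check of what `[^<]h\w+` returns, sketched on the sample text used later in this answer plus a hypothetical tagged string. Because `[^<]` consumes one character, each match includes the character before the h, an h-word at the very start of the string cannot match, and the h may sit in the middle of a word:

```python
import re

text = "html teste homem carro agharro hzete h"
tagged = "<html> hello"  # hypothetical input containing a tag

pattern = re.compile(r"[^<]h\w+")

# [^<] must consume one non-'<' character before the h.
print(pattern.findall(text))    # [' homem', 'gharro', ' hzete']

# The h right after '<' is rejected, so the tag name is skipped.
print(pattern.findall(tagged))  # [' hello']
```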

But I think this may work better for what you want, since it generates a match for every word beginning with h that does not come directly after a <:

(?<!<)h\w+
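Lookarounds are zero-width: they test a position without consuming text. To reject an h that comes right after `<`, the check has to look backwards, which is a negative lookbehind `(?<!<)`; a lookahead `(?!<)` placed before `h` always succeeds, because the next character is the h itself. A sketch with illustrative inputs comparing the two:

```python
import re

text = "<html> hello homem"

# Lookahead placed before h: the next character is always the 'h' itself,
# never '<', so this behaves exactly like h\w+ and still matches the tag name.
ahead = re.findall(r"(?!<)h\w+", text)
print(ahead)    # ['html', 'hello', 'homem']

# Negative lookbehind: fail when the character just before h is '<'.
behind = re.findall(r"(?<!<)h\w+", text)
print(behind)   # ['hello', 'homem']

# Only one preceding character is inspected, so a closing tag slips through.
closing = re.findall(r"(?<!<)h\w+", "</html>")
print(closing)  # ['html']
```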

Even better, use the following pattern:

 ((?<!<)h\w+)

(Pay close attention: there is a blank space just before the first opening parenthesis.)

If the text is:

html teste homem carro agharro hzete h

It will produce two full matches, " homem" and " hzete", and the first capture group of each match holds the word you want: "homem" and "hzete".
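In Python, when a pattern has exactly one capture group, `re.findall` returns the group's text instead of the whole match, which drops the leading space for you; `re.finditer` gives access to both. A sketch on the sample text above, writing the `<` check as a negative lookbehind (with the required space in front, the character before the h is always a space, so that check cannot fire on this input):

```python
import re

text = "html teste homem carro agharro hzete h"

# A literal space, then the h-word captured in group 1.
pattern = re.compile(r" ((?<!<)h\w+)")

# Exactly one group, so findall returns group 1 only (no leading space).
words = pattern.findall(text)
print(words)  # ['homem', 'hzete']

# finditer exposes both the full match (group 0) and the captured word.
for m in pattern.finditer(text):
    print(repr(m.group(0)), repr(m.group(1)))
```

Note that "html" at the very start is not returned, because nothing precedes it and the required space is missing; the final lone "h" is skipped because `\w+` needs at least one more character.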

I would recommend a graphical regex tester, so you can see live what the expressions you write actually match. A good one is https://regex101.com/

Hope this helps.
