
python: Replace/substitute all whole-word matches in a string

Let's suppose that my string is "#big and #small, #big-red, #big-red-car and #big".

How can I use re.sub(), re.match(), etc. to replace one tag with a word?

For example, every #big must be changed to BIG, but #big-red and #big-red-car should not be affected.

Let's define your string:

>>> s = "#big and #small, #big-red, #big-red-car and #big"

Now, let's do your replacements:

>>> import re
>>> re.sub(r'#big([.,\s]|$)', r'#BIG\1', s)
'#BIG and #small, #big-red, #big-red-car and #BIG'

The regex #big([.,\s]|$) matches every #big that is followed by a period, a comma, a whitespace character, or the end of the string. The captured character is put back by \1 in the replacement. If there are other characters that you consider acceptable after #big, add them to the character class.
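As a rough sketch of that advice, the set of accepted trailing characters can be made a parameter. The helper name replace_tag and its trailing argument are hypothetical, introduced only for this illustration; the pattern itself is the capture-group regex shown above.

```python
import re

def replace_tag(text, tag, replacement, trailing=".,"):
    """Replace `tag` only when followed by whitespace, one of `trailing`, or the end.

    `replace_tag` and `trailing` are illustrative names, not part of the `re` module.
    """
    # re.escape keeps characters such as '?' or '.' literal inside the class
    pattern = re.escape(tag) + r"([" + re.escape(trailing) + r"\s]|$)"
    # A function replacement avoids backslash processing in `replacement`
    return re.sub(pattern, lambda m: replacement + m.group(1), text)

s = "#big! and #big-red, #big? and #big"
print(replace_tag(s, "#big", "#BIG", trailing=".,!?"))
# '#BIG! and #big-red, #BIG? and #BIG'
```

With the default trailing=".," the helper behaves like the regex above.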

Alternative

If we want to be a little fancier, we can use a look-ahead assertion, (?=...), to ensure that what follows #big is acceptable. Because the look-ahead does not consume the following character, the replacement no longer needs \1:

>>> re.sub(r'#big(?=[.,\s]|$)', r'#BIG', s)
'#BIG and #small, #big-red, #big-red-car and #BIG'

A test using periods and commas

To test that this works as desired when #big has a comma or period after it, let's create a new string:

>>> s = "#big and #big, #big. #small, #big-red, #big-red-car and #big"

And let's test it:

>>> re.sub(r'#big(?=[.,\s]|$)', r'#BIG', s)
'#BIG and #BIG, #BIG. #small, #big-red, #big-red-car and #BIG'

This falls into a category of one-directional boundary tricks.

A negative look-behind or look-ahead assertion, applied in one direction,
lets the beginning or end of the string match,
yet does not allow the characters you name to match.
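
A small sketch of that behavior with Python's re module: a negative look-ahead needs no explicit |$ branch, because at the end of the string there is no character to reject. The choice of [\w-] as the rejected set is an assumption made for this illustration, not something taken from the answers.

```python
import re

s = "#big and #big, #big. #small, #big-red, #big-red-car and #big"

# Reject '#big' when a word character or '-' follows; anything else,
# including the end of the string, is allowed.
result = re.sub(r'#big(?![\w-])', '#BIG', s)
print(result)
# '#BIG and #BIG, #BIG. #small, #big-red, #big-red-car and #BIG'
```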

This leads to some interesting ways of combining
negative constructs within a class: the class covers an endless range
of characters, yet lets you exclude some individual characters within
that range.

The typical constructs to use are the negated classes.

\D - Non-Digit class
\S - Non-Whitespace class
\W - Non-Word class
\PP - Non-Punctuation property class
\PL - Non-Letter property class

Note that the property classes (\PP, \PL and their inverses) are not supported by Python's built-in re module; they need an engine with Unicode property support.

Since they are used in a negative assertion, their inverses are actually the
characters being sought:

\d, \s, \w, \pP, \pL respectively

The power comes from the fact that they can be combined with individual
characters within a class, to dramatic effect.

If individual characters are added to the class, they are excluded from the set being sought.
Effectively, this creates class subtraction.

The rules when creating a class are:

  • For a class of characters you want, insert its negation (e.g. \D, \PP, etc.)
  • For individual characters you don't want, insert them as normal (e.g. \n, =, etc.)
    This can be used as class subtraction.

Subtraction example: (?![\S\r\n]) is a lookahead boundary that accepts
only horizontal whitespace (or the end of the string). In some engines,
horizontal whitespace is represented as the \h construct.
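
A minimal sketch of that subtraction in Python's re module, which has no \h construct. The sample text and the \b\w+\b prefix are assumptions chosen only to make the boundary visible.

```python
import re

text = "one two\nthree\tfour"

# Words followed by horizontal whitespace (space, tab, ...) or by the end
# of the string; a following '\r' or '\n' is rejected.
words = re.findall(r'\b\w+\b(?![\S\r\n])', text)
print(words)  # ['one', 'three', 'four']
```

Here "two" is skipped because the character after it is a newline, which the class subtracts from the accepted whitespace.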


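In your example, the boundaries would be something like this.

(?<![^\s\pP]|-)#big(?![^\s\pP]|-)

Breaking it down

 (?<!            # Boundary - Behind direction
      [^\s\pP]   # Need whitespace or punctuation ...
   |  -          # ... but not the '-'
 )
 \#big
 (?!             # Boundary - Ahead direction
      [^\s\pP]   # Need whitespace or punctuation ...
   |  -          # ... but not the '-'
 )

Two negated classes cannot simply be listed side by side here: [\S\PP] is a
union, and since every character is either non-whitespace or non-punctuation,
it would match everything. The pair is therefore written as the single negated
class [^\s\pP], and the excluded '-' is added as an alternative.

Each literal character added to the negative assertion this way is excluded
from matching.

This is called class subtraction.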


Test case

Input: #big and #small, #big, #big, #big-red, #big-red-car and #big

Output

 **  Grp 0 -  ( pos 0 , len 4 ) 
#big  

 **  Grp 0 -  ( pos 17 , len 4 ) 
#big  

 **  Grp 0 -  ( pos 23 , len 4 ) 
#big  

 **  Grp 0 -  ( pos 56 , len 4 ) 
#big  

Basically, it matches only these (matches shown in brackets): [#big] and #small, [#big], [#big], #big-red, #big-red-car and [#big]
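
The same test case can be reproduced approximately in Python. Since the built-in re module has no \pP, this sketch substitutes ASCII punctuation from string.punctuation, which is an assumption and narrower than the Unicode property; the names allowed and boundary_big are hypothetical. The rule it encodes is the one described above: a neighbor must be whitespace or punctuation other than '-', or the edge of the string.

```python
import re
import string

# Characters allowed next to the tag: whitespace plus ASCII punctuation,
# with '-' removed (the "subtraction").
allowed = r"\s" + re.escape(string.punctuation.replace("-", ""))

# A neighbor outside `allowed` (a letter, a digit, '-') blocks the match;
# the start and end of the string never block it.
boundary_big = re.compile(rf"(?<![^{allowed}])#big(?![^{allowed}])")

s = "#big and #small, #big, #big, #big-red, #big-red-car and #big"
positions = [m.start() for m in boundary_big.finditer(s)]
print(positions)                    # [0, 17, 23, 56]
print(boundary_big.sub("#BIG", s))
```

The printed positions agree with the match positions listed in the output above.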
