Python regex loop skips every third item

Question

I'm writing a tokenizer and I want to turn strings like "word-bound-with-hyphen" into "word xxsep bound xxsep with xxsep hyphen".

I tried this:

import re

s = "words-bound-with-hyphen"
# raw strings keep \w and \d from being treated as string escapes
reg_m = re.compile(r"[\w\d]+-[\w\d]+")
reg = re.compile(r"([\w\d]+)-([\w\d]+)")
while reg_m.match(s):
    s = reg.sub(r"\1 xxsep \2", s)
print(s)  # prints "words xxsep bound-with xxsep hyphen"

But this leaves every third hyphen-bound word unsplit.

Answer 1

You could just replace the hyphens with a regex:

In [4]: re.sub("-", " xxsep ", "word-bound-with-hyphen")
Out[4]: 'word xxsep bound xxsep with xxsep hyphen'

or with string substitution:

In [7]: "word-bound-with-hyphen".replace("-", " xxsep ")
Out[7]: 'word xxsep bound xxsep with xxsep hyphen'

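The reason your current approach doesn't work is that re.sub() only replaces non-overlapping matches, whereas word-bound overlaps with bound-with, which in turn overlaps with with-hyphen. A single pass therefore consumes "bound" as part of the first match and cannot reuse it for the second. The loop does not rescue this, because re.match() only matches at the start of the string, and after the first pass the string begins with "words xxsep", so the loop exits.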
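A minimal sketch of the point above, using the question's input string. The name pair is only illustrative; re.findall() shows which pairs a single pass actually sees, and switching the loop condition to search() (which scans the whole string) is one assumed way to make the original loop terminate with the desired result.

```python
import re

s = "words-bound-with-hyphen"
pair = re.compile(r"(\w+)-(\w+)")  # illustrative name, not from the question

# One pass only finds non-overlapping pairs: "bound-with" is never matched,
# because "bound" was already consumed by "words-bound".
first_pass = pair.findall(s)
print(first_pass)  # [('words', 'bound'), ('with', 'hyphen')]

# search() looks anywhere in the string, so the loop keeps going
# until no hyphen-bound pair is left.
while pair.search(s):
    s = pair.sub(r"\1 xxsep \2", s)
print(s)  # words xxsep bound xxsep with xxsep hyphen
```

This still takes several passes over the string; the single-call replacements above avoid the loop entirely.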

Answer 2

If you don't want to replace every hyphen, but only those that are preceded and followed by certain characters, then use regex lookbehinds and lookaheads. They are zero-width, so neighbouring matches do not overlap.

import re
s = "words-bound-with-hyphen"
# replace only hyphens that have a word character on both sides
re.sub(r'(?<=[\w\d])-(?=[\w\d])', ' xxsep ', s)
# result: 'words xxsep bound xxsep with xxsep hyphen'
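To illustrate how this differs from replacing every hyphen, here is a small sketch. The helper name split_inner_hyphens and the sample sentence are made up for the example; \w alone is used because it already includes digits.

```python
import re

def split_inner_hyphens(text):
    """Replace only hyphens with a word character on both sides (hypothetical helper)."""
    return re.sub(r"(?<=\w)-(?=\w)", " xxsep ", text)

sample = "well-known - or not - trailing-"
result = split_inner_hyphens(sample)
print(result)  # well xxsep known - or not - trailing-

# A plain replace would also rewrite the free-standing and trailing hyphens.
print(sample.replace("-", " xxsep "))
```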

Answer 3

import re
s = "words-bound-with-hyphen"
re.sub('-', ' xxsep ', s)
# result: 'words xxsep bound xxsep with xxsep hyphen'

or without using regular expressions:

" xxsep ".join(s.split('-'))

Here, the string is split into a list using - as the delimiter, and the pieces are then joined with " xxsep ".
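The two steps can be sketched separately to show the intermediate list. The variable names parts and joined are only for illustration.

```python
s = "words-bound-with-hyphen"

# Step 1: split on the hyphen to get the individual words.
parts = s.split("-")
print(parts)  # ['words', 'bound', 'with', 'hyphen']

# Step 2: join the words back together with the separator token.
joined = " xxsep ".join(parts)
print(joined)  # words xxsep bound xxsep with xxsep hyphen

# For this input the result is the same as a plain str.replace().
same_as_replace = joined == s.replace("-", " xxsep ")
```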

Answer 4

Why not use word boundaries? Search for \b-\b (written r"\b-\b" as a Python raw string) and replace it with " xxsep ".
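A short sketch of this suggestion, assuming the replacement should include the surrounding spaces so the output matches the question. Because a hyphen is not a word character, \b on each side only holds when the neighbouring character is a word character, so free-standing hyphens are left alone.

```python
import re

s = "words-bound-with-hyphen"
result = re.sub(r"\b-\b", " xxsep ", s)
print(result)  # words xxsep bound xxsep with xxsep hyphen

# A hyphen surrounded by spaces has no word boundary next to it.
untouched = re.sub(r"\b-\b", " xxsep ", "a - b")
print(untouched)  # a - b
```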

Answer 1  2  2019-07-22 08:00:01

Answer 2  1  2019-07-22 08:07:09

Answer 3  1  2019-07-22 08:49:43

Answer 4  0  2019-07-22 08:55:47