How do I use a regular expression to replace a given number of spaces?

Question

I'm using Calibre to convert a PDF to MOBI, but it has trouble interpreting space-indented code blocks. The blocks contain a lot of spaces, in many different amounts. Some lines are even indented by 31 spaces.

Calibre allows for 3 regexes to do search and replace in the book before it's converted.

This is what I've tried.

\n( *) ( *)([a-zA-Z{};\*\/\(\)&#0-9])

Replace with:

\n\1&nbsp;\2\3

The problem is that it only replaces one of the spaces. I want them all replaced with the same amount of &nbsp;.

I've also tried lazy versions of the first group, etc.

Is this one of the cases where regular expressions are insufficient? I think this regex engine is the standard Python one.
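To make the symptom concrete, here is a small sketch that runs the pattern and replacement from the question through Python's re module. The sample text is made up for illustration, and this assumes Calibre's search and replace behaves like a plain re.sub call.

```python
import re

# Pattern and replacement copied from the question
pattern = r"\n( *) ( *)([a-zA-Z{};\*\/\(\)&#0-9])"
replacement = r"\n\1&nbsp;\2\3"

# Hypothetical sample: a line indented by four spaces
text = "def f():\n    return 1"

result = re.sub(pattern, replacement, text)
print(repr(result))  # 'def f():\n   &nbsp;return 1'
```

The greedy first group takes three of the four spaces, the literal space takes the fourth, and only that one is rewritten. Each line matches once, so one space per line is all the substitution can touch.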

Answer 1

If this were Perl, you could replace (\G|\n) followed by a space with $1&nbsp;, and if it were a regex engine that allowed limited-width lookbehinds (instead of fixed-width lookbehinds like Python's), you could replace (?<=\n {0,30}) followed by a space with &nbsp;. As it is, the only way I can think of is to replace something like ((?<=\n)|(?<=\n )|(?<=\n {2})|(?<=\n {3})|(?<=\n {4})|(?<=\n {5})|...|(?<=\n {30})) followed by a space with &nbsp;, and I suspect that at that point you'll reach a limit on how long Calibre allows the input regex to be. :-/
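The long alternation of fixed-width lookbehinds does not have to be typed by hand. A minimal sketch, assuming plain Python re semantics, that builds it programmatically; MAX_INDENT and the sample text are assumptions for illustration, not anything Calibre defines:

```python
import re

# Assumed upper bound on how many spaces may precede the one being replaced
MAX_INDENT = 30

# One fixed-width lookbehind per possible count of preceding spaces:
# (?<=\n), (?<=\n ), (?<=\n  ), ...
lookbehinds = "|".join(r"(?<=\n%s)" % (" " * n) for n in range(MAX_INDENT + 1))

# A space qualifies only if nothing but spaces separates it from the newline
pattern = re.compile("(?:" + lookbehinds + ") ")

text = "if x:\n    y = 1\n" + " " * 31 + "z"
result = pattern.sub("&nbsp;", text)
print(repr(result))
```

Spaces inside a line (as in "y = 1") are left alone, and so is the first line, because no newline precedes it. Printing pattern.pattern gives a string that could be pasted into a search box, subject to whatever length limit applies there.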

Another option is to take a completely different approach and replace every pair of regular spaces (two spaces) with &nbsp; followed by a regular space (non-breaking space + regular space), without bothering to restrict it to the beginning of a line. I'm guessing that will satisfy your needs?
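A rough sketch of that two-space replacement using plain string replacement. The function name soften_double_spaces is made up for this example:

```python
def soften_double_spaces(text):
    # Each pair of regular spaces becomes a non-breaking space plus a regular space
    return text.replace("  ", "&nbsp; ")

even = soften_double_spaces("    return x")   # four leading spaces
odd = soften_double_spaces("     return x")   # five leading spaces
print(repr(even))  # '&nbsp; &nbsp; return x'
print(repr(odd))   # '&nbsp; &nbsp;  return x'
```

With an odd number of spaces, two regular spaces end up next to each other at the end of the run, so a renderer that collapses whitespace may show that indent one column narrower.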

Answer 2

\s{31} matches exactly 31 whitespace characters, and \s{14,31} matches 14 to 31.
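A quick check of those counted quantifiers in Python, with made-up sample lines. Note that \s also matches tabs and newlines, not only the space character:

```python
import re

exact = re.match(r"\s{31}", " " * 31 + "code")
ranged = re.match(r"\s{14,31}", " " * 20 + "code")
too_few = re.match(r"\s{14,31}", " " * 13 + "code")

print(len(exact.group()))   # 31
print(len(ranged.group()))  # 20
print(too_few)              # None
```

The quantifier finds the run, but a plain replacement string has no way to repeat &nbsp; once per matched space, which is the part the question is stuck on.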

Answer 3

Any reason not to just replace ALL spaces with non-breaking spaces? (r/ /&nbsp;/.)

It won't change the appearance of normal English text (except where the source had extraneous double spaces), and your code blocks will render correctly.

For fun, my attempt in Python:

>>> import re
>>> eight_spaces = "        hello world!"
>>> re.sub(r"^(|(?:&nbsp;)*)\s", r"\1&nbsp;", eight_spaces)
'&nbsp;       hello world!'

The idea is to replace one space at a time. It doesn't work because the re engine doesn't go back to the start of the line after a match; it consumes the string working left to right.

Note the alternation of (?:&nbsp;)* with the empty string, (|(?:&nbsp;)*), so that the capture group \1 always captures something (even the empty string).
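Outside Calibre, where a script can call the substitution more than once, the one-space-at-a-time idea can be made to work by repeating it until nothing changes. A minimal sketch under that assumption; the helper name indent_to_nbsp is made up, and it expects a single line without a trailing newline:

```python
import re

# Same pattern as above: optional run of &nbsp; at the start, then one whitespace character
STEP = re.compile(r"^(|(?:&nbsp;)*)\s")

def indent_to_nbsp(line):
    # Re-run the single-space substitution until the line stops changing
    while True:
        replaced = STEP.sub(r"\1&nbsp;", line)
        if replaced == line:
            return line
        line = replaced

converted = indent_to_nbsp("        hello world!")
print(converted)
```

Each pass converts exactly one more leading space, so a line indented by eight spaces takes eight passes plus a final pass that changes nothing. The space inside "hello world!" is never reached, because the pattern is anchored to the start of the line.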

Answer 1: 2 (accepted), 2012-03-01 02:39:50
Answer 2: 1
Answer 3: 1, 2012-03-01 02:53:25
