How do I replace a certain amount of whitespace using regex?

I'm using Calibre to convert a PDF to MOBI, but it has trouble interpreting space-indented code blocks. The blocks contain many runs of spaces of varying lengths; some lines are indented by as many as 31 spaces.

Calibre allows up to 3 regexes for search and replace in the book before it is converted.

This is what I've tried:

\n( *) ( *)([a-zA-Z{};\*\/\(\)&#0-9])

Replace with:

\n\1&nbsp;\2\3

The problem is that it only replaces one of the spaces. I want them all replaced with the same number of &nbsp; entities.
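The one-space-per-pass behavior can be reproduced outside Calibre with Python's re module, which the question suspects is the engine in use. This is only a sketch: it assumes the replacement is meant to put an &nbsp; entity where the single matched space was, and the sample string and variable names are made up for illustration.

```python
import re

# Pattern from the question: newline, leading spaces, one space,
# more spaces, then the first character of the code line.
PATTERN = re.compile(r"\n( *) ( *)([a-zA-Z{};\*\/\(\)&#0-9])")
# Assumed replacement: keep both space groups, turn the lone space into &nbsp;.
REPLACEMENT = r"\n\1&nbsp;\2\3"

sample = "\n    foo();"  # hypothetical line indented by four spaces

# First pass: the greedy first group gives back exactly one space.
once = PATTERN.sub(REPLACEMENT, sample)
print(repr(once))

# Second pass: '&' is in the character class, so one more space is converted.
twice = PATTERN.sub(REPLACEMENT, once)
print(repr(twice))
```

Each pass converts a single space per line, so a 31-space indent would need 31 passes, far more than the 3 search-and-replace slots available.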

I've also tried lazy versions of the first group, among other variations.

Is this one of the cases where regular expressions are insufficient? I think this regex engine is the standard Python one.

If this were Perl, you could replace (\G|\n) followed by a space with $1&nbsp;, and if it were a regex engine that allowed limited-width lookbehinds (instead of only fixed-width lookbehinds, like Python's), you could replace (?<=\n {0,30}) followed by a space with &nbsp;. As it is, the only way I can think of is to replace something like ((?<=\n)|(?<=\n )|(?<=\n {2})|(?<=\n {3})|(?<=\n {4})|(?<=\n {5})|...|(?<=\n {30})) followed by a space with &nbsp;, and I suspect that at that point you'll reach a limit on how long Calibre allows the input regex to be. :-/
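Rather than typing the long alternation of lookbehinds by hand, it could be generated and checked in Python first. The following is a sketch under assumptions: the helper name build_indent_pattern and its max_indent parameter are hypothetical, and whether Calibre accepts a pattern this long is untested here.

```python
import re

def build_indent_pattern(max_indent: int) -> str:
    """Return a regex matching one space that lies inside a line's leading indent."""
    # One fixed-width lookbehind per possible count of preceding spaces
    # (0 .. max_indent - 1), each anchored on the newline that starts the line.
    lookbehinds = ["(?<=\\n)"] + [f"(?<=\\n {{{n}}})" for n in range(1, max_indent)]
    # The trailing space is the character that actually gets replaced.
    return "(?:" + "|".join(lookbehinds) + ") "

pattern = build_indent_pattern(31)  # covers indents of up to 31 spaces

text = "def f():\n    return 1  # two spaces before the comment\n"
converted = re.sub(pattern, "&nbsp;", text)
print(converted)
print(len(pattern))  # length of the generated regex, in characters
```

Because every lookbehind starts at \n, an indented first line of the input (with no newline before it) is left untouched, as are spaces that follow any non-space character on a line.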

Another option is to take a completely different approach and replace every pair of regular spaces (two spaces) with &nbsp; followed by a regular space (non-breaking space + regular space), without bothering to restrict it to the beginning of a line. I'm guessing that would satisfy your needs?
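As a rough illustration of the pair-replacement idea, here is what it does to two sample lines in Python. The function name and sample strings are hypothetical, and a plain string replace stands in for Calibre's search and replace.

```python
NBSP = "&nbsp;"

def replace_space_pairs(text: str) -> str:
    # str.replace scans left to right and replaces non-overlapping pairs of spaces.
    return text.replace("  ", NBSP + " ")

even_indent = replace_space_pairs("    x = 1")  # four leading spaces
odd_indent = replace_space_pairs("   y = 2")    # three leading spaces

print(repr(even_indent))
print(repr(odd_indent))
```

An even-length run comes out as alternating non-breaking and regular spaces. An odd-length run ends with two regular spaces next to each other, which an HTML renderer would typically collapse, so such an indent may display one column narrower than in the source.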

\s{31} matches exactly 31 whitespace characters; \s{14,31} matches 14 to 31.
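A small check of these quantifiers with Python's re module, as a sketch (the variable names are illustrative only). Note that \s also matches tabs and newlines; writing a literal space before the quantifier would restrict the match to spaces.

```python
import re

exactly_31 = re.compile(r"\s{31}")
between_14_and_31 = re.compile(r"\s{14,31}")

# fullmatch succeeds only if the whole string fits the pattern.
matches_31_spaces = bool(exactly_31.fullmatch(" " * 31))
matches_30_spaces = bool(exactly_31.fullmatch(" " * 30))
matches_20_spaces = bool(between_14_and_31.fullmatch(" " * 20))
matches_31_tabs = bool(exactly_31.fullmatch("\t" * 31))

print(matches_31_spaces, matches_30_spaces, matches_20_spaces, matches_31_tabs)
```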

Any reason not to just replace ALL spaces with non-breaking spaces? (r/ /&nbsp;/.)

It won't change the appearance of normal English text (except where the source had extraneous double spaces), and your code blocks will render correctly.


For fun, my attempt in Python:

>>> import re
>>> eight_spaces = "        hello world!"
>>> re.sub(r"^(|(?:&nbsp;)*)\s",r"\1&nbsp;",eight_spaces)
'&nbsp;       hello world!'

The idea is to replace one space at a time. It doesn't work because the re engine doesn't go back to the start of the line after a match; it consumes the string working left to right.

Note the alternation of (?:&nbsp;)* with the empty string, (|(?:&nbsp;)*), so that the capture group \1 always captures something (even the empty string).
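In a standalone Python script (not in Calibre's single-pass search and replace), the one-space-at-a-time idea can be made to work by re-running the substitution until nothing changes. This is a minimal sketch; the names STEP and indent_to_nbsp are hypothetical, and a literal space is used instead of \s so that newlines are never consumed.

```python
import re

# One step: at the start of a line, skip any &nbsp; entities already produced,
# then convert the next regular space.
STEP = re.compile(r"^((?:&nbsp;)*) ", re.MULTILINE)

def indent_to_nbsp(text: str) -> str:
    """Repeat the single-space substitution until no line changes any more."""
    while True:
        text, count = STEP.subn(r"\1&nbsp;", text)
        if count == 0:
            return text

eight_spaces = "        hello world!"
result = indent_to_nbsp(eight_spaces)
print(result)
```

Each call to subn restarts matching from the beginning of every line, which supplies the "go back to the start" step that a single re.sub call lacks. Spaces after the first non-space character, such as the one in "hello world!", are left alone.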
