How do I replace a certain amount of whitespace using regex?
I'm using Calibre to convert a PDF to MOBI, but it has trouble interpreting space-indented code blocks. The blocks contain many spaces, in widely varying amounts; some lines are indented by as many as 31 spaces.
Calibre allows 3 regexes for search and replace in the book before it's converted.
This is what I've tried:
\n( *) ( *)([a-zA-Z{};\*\/\(\)�-9])
Replace with:
\n\1 \2\3
The problem is that it only replaces one of the spaces. I want them all replaced with the same number of non-breaking spaces (&nbsp;).
I've also tried lazy versions of the first group, etc.
Is this one of the cases where regular expressions are insufficient? I think this regex engine is the standard Python one.
If this were Perl, you could replace (\G|\n) followed by a space with $1 followed by a non-breaking space, and if it were a regex engine that allowed limited-width lookbehinds (instead of fixed-width lookbehinds like Python's) you could replace (?<=\n {0,30}) followed by a space with a non-breaking space. As it is, the only way I can think of is to replace something like
((?<=\n)|(?<=\n )|(?<=\n {2})|(?<=\n {3})|(?<=\n {4})|(?<=\n {5})|...|(?<=\n {30}))
followed by a space with a non-breaking space, and I suspect that at that point you'll reach a limit on how long Calibre allows the input regex to be. :-/
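The long alternation of lookbehinds need not be typed by hand. Here is a minimal sketch that builds it, assuming Python's re module and the literal entity &nbsp; as the replacement text. The names MAX_INDENT, LEADING_SPACE, and protect_indent are made up for illustration, and whether Calibre accepts a pattern this long is not checked here.

```python
import re

MAX_INDENT = 30  # hypothetical bound on spaces preceding the one being replaced

# One fixed-width lookbehind per possible count of preceding spaces:
# (?<=\n), (?<=\n ), (?<=\n  ), ... Each branch has a single fixed width,
# which is what Python's re requires of a lookbehind.
lookbehinds = "|".join(
    "(?<=\n" + " " * n + ")" for n in range(MAX_INDENT + 1)
)
LEADING_SPACE = re.compile("(?:" + lookbehinds + ") ")


def protect_indent(text: str) -> str:
    """Replace each space in a run that directly follows a newline."""
    return LEADING_SPACE.sub("&nbsp;", text)


print(protect_indent("if x:\n   y = 1"))
```

Because the lookbehinds inspect the original string rather than the partially replaced one, every space of the indent is matched in a single pass. Spaces inside a line are left alone, and so is indentation on the very first line, since no newline precedes it.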
Another option is to take a completely different approach and replace every pair of regular spaces (two spaces) with a non-breaking space followed by a regular space, without bothering to restrict it to the beginning of a line. I'm guessing that will satisfy your needs?
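As a quick check of what the two-space substitution does, here is a small sketch in Python, assuming the entity &nbsp; is what gets written into the HTML (pair_to_nbsp is a hypothetical helper name).

```python
def pair_to_nbsp(text: str) -> str:
    """Replace every non-overlapping pair of spaces with '&nbsp;' plus one space."""
    return text.replace("  ", "&nbsp; ")


print(repr(pair_to_nbsp("    x")))  # four leading spaces
print(repr(pair_to_nbsp("   x")))   # three leading spaces
```

With an odd number of spaces, the regular space from the last replaced pair ends up next to the leftover space, and HTML rendering would normally collapse those two into one, so such a line may lose a column of indentation.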
\s{31} matches exactly 31 whitespace characters; \s{14,31} matches 14 to 31.
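A bounded quantifier finds the run, but a plain replacement string cannot say "as many non-breaking spaces as were matched". Where a replacement function is available, as with Python's re.sub, the length can be computed per match. Below is a sketch under that assumption; whether Calibre's conversion search-and-replace can do this is not established here, and nbsp_indent is a hypothetical name.

```python
import re

# Runs of 1 to 31 spaces at the start of any line
# (re.MULTILINE makes ^ match at every line start).
INDENT = re.compile(r"^ {1,31}", re.MULTILINE)


def nbsp_indent(text: str) -> str:
    """Swap each leading space for '&nbsp;', keeping the count the same."""
    return INDENT.sub(lambda m: "&nbsp;" * len(m.group()), text)


print(nbsp_indent("def f():\n    return 1"))
```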
Any reason not to just replace ALL spaces with non-breaking spaces? (r/ / /, with a non-breaking space as the replacement.)
It won't change the appearance of normal English text (except where the source had extraneous double spaces), and your code blocks will render correctly.
For fun, my attempt in Python:
>>> import re
>>> eight_spaces = "        hello world!"
>>> re.sub(r"^(|(?:&nbsp;)*)\s", r"\1&nbsp;", eight_spaces)
'&nbsp;       hello world!'
The idea is to replace one space at a time. It doesn't work because the re engine doesn't go back to the start of the line after a match: it consumes the string working left to right.
Note the alternation of (?:&nbsp;)* with the empty string, (|(?:&nbsp;)*), so that the capture group \1 always captures something (even the empty string).
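The one-space-at-a-time idea does work if the substitution is simply re-run until nothing changes, since each pass starts again from the beginning of every line. Here is a minimal sketch, assuming re.MULTILINE so that ^ matches at each line start and &nbsp; as the replacement text (convert_indent is a hypothetical name).

```python
import re

# Any already-converted '&nbsp;' prefix, then one remaining regular space.
ONE_SPACE = re.compile(r"^((?:&nbsp;)*) ", re.MULTILINE)


def convert_indent(text: str) -> str:
    """Repeat the single-space substitution until no line changes."""
    while True:
        text, count = ONE_SPACE.subn(r"\1&nbsp;", text)
        if count == 0:
            return text


print(convert_indent("    a\n  b c"))
```

Each pass converts at most one space per line, so a line indented by 31 spaces needs 31 passes plus a final pass that finds nothing to replace.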