What is a regular expression that removes spaces between uppercase letters but keeps spaces between words?

For example, if I have a string like "Hello I B M", how do I detect the spaces between the uppercase letters but not the one between the "o" and the "I"?

Basically, "Hello I B M" should resolve to "Hello IBM".

So far, I have this:

value = "Hello I B M"
value = value.replace(/([A-Z])\s([A-Z])/g, '$1$2')

But it only replaces the first space between two uppercase letters, giving "Hello IB M".

--EDIT--

Solution Part 1:

 value = value.replace(/([A-Z])\s(?=[A-Z])/g, '$1')

Thanks to Renato for the first part of the solution! I just found out that if there is a capitalized word AFTER an uppercase letter, it swallows that space as well. How do we preserve the space there?

So "Hello I B M Dude" becomes "Hello IBMDude" instead of "Hello IBM Dude".

When the regex matches the first time (on "I B"), that part of the string is consumed by the engine, so the "B" cannot start another match, even though your regex has the global ('g') flag.

You could achieve the expected result by using a positive lookahead ((?=PATTERN)) instead, which checks the next letter without consuming it:

value = "Hello I B M"
value = value.replace(/([A-Z])\s(?=[A-Z])/g, '$1')
console.log(value) // Prints "Hello IBM"
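
To see what each pattern actually consumes, the matches themselves can be listed. This is a minimal sketch: the helper name listMatches is made up for illustration, and it relies on String.prototype.matchAll, so it assumes a reasonably recent JavaScript engine.

```javascript
// Hypothetical helper: return the text of every match of a global regex.
function listMatches(input, regex) {
  return [...input.matchAll(regex)].map((m) => m[0]);
}

const input = "Hello I B M";

// Consuming pattern: the second letter is part of the match, so the
// search resumes after "B" and the space before "M" is never tested.
const consumed = listMatches(input, /([A-Z])\s([A-Z])/g);

// Lookahead pattern: the second letter is only inspected, so "B" is
// still available as the start of the next match.
const lookedAhead = listMatches(input, /([A-Z])\s(?=[A-Z])/g);

console.log(JSON.stringify(consumed));    // ["I B"]
console.log(JSON.stringify(lookedAhead)); // ["I ","B "]
```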

To keep the space when the next uppercase letter is the first letter of a word, you can extend the lookahead pattern with a word boundary (\b):

value = "Hello I B M Dude"
value = value.replace(/([A-Z])\s(?=[A-Z]\b)/g, '$1')
console.log(value) // Prints "Hello IBM Dude"
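
The same replacement can be wrapped in a small function and tried on a few inputs. This is only a sketch: the name joinSpacedCapitals is hypothetical, and the pattern assumes plain ASCII letters A-Z, as in the question.

```javascript
// Hypothetical wrapper around the regex from the answer above.
// Removes a whitespace character that sits between two uppercase letters,
// but only when the second letter stands alone (is followed by a word boundary).
function joinSpacedCapitals(text) {
  return text.replace(/([A-Z])\s(?=[A-Z]\b)/g, "$1");
}

console.log(joinSpacedCapitals("Hello I B M"));      // Hello IBM
console.log(joinSpacedCapitals("Hello I B M Dude")); // Hello IBM Dude
console.log(joinSpacedCapitals("Hello World"));      // Hello World
```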

Note: As @CasimirHyppolite noted, the character following the uppercase letter has to be optional, or the second regex won't work when the last character of the string is uppercase. Hence the earlier pattern ([^A-Za-z]|$), which can be read as "not a letter, or the end of the string".

Edit: Simplified the lookahead from (?=[A-Z]([^A-Za-z]|$)) to (?=[A-Z]\b), as suggested by @hwnd.
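
The lookahead variants discussed in the note and the edit can be compared side by side. In this sketch the helper collapseWith is made up, the "required" pattern is only a guess at the kind of lookahead the note warns about, and the "A B2" input is an illustrative example rather than one from the thread. It suggests that \b and ([^A-Za-z]|$) are not strictly equivalent, because \b treats digits and underscores as word characters.

```javascript
// Hypothetical helper: build the replacement regex around a given lookahead body.
function collapseWith(lookahead, text) {
  const regex = new RegExp("([A-Z])\\s(?=" + lookahead + ")", "g");
  return text.replace(regex, "$1");
}

const required = "[A-Z][^A-Za-z]";     // assumed variant: a non-letter MUST follow
const optional = "[A-Z]([^A-Za-z]|$)"; // a non-letter, or the end of the string
const boundary = "[A-Z]\\b";           // simplified form using a word boundary

// An uppercase letter at the end of the string breaks the "required" variant.
console.log(collapseWith(required, "Hello I B M")); // Hello IB M
console.log(collapseWith(optional, "Hello I B M")); // Hello IBM
console.log(collapseWith(boundary, "Hello I B M")); // Hello IBM

// A digit after the letter is a non-letter, but there is no word boundary before it.
console.log(collapseWith(optional, "A B2")); // AB2
console.log(collapseWith(boundary, "A B2")); // A B2
```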

