How to merge hyphenated words with regex in Notepad++?

I have numerous OCR-ed texts with hyphenated words in the middle of lines.

Example: This is a text with a hyphen- ated word in the middle of the sentence. But it also has - dashes - like the ones in the second sentence. The latter should not be modified.

I would like to have a cleaned text like the one below, where the hyphenated words are merged:

This is a text with a hyphenated word in the middle. But it also has - dashes - like the ones in the second sentence. The latter should not be modified.

The regex -\s*\r?\n\s*\r?\n? merges hyphenated words by removing the hyphen, but only if the hyphen is located at the end of a line. How can I modify this regex to do the above job? The number of spaces after the hyphen can be 1, 2 or 3, like hyphen- ated, hyphen-  ated, hyphen-   ated.

You can look for a character that is neither whitespace nor a digit (the end of a word), followed by - and one or more whitespace characters:

([^\s\d])(-\s+)

Then simply replace with $1 to leave the last character of the word intact.

Here is a working example on regex101.com:
https://regex101.com/r/Zl7lvR/1
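
For checking the capture-group pattern outside Notepad++, here is a minimal Python sketch. The function name merge_hyphenated and the sample string are made up for illustration, and Python writes the backreference as \1 where Notepad++ uses $1.

```python
import re

# Pattern from the answer above: one character that is neither whitespace
# nor a digit, then a hyphen and one or more whitespace characters.
PATTERN = re.compile(r"([^\s\d])(-\s+)")


def merge_hyphenated(text: str) -> str:
    """Keep group 1 (the last character of the word), drop hyphen and spaces."""
    return PATTERN.sub(r"\1", text)


sample = (
    "This is a text with a hyphen- ated word in the middle of the sentence. "
    "But it also has - dashes - like the ones in the second sentence."
)
print(merge_hyphenated(sample))
```

Because the character before the hyphen must not be whitespace, the spaced " - dashes - " are left alone. Excluding digits also leaves something like "10- 12" untouched.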

Using Notepad++ you can use this pattern and replace with an empty string:

[^\s-]\K-\s{1,3}

The pattern matches:

  • [^\s-] Match a single char other than - or a whitespace char
  • \K Forget what is matched so far
  • -\s{1,3} Match - and 1-3 whitespace chars to be removed
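
The \K pattern can be approximated in Python for a quick test, with one substitution: Python's built-in re module does not support \K, so this sketch uses a lookbehind, (?<=[^\s-]), in its place. The function name merge_hyphenated is hypothetical.

```python
import re

# Python's built-in re has no \K, so a one-character lookbehind stands in for
# "[^\s-]\K": the preceding character is checked but is not part of the match.
PATTERN = re.compile(r"(?<=[^\s-])-\s{1,3}")


def merge_hyphenated(text: str) -> str:
    """Remove a hyphen plus 1-3 whitespace characters that follow a word."""
    return PATTERN.sub("", text)


for spaces in (1, 2, 3):
    broken = "hyphen-" + " " * spaces + "ated"
    print(repr(broken), "->", repr(merge_hyphenated(broken)))
```

As in Notepad++, the replacement is an empty string, because only the hyphen and the whitespace after it are part of the match.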


Another variant, matching 1+ whitespace chars and asserting a single char other than - or a whitespace char to the right:

[^\s-]\K-\s+(?=[^\s-])


Or with the 1-3 quantifier and the lookahead:

[^\s-]\K-\s{1,3}(?=[^\s-])
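
To see how the two lookahead variants differ, here is a small Python sketch. It again swaps \K for a lookbehind, since Python's built-in re module lacks \K, and the names UNBOUNDED and BOUNDED are made up for illustration.

```python
import re

# Lookbehind replaces "[^\s-]\K" because Python's built-in re has no \K.
UNBOUNDED = re.compile(r"(?<=[^\s-])-\s+(?=[^\s-])")    # any run of whitespace
BOUNDED = re.compile(r"(?<=[^\s-])-\s{1,3}(?=[^\s-])")  # at most 3 whitespace chars

samples = [
    "hyphen-  ated",     # 2 spaces: both variants merge the word
    "hyphen-    ated",   # 4 spaces: only the unbounded variant merges
    "a- -b",             # a hyphen follows the gap: neither variant matches
    "cut- ",             # nothing follows the gap: neither variant matches
]

for s in samples:
    print(repr(s), "|", repr(UNBOUNDED.sub("", s)), "|", repr(BOUNDED.sub("", s)))
```

The lookahead requires a character that is neither whitespace nor a hyphen after the gap, so a dangling "word- " at the end of the text or a hyphen followed by another hyphen is left untouched.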
