Regex: remove the string after a slash only when it contains more than one word

How can I remove the string after a slash only when that string contains more than one word? Specifically, consider the following string:

0       1    2        0       1        2  3
CENTRAL CARE HOSPITAL/HOPITAL CENTRALE DE SOINS

All the characters after the slash should be removed because there are 4 words after it (HOPITAL, CENTRALE, DE, SOINS) and the limit is just one. The result is then: CENTRAL CARE HOSPITAL

On the other hand, we have the following string:

0     1      2     3  0
HAPPY SPRING BREAK 20/20

This time the 20 after the slash has to be kept because it is just one word ( \b[A-Za-z0-9]+\b ). The slash / should then be replaced by a space. The result should look like this: HAPPY SPRING BREAK 20 20

Suppose we have the following test set:

CENTRAL CARE HOSPITAL/HOPITAL CENTRALE DE SOINS
ELEMENTARY/INSTITUTION
FOUNDATION INSTITUTION/FUNDATION DEL INSTITUTO
HAPPY SPRING BREAK 20/20

The result should be the following:

CENTRAL CARE HOSPITAL
ELEMENTARY INSTITUTION
FOUNDATION INSTITUTION
HAPPY SPRING BREAK 20 20

Overall, keep the string after the slash only when it is a single word, and put a space where the slash was. Otherwise, remove the slash and everything after it.
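As a regex-free baseline, the rule above can be sketched with str.partition and str.split. This is only an illustration under a few assumptions that the question does not state: words are whitespace-separated, only the first slash matters, and a slash with nothing after it is simply dropped. The function name clean_after_slash is hypothetical.

```python
def clean_after_slash(text: str) -> str:
    """Keep a single word after the first slash; drop a longer tail."""
    # Assumption: only the first slash in the string is considered.
    head, sep, tail = text.partition("/")
    if not sep:
        return text  # no slash at all: leave the string unchanged
    # Assumption: words are separated by whitespace.
    words = tail.split()
    if len(words) == 1:
        return f"{head} {words[0]}"  # one word: replace the slash with a space
    return head  # zero or several words: drop the slash and the tail


samples = [
    "CENTRAL CARE HOSPITAL/HOPITAL CENTRALE DE SOINS",
    "ELEMENTARY/INSTITUTION",
    "FOUNDATION INSTITUTION/FUNDATION DEL INSTITUTO",
    "HAPPY SPRING BREAK 20/20",
]
for s in samples:
    print(clean_after_slash(s))
```

Having a plain-Python version like this can be handy for checking any regex candidate against the expected results.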

I have tried this regex so far, but it is not working: (?:[\/])([A-Z0-9]*\b)(?!\b[A-Z]*)|[^\/]*$

Thanks

You may use

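import re
# Group 1 is optional: it matches only when two or more words follow the slash.
rx = r'/(\w+(?:\W+\w+)+\W*$)?'
strs = ['CENTRAL CARE HOSPITAL/HOPITAL CENTRALE DE SOINS','ELEMENTARY/INSTITUTION','FOUNDATION INSTITUTION/FUNDATION DEL INSTITUTO','HAPPY SPRING BREAK 20/20']
for s in strs:
    # If group 1 matched, drop the slash and the tail; otherwise replace the slash with a space.
    print(re.sub(rx, lambda m: "" if m.group(1) else " ", s))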

Output:

CENTRAL CARE HOSPITAL
ELEMENTARY INSTITUTION
FOUNDATION INSTITUTION
HAPPY SPRING BREAK 20 20

The regex is /(\w+(?:\W+\w+)+\W*$)? and it matches:

  • / - a slash
  • (\w+(?:\W+\w+)+\W*$)? - an optional capturing group #1 that matches
    • \w+ - one or more word chars
    • (?:\W+\w+)+ - one or more sequences of one or more non-word chars followed by one or more word chars
    • \W* - zero or more non-word chars
    • $ - end of string.
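
The breakdown above can also be written into the pattern itself with re.VERBOSE, which ignores whitespace in the pattern and allows inline comments. The following is a minimal sketch of the same regex wrapped in a helper; the names SLASH_TAIL and strip_tail are hypothetical and only used for illustration.

```python
import re

# Same pattern as above, laid out with re.VERBOSE so each part carries a comment.
SLASH_TAIL = re.compile(
    r"""
    /                 # the slash itself
    (                 # optional group 1: a tail of two or more words
        \w+           #   the first word
        (?:\W+\w+)+   #   one or more further words, each preceded by non-word chars
        \W*           #   optional trailing non-word chars
        $             #   end of string
    )?
    """,
    re.VERBOSE,
)


def strip_tail(text: str) -> str:
    # Group 1 participates only for a multi-word tail, in which case the whole
    # match (slash plus tail) is removed; otherwise only the slash matched and
    # it is replaced with a space.
    return SLASH_TAIL.sub(lambda m: "" if m.group(1) else " ", text)


for s in ["ELEMENTARY/INSTITUTION", "HAPPY SPRING BREAK 20/20"]:
    print(strip_tail(s))
```

Compiling the pattern once is a design choice that mainly helps readability when the same substitution is applied to many strings.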
