Remove the string after a slash, with a condition

Question

I'd like to remove the second part of a phrase (everything after the slash) when it is longer than 3 characters (letters and numbers), and replace the slash with a space when it is 3 characters or fewer.

Given the following test set:

CENTRAL CARE HOSPITAL/HOPITAL CENTRALE DE SOINS
ABC/DEF
FOUNDATION INSTITUTION/FUNDATION DEL INSTITUTO
HAPPY SPRING BREAK 20/20

The result should be:

CENTRAL CARE HOSPITAL
ABC DEF
FOUNDATION INSTITUTION
HAPPY SPRING BREAK 20 20

My first try was this:

([^\/]+$)

However, this removes everything after the slash in every case, because the pattern has no restriction on length. I need a negative lookahead so that the string is removed only when more than 3 characters follow the slash:

text= re.sub(r'(^[^\/]+)(?:[\/])(?![A-Z]{3})',
             r'\1 ',
             text,
             0,
             re.IGNORECASE)

I am getting the following, which is incorrect:

CENTRAL CARE HOSPITAL/HOPITAL CENTRALE DE SOINS 
ABC DEF
FOUNDATION INSTITUTION/FUNDATION DEL INSTITUTO 
HAPPY SPRING BREAK 20 20

How can I get rid of the slash and the string that follows it?

Thanks

Answer 1

You could use two capturing groups to capture 1-3 characters (A-Z or digits) before and after the /, and use those groups in the replacement with a space in between.

Use an alternation to match a forward slash followed by the rest of the string, which is the part to be removed.

\b([A-Z0-9]{1,3})/([A-Z0-9]{1,3})\b|/.*

In the replacement, use the two capturing groups:

r"\1 \2"

Explanation

\b Word boundary
([A-Z0-9]{1,3}) Capture group 1: match 1-3 characters, each A-Z or a digit
/ Match a literal /
([A-Z0-9]{1,3}) Capture group 2: match 1-3 characters, each A-Z or a digit
\b Word boundary
| Or
/.* Match / followed by 0 or more of any character except a newline
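To see which branch of the alternation fires for a given line, the groups of the first match can be inspected directly. This is only an illustrative sketch; the helper name `describe` is an assumption and not part of the answer.

```python
import re

pattern = re.compile(r"\b([A-Z0-9]{1,3})/([A-Z0-9]{1,3})\b|/.*")

def describe(line):
    """Return (matched_text, group1, group2) for the first match, or None."""
    m = pattern.search(line)
    if m is None:
        return None
    return m.group(0), m.group(1), m.group(2)

# Short/short branch: both groups are filled.
print(describe("ABC/DEF"))
# "/.*" branch: the groups did not take part in the match, so they are None.
print(describe("FOUNDATION INSTITUTION/FUNDATION DEL INSTITUTO"))
```

When the second branch matches, both groups are unset, which is why `\1 \2` collapses to a single space for those lines.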


Example code

import re

regex = r"\b([A-Z0-9]{1,3})/([A-Z0-9]{1,3})\b|/.*"

text = ("CENTRAL CARE HOSPITAL/HOPITAL CENTRALE DE SOINS\n"
    "ABC/DEF\n"
    "FOUNDATION INSTITUTION/FUNDATION DEL INSTITUTO\n"
    "HAPPY SPRING BREAK 20/20")

result = re.sub(regex, r"\1 \2", text)
print(result)

Output

CENTRAL CARE HOSPITAL 
ABC DEF
FOUNDATION INSTITUTION 
HAPPY SPRING BREAK 20 20
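The trailing space after HOSPITAL and INSTITUTION in the output above comes from the literal space in the replacement, which is still emitted when both groups are empty. One possible way to avoid it, sketched here under the assumption that the trailing space is unwanted, is to pass a function to `re.sub` instead of a replacement string. The names `_replace` and `clean` are hypothetical.

```python
import re

PATTERN = re.compile(r"\b([A-Z0-9]{1,3})/([A-Z0-9]{1,3})\b|/.*")

def _replace(match):
    # Short/short branch: both groups are set, so join them with a space.
    if match.group(1) is not None:
        return f"{match.group(1)} {match.group(2)}"
    # "/.*" branch: drop the slash and everything after it.
    return ""

def clean(text):
    """Apply the pattern without leaving a trailing space behind."""
    return PATTERN.sub(_replace, text)

print(clean("CENTRAL CARE HOSPITAL/HOPITAL CENTRALE DE SOINS"))
print(clean("ABC/DEF"))
```

Calling `.rstrip()` on each output line would be a simpler alternative if other trailing whitespace can also be discarded.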

Answer 2

Do you have to use regexes? What's wrong with doing it like this?

tests = [
    "CENTRAL CARE HOSPITAL/HOPITAL CENTRALE DE SOINS", 
    "ABC/DEF", 
    "FOUNDATION INSTITUTION/FUNDATION DEL INSTITUTO", 
    "HAPPY SPRING BREAK 20/20"
]

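for test in tests:
    # Split on the first slash only.
    separate = test.split("/", 1)
    # Drop a long second part; otherwise swap the slash for a space.
    print(separate[0] if len(separate[1]) > 3 else " ".join(separate))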
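The loop above assumes every phrase contains a slash; indexing `separate[1]` fails otherwise. A slightly more defensive variant, sketched with `str.partition`, might look like the following. The function name `strip_after_slash` and the `limit` parameter are assumptions added for illustration.

```python
def strip_after_slash(phrase, limit=3):
    """Drop the part after the first slash if it is longer than `limit`
    characters; otherwise replace that slash with a space."""
    head, sep, tail = phrase.partition("/")
    if not sep:
        # No slash at all: return the phrase unchanged.
        return phrase
    return head if len(tail) > limit else f"{head} {tail}"

for phrase in ["ABC/DEF", "HAPPY SPRING BREAK 20/20", "NO SLASH HERE"]:
    print(strip_after_slash(phrase))
```

Note that this counts every character after the slash, including spaces, which matches the test set but may differ from a strict "letters and numbers" count.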

Answer 3

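Try this regex pattern:

import re

text= ["CENTRAL CARE HOSPITAL/HOPITAL CENTRALE DE SOINS ",
       "ABC/DEF",
       "FOUNDATION INSTITUTION/FUNDATION DEL INSTITUTO",
       "HAPPY SPRING BREAK 20/20"]

for element in text:
    # First branch: a slash plus up to 3 letters/digits ending at a word boundary.
    # Second branch: whatever non-slash text is left up to the end of the string.
    str_res = re.sub(r'(?:[\/])([A-Z0-9]{0,3}\b)|[^\/]*$',
                     r' \1',
                     element,
                     0,
                     re.IGNORECASE)
    # The replacement can leave trailing spaces, so strip them before printing.
    print(str_res.rstrip())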
