Remove string after slash with condition
I'd like to remove the second part of a phrase (the part after the slash) when it is longer than 3 characters (letters and numbers), and replace the slash with a space when it is 3 characters or fewer.
In the following test set:
CENTRAL CARE HOSPITAL/HOPITAL CENTRALE DE SOINS
ABC/DEF
FOUNDATION INSTITUTION/FUNDATION DEL INSTITUTO
HAPPY SPRING BREAK 20/20
The result should be:
CENTRAL CARE HOSPITAL
ABC DEF
FOUNDATION INSTITUTION
HAPPY SPRING BREAK 20 20
My first try was this:
([^\/]+$)
However, this removes everything after the slash because it has no restriction on length. I need to include a negative lookahead so that the string after the slash is removed only when it has more than 3 characters:
text= re.sub(r'(^[^\/]+)(?:[\/])(?![A-Z]{3})',
r'\1 ',
text,
0,
re.IGNORECASE)
I am getting the following, which is incorrect:
CENTRAL CARE HOSPITAL/HOPITAL CENTRALE DE SOINS
ABC DEF
FOUNDATION INSTITUTION/FUNDATION DEL INSTITUTO
HAPPY SPRING BREAK 20 20
How can I get rid of the slash and the string after it?
Thanks
You could use 2 capturing groups to capture 1-3 characters (A-Z or digits) before and after the / and use those groups in the replacement with a space in between.
Use an alternation to match a forward slash followed by the rest of the string, which is the part to be removed.
\b([A-Z0-9]{1,3})/([A-Z0-9]{1,3})\b|/.*
In the replacement, use the 2 capturing groups:
r"\1 \2"
Explanation
\b : Word boundary
([A-Z0-9]{1,3}) : Capture group 1, match 1-3 times A-Z or a digit
/ : Match literally
([A-Z0-9]{1,3}) : Capture group 2, match 1-3 times A-Z or a digit
\b : Word boundary
| : Or
/.* : Match / and 0+ times any char except a newline
Example code
import re
regex = r"\b([A-Z0-9]{1,3})/([A-Z0-9]{1,3})\b|/.*"
text = ("CENTRAL CARE HOSPITAL/HOPITAL CENTRALE DE SOINS\n"
"ABC/DEF\n"
"FOUNDATION INSTITUTION/FUNDATION DEL INSTITUTO\n"
"HAPPY SPRING BREAK 20/20")
result = re.sub(regex, r"\1 \2", text)
print(result)
Output
CENTRAL CARE HOSPITAL
ABC DEF
FOUNDATION INSTITUTION
HAPPY SPRING BREAK 20 20
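One detail worth noting: when the `/.*` branch matches, both groups are empty, so on Python 3.5+ the replacement `r"\1 \2"` collapses to a single space and leaves a trailing space on those lines. A minimal sketch of one way around that is to pass a function as the replacement. The names `PATTERN`, `_replace` and `clean` are illustrative, not part of the answer above.

```python
import re

# Same pattern as in the answer above.
PATTERN = re.compile(r"\b([A-Z0-9]{1,3})/([A-Z0-9]{1,3})\b|/.*")


def _replace(match):
    """Hypothetical helper: build the replacement for one match."""
    if match.group(1) is not None:
        # First branch matched: both parts are short, join them with a space.
        return f"{match.group(1)} {match.group(2)}"
    # Second branch matched: drop the slash and everything after it.
    return ""


def clean(line):
    """Apply the pattern to a single line without leaving a trailing space."""
    return PATTERN.sub(_replace, line)


for line in ["CENTRAL CARE HOSPITAL/HOPITAL CENTRALE DE SOINS",
             "ABC/DEF",
             "FOUNDATION INSTITUTION/FUNDATION DEL INSTITUTO",
             "HAPPY SPRING BREAK 20/20"]:
    print(repr(clean(line)))
```

Printing with `repr` makes any stray whitespace visible. Calling `.rstrip()` on the result of the original `re.sub` call would be a simpler alternative if only trailing spaces matter.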
Do you have to use regexes? What's wrong with doing it like this?
tests = [
    "CENTRAL CARE HOSPITAL/HOPITAL CENTRALE DE SOINS",
    "ABC/DEF",
    "FOUNDATION INSTITUTION/FUNDATION DEL INSTITUTO",
    "HAPPY SPRING BREAK 20/20"
]
for test in tests:
    # Split on the first slash only
    separate = test.split("/", 1)
    # Drop a long second part; otherwise join both parts with a space
    print(separate[0] if len(separate[1]) > 3 else " ".join(separate))
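The split-based idea can also be wrapped in a small function. The sketch below uses `str.partition`, so a line without any slash is returned unchanged instead of raising an `IndexError`. The function name `strip_after_slash` and the `max_len` parameter are assumptions for illustration, and the length check counts every character after the slash, not only letters and digits.

```python
def strip_after_slash(line, max_len=3):
    """Drop the part after the first slash if it is longer than max_len,
    otherwise replace the slash with a space."""
    head, sep, tail = line.partition("/")
    if not sep:
        # No slash in the line: nothing to do.
        return line
    if len(tail) > max_len:
        # Long second part: keep only the text before the slash.
        return head
    # Short second part: keep both, separated by a space.
    return f"{head} {tail}"


for line in ["CENTRAL CARE HOSPITAL/HOPITAL CENTRALE DE SOINS",
             "ABC/DEF",
             "FOUNDATION INSTITUTION/FUNDATION DEL INSTITUTO",
             "HAPPY SPRING BREAK 20/20",
             "NO SLASH HERE"]:
    print(strip_after_slash(line))
```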
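Try this regex pattern:
import re

text = ["CENTRAL CARE HOSPITAL/HOPITAL CENTRALE DE SOINS ",
        "ABC/DEF",
        "FOUNDATION INSTITUTION/FUNDATION DEL INSTITUTO",
        "HAPPY SPRING BREAK 20/20"]
for element in text:
    # Keep up to 3 characters after the slash, or drop the rest of the line
    str_res = re.sub(r'(?:[\/])([A-Z0-9]{0,3}\b)|[^\/]*$',
                     r' \1',
                     element,
                     count=0,
                     flags=re.IGNORECASE)
    # The replacement always starts with a space, so trim what is left at the end
    print(str_res.rstrip())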