Regex to extract organization names in Python
SAMPLE PROGRAM
import re
demostr = "Department of Microbiology and Immunology. Faculty of Tropical Medicine, Mahidol University, Electronic address: pornsawan.lea@mahidol.ac.th."
org = re.search(r"([A-Z][^\s,.]+[.]?\s[(]?)*(Dept|Association|Office|University|Department)[^,\d]*(?=,|\d)", demostr).group()
print(org)
OUTPUT
Department of Microbiology and Immunology. Faculty of Tropical Medicine
The program extracts the organization or department name from the given string. It works fine if there is a comma (,) after "Immunology", but when there is a dot (.) after the organization name, it extracts the wrong output. The required output is shown below.
EXPECTED OUTPUT
Department of Microbiology and Immunology
You missed two things in your regex. With both fixed, this will work fine:
([A-Z][^\s,.]+[.]?\s[(]?)*(Dept|Association|Office|University|Department)[^,\d]*?(?=,|\.|\d)
Things you missed:
[^,\d]* - This quantifier is greedy by nature; given your requirement, you need to make it lazy ([^,\d]*?) so that the match stops at the first delimiter.
\. - You didn't include a literal dot (.) in your lookahead alternation.
Code
import re
demostr = "Department of Microbiology and Immunology. Faculty of Tropical Medicine, Mahidol University, Electronic address: pornsawan.lea@mahidol.ac.th."
org = re.search(r"([A-Z][^\s,.]+[.]?\s[(]?)*(Dept|Association|Office|University|Department)[^,\d]*?(?=,|\.|\d)", demostr).group(0)
print(org)
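To see why both changes are needed together, here is a minimal sketch that runs four variants of the pattern against a shortened version of the sample string. The variable names (prefix, variants, results) are only illustrative and are not part of the original answer.

```python
import re

text = ("Department of Microbiology and Immunology. Faculty of Tropical "
        "Medicine, Mahidol University")

# Shared first part of the pattern, copied from the question.
prefix = (r"([A-Z][^\s,.]+[.]?\s[(]?)*"
          r"(Dept|Association|Office|University|Department)")

variants = {
    # Original: greedy body, lookahead accepts only a comma or a digit.
    "original": prefix + r"[^,\d]*(?=,|\d)",
    # Lazy body only: [^,\d] still consumes the dot, so nothing changes.
    "lazy_only": prefix + r"[^,\d]*?(?=,|\d)",
    # Dot in the lookahead only: the greedy body runs on to the comma first.
    "dot_only": prefix + r"[^,\d]*(?=,|\.|\d)",
    # Both changes: lazy body plus a literal dot in the lookahead.
    "fixed": prefix + r"[^,\d]*?(?=,|\.|\d)",
}

results = {name: re.search(pattern, text).group(0)
           for name, pattern in variants.items()}

for name, matched in results.items():
    print(f"{name}: {matched}")
```

Only the "fixed" variant stops at "Immunology"; the other three all run on to the comma after "Medicine".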
Please try the code below.
import re
demostr = "Department of Microbiology and Immunology. Faculty of Tropical Medicine, Mahidol University, Electronic address: pornsawan.lea@mahidol.ac.th."
org = re.search(r"([A-Z][^\s,.]+[.]?\s[(]?)*(Dept|Association|Office|University|Department)[^,\d]*?(?=,|\.|\d)", demostr).group(0)
print(org)
Output
Department of Microbiology and Immunology
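Note that re.search returns None when nothing matches, so calling .group(0) directly raises AttributeError on such input. A minimal sketch of a safer wrapper follows; the names ORG_PATTERN and extract_org are hypothetical and are not taken from the answers above.

```python
import re
from typing import Optional

# Same pattern as in the answer, compiled once for reuse.
ORG_PATTERN = re.compile(
    r"([A-Z][^\s,.]+[.]?\s[(]?)*"
    r"(Dept|Association|Office|University|Department)"
    r"[^,\d]*?(?=,|\.|\d)"
)


def extract_org(text: str) -> Optional[str]:
    """Return the first organization-like match, or None if there is none."""
    match = ORG_PATTERN.search(text)
    return match.group(0) if match else None


demostr = ("Department of Microbiology and Immunology. Faculty of Tropical "
           "Medicine, Mahidol University, Electronic address: "
           "pornsawan.lea@mahidol.ac.th.")
print(extract_org(demostr))
print(extract_org("no organization here"))
```

Because the lookahead requires a comma, dot, or digit right after the match, a keyword at the very end of a string with none of those characters following it (for example "Mahidol University") also yields None.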