
Regex to extract organization names in Python

SAMPLE PROGRAM

import re

demostr = "Department of Microbiology and Immunology. Faculty of Tropical Medicine, Mahidol University, Electronic address: pornsawan.lea@mahidol.ac.th."
org = re.search(r"([A-Z][^\s,.]+[.]?\s[(]?)*(Dept|Association|Office|University|Department)[^,\d]*(?=,|\d)", demostr).group()
print(org)   

OUTPUT

Department of Microbiology and Immunology. Faculty of Tropical Medicine

The program extracts the organization or department name from the given string. It works fine if there is a comma (,) after Immunology, but when there is a dot (.) after the organization name, it extracts the wrong output. The required output is shown below.

EXPECTED OUTPUT

Department of Microbiology and Immunology

You missed two things in your regex; with both fixed, it will work fine:

([A-Z][^\s,.]+[.]?\s[(]?)*(Dept|Association|Office|University|Department)[^,\d]*?(?=,|\.|\d)

Things you have missed

  • [^,\d]* - This quantifier is greedy by nature; you need to make it lazy ([^,\d]*?) because of your requirement.
  • \. - You didn't include . in the lookahead alternation.
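To see what the two changes do in isolation, here is a minimal sketch that runs both variants on a shortened version of the sample string. It drops the optional leading group from the original pattern for readability, and the names `text`, `KEYWORD`, `GREEDY`, and `LAZY` are introduced here for illustration only.

```python
import re

# Shortened sample string (illustrative; same structure as the question's input).
text = "Department of Microbiology and Immunology. Faculty of Tropical Medicine, Mahidol University"

# Keyword alternation taken from the question's pattern (non-capturing here).
KEYWORD = r"(?:Dept|Association|Office|University|Department)"

# Greedy body; the lookahead stops only at a comma or a digit (original behaviour).
GREEDY = re.compile(KEYWORD + r"[^,\d]*(?=,|\d)")
# Lazy body; the lookahead also stops at a dot (the two changes listed above).
LAZY = re.compile(KEYWORD + r"[^,\d]*?(?=,|\.|\d)")

greedy_match = GREEDY.search(text).group(0)
lazy_match = LAZY.search(text).group(0)

print(greedy_match)  # runs on to the first comma
print(lazy_match)    # stops at the first dot
```

The greedy version consumes everything up to the last position where the lookahead can succeed before a comma, while the lazy version stops at the first position where the lookahead succeeds, which is the dot after Immunology.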

Code

    import re

    demostr = "Department of Microbiology and Immunology. Faculty of Tropical Medicine, Mahidol University, Electronic address: pornsawan.lea@mahidol.ac.th."
    org = re.search(r"([A-Z][^\s,.]+[.]?\s[(]?)*(Dept|Association|Office|University|Department)[^,\d]*?(?=,|\.|\d)", demostr).group(0)
    print(org) 


Please try the code below.

import re

demostr = "Department of Microbiology and Immunology. Faculty of Tropical Medicine, Mahidol University, Electronic address: pornsawan.lea@mahidol.ac.th."
org = re.search(r"([A-Z][^\s,.]+[.]?\s[(]?)*(Dept|Association|Office|University|Department)[^,\d]*?(?=,|\.|\d)", demostr).group(0)
print(org)  

Output

Department of Microbiology and Immunology
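The snippets above call .group() directly on the result of re.search, which raises AttributeError when nothing matches, because re.search returns None in that case. A minimal sketch of a guarded version follows, assuming a hypothetical helper named `extract_org` and a module-level `ORG_PATTERN` (neither is part of the original answers):

```python
import re
from typing import Optional

# Same pattern as in the answers above, compiled once for reuse.
ORG_PATTERN = re.compile(
    r"([A-Z][^\s,.]+[.]?\s[(]?)*"
    r"(Dept|Association|Office|University|Department)"
    r"[^,\d]*?(?=,|\.|\d)"
)

def extract_org(text: str) -> Optional[str]:
    """Return the first organization-like phrase, or None if nothing matches."""
    match = ORG_PATTERN.search(text)
    return match.group(0) if match else None

demostr = (
    "Department of Microbiology and Immunology. Faculty of Tropical Medicine, "
    "Mahidol University, Electronic address: pornsawan.lea@mahidol.ac.th."
)

print(extract_org(demostr))
print(extract_org("no organization here"))  # no keyword, so None
```

One caveat of stopping at any dot: an abbreviated input such as "Dept. of Physics, Some University" yields just "Dept", since the lookahead succeeds immediately at the dot after the keyword.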
