
How to find a specific string if a word starts with certain keywords in Python?

I have a list of strings and need to extract a specific piece of text from each one. Example:

L1=["Address:S/O: Puran Mal Saini, xxxxxxxxxx,Pxxxxxxxx, Palam Vilxxxxxxiage,Palam",
    "Address:S/O Radheyshyam Sharma, E SECOND",
    "Address:S/O: Saroj Shahi, gram-shyampur",
    "Address:S/O Birjraj Singh, Cccxxxx, NEW Azzzzzzz,",
    "Address:208027 S/O: Naresh Chandra Mishra",
    "Address: C/O: Mayenk Jain. 260/18, Axxxxxxxr, Opp. Haxxxx xxxxxr, Gxxxxxa",
    "Address:208027S/O: Naresh Chandra Mishra,Wxxx, 127/406",
    "Address: C/O Sachin Vasant Shivaji Vidhyalay, Sissssss"]

My desired output:

 Puran Mal Saini
 Radheyshyam Sharma
 Saroj Shahi
 Birjraj Singh
 Naresh Chandra Mishra
 Mayenk Jain
 Naresh Chandra Mishra
 Sachin Vasant Shivaji Vidhyalay

What I tried:

import re
for wordlist in L1:
  xx=wordlist.split()
  for w in xx:
     if re.search('(SO|S/O|S/O:|W/O|W/O:|C/O|D/O|WO|CO|DO)$',w):
        name=re.findall("[a-zA-Z:/ ]+", wordlist)
        print(max(name))

Try simplifying your regular expression a little and using a list comprehension:

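import re
# Relation prefix (C/O, D/O, S/O, W/O; slash and colon optional), one whitespace, then the name
pattern = re.compile(r'[CDSW]/?O:?\s([\w\s]*)')
results = [pattern.search(text).group(1) for text in L1]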
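
The list comprehension assumes every string contains the prefix; when it does not, re.search returns None and calling .group(1) raises AttributeError. A minimal sketch of a guarded version follows. The helper name extract_name and the third sample string are made up for illustration and are not part of the original answer.

```python
import re

# Equivalent to the pattern above: relation prefix (C/O, D/O, S/O, W/O),
# optional ':', one whitespace character, then the captured name.
pattern = re.compile(r'[CDSW]/?O:?\s([\w\s]*)')

def extract_name(text):
    """Return the captured name, or None when no prefix is found (hypothetical helper)."""
    match = pattern.search(text)
    return match.group(1).strip() if match else None

samples = [
    "Address:S/O: Puran Mal Saini, xxxxxxxxxx,Pxxxxxxxx, Palam Vilxxxxxxiage,Palam",
    "Address: C/O: Mayenk Jain. 260/18, Axxxxxxxr, Opp. Haxxxx xxxxxr, Gxxxxxa",
    "Address: no relation prefix here",  # made-up input to show the None case
]
print([extract_name(s) for s in samples])
# ['Puran Mal Saini', 'Mayenk Jain', None]
```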

A solution without regular expressions:

L1 = [item.split(',')[0].split('/O')[-1].split(':')[-1].strip() for item in L1]

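How does this work?

  1. First, split on , and take the part before the first comma.
  2. Then split on /O and take the last piece, which looks like ": Puran Mal Saini" or " Radheyshyam Sharma".
  3. Then split on : and take the last piece. If there is no : in it, the string is returned unchanged.
  4. Finally, use strip() to remove the surrounding whitespace and get the desired result.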
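
The four steps can be traced on the first input string. This is only an illustration of the chained one-liner, with intermediate variable names (step1 to step4) chosen here for readability:

```python
item = "Address:S/O: Puran Mal Saini, xxxxxxxxxx,Pxxxxxxxx, Palam Vilxxxxxxiage,Palam"

step1 = item.split(',')[0]      # 'Address:S/O: Puran Mal Saini'
step2 = step1.split('/O')[-1]   # ': Puran Mal Saini'
step3 = step2.split(':')[-1]    # ' Puran Mal Saini'
step4 = step3.strip()           # 'Puran Mal Saini'

for label, value in [("step1", step1), ("step2", step2), ("step3", step3), ("step4", step4)]:
    print(label, repr(value))
```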

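Welcome to SO. In the text data provided, the / and the comma appear consistently in every entry. That consistency can be used to extract the names with split() and an re pattern match.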

Input:

L1=["Address:S/O: Puran Mal Saini, xxxxxxxxxx,Pxxxxxxxx, Palam Vilxxxxxxiage,Palam",
    "Address:S/O Radheyshyam Sharma, E SECOND",
    "Address:S/O: Saroj Shahi, gram-shyampur",
    "Address:S/O Birjraj Singh, Cccxxxx, NEW Azzzzzzz,",
    "Address:208027 S/O: Naresh Chandra Mishra",
    "Address: C/O: Mayenk Jain. 260/18, Axxxxxxxr, Opp. Haxxxx xxxxxr, Gxxxxxa",
    "Address:208027S/O: Naresh Chandra Mishra,Wxxx, 127/406",
    "Address: C/O Sachin Vasant Shivaji Vidhyalay, Sissssss"]

Code:

import re

def return_names(text, pattern):
    # First split by comma and then / which is consistent throughout the text
    text = text.split(",")[0].split('/')[1]
    # re.sub removes every character that matches the pattern
    name = re.sub(pattern, '', text)
    return name.strip()

# Characters to strip from the extracted text; adapt this pattern as needed
pattern = "[0-9O:.]"
for text in L1:
    print(return_names(text, pattern))

Output:

$ python3 pattern.py
Puran Mal Saini
Radheyshyam Sharma
Saroj Shahi
Birjraj Singh
Naresh Chandra Mishra
Mayenk Jain
Naresh Chandra Mishra
Sachin Vasant Shivaji Vidhyalay
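
One thing to watch with the character class [0-9O:.] is that it removes every capital O in the text, not only the one left over from S/O or C/O, so a name containing a capital O would lose that letter. A possible sketch that removes only a leading O and trailing debris is shown below. The function name return_names_anchored and the name "Om Prakash" are made up for illustration, and the sketch assumes the fragment after the first / always starts with that leftover O.

```python
import re

# Leftover 'O' from 'S/O' or 'C/O' at the very start, an optional ':', then whitespace.
prefix = re.compile(r'^O:?\s*')
# Trailing debris after the name, such as '. 260'.
suffix = re.compile(r'[\s.0-9]+$')

def return_names_anchored(text):
    """Hypothetical variant of return_names that leaves letters inside the name alone."""
    fragment = text.split(",")[0].split('/')[1]
    return suffix.sub('', prefix.sub('', fragment))

print(return_names_anchored("Address: C/O: Mayenk Jain. 260/18, Axxxxxxxr"))  # Mayenk Jain
print(return_names_anchored("Address:S/O: Om Prakash, Xxxx"))  # Om Prakash (made-up input)
```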

