How to find a specific string if a word starts with certain keywords in Python?
I have a list of strings and need to extract specific text from each string. Example:
L1=["Address:S/O: Puran Mal Saini, xxxxxxxxxx,Pxxxxxxxx, Palam Vilxxxxxxiage,Palam",
"Address:S/O Radheyshyam Sharma, E SECOND",
"Address:S/O: Saroj Shahi, gram-shyampur",
"Address:S/O Birjraj Singh, Cccxxxx, NEW Azzzzzzz,",
"Address:208027 S/O: Naresh Chandra Mishra",
"Address: C/O: Mayenk Jain. 260/18, Axxxxxxxr, Opp. Haxxxx xxxxxr, Gxxxxxa",
"Address:208027S/O: Naresh Chandra Mishra,Wxxx, 127/406",
"Address: C/O Sachin Vasant Shivaji Vidhyalay, Sissssss"]
My desired output:
Puran Mal Saini
Radheyshyam Sharma
Saroj Shahi
Birjraj Singh
Naresh Chandra Mishra
Mayenk Jain
Naresh Chandra Mishra
Sachin Vasant Shivaji Vidhyalay
What I have tried:
import re
for wordlist in L1:
    xx = wordlist.split()
    for w in xx:
        if re.search('(SO|S/O|S/O:|W/O|W/O:|C/O|D/O|WO|CO|DO)$', w):
            name = re.findall("[a-zA-Z:/ ]+", wordlist)
            print(max(name))
Try simplifying your regular expression a little and using a list comprehension:
pattern = re.compile(r'[CDSW]\/?O:?\s([\w\s]*)')
results = [re.search(pattern, text).group(1) for text in L1]
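One thing to watch for: re.search returns None when a string contains no relation marker, so the comprehension above would raise an AttributeError on such input. Below is a minimal sketch of a guarded variant. The helper name extract_name and the third sample string are illustrative assumptions, not part of the original answer.

```python
import re

# Same pattern as in the answer above, compiled once and reused.
PATTERN = re.compile(r'[CDSW]/?O:?\s([\w\s]*)')

def extract_name(text):
    """Return the name after the relation marker, or None if there is no match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return match.group(1).strip()

samples = [
    "Address:S/O: Puran Mal Saini, xxxxxxxxxx",
    "Address: C/O: Mayenk Jain. 260/18, Axxxxxxxr",
    "Address: no marker here",  # hypothetical input without S/O, C/O, etc.
]
for sample in samples:
    print(extract_name(sample))
```

Returning None lets the caller decide whether to skip, log, or fall back to another rule for addresses that do not follow the expected layout.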
A solution without regular expressions:
L1 = [item.split(',')[0].split('/O')[-1].split(':')[-1].strip() for item in L1]
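To see what each call in the one-liner contributes, the chain can be unrolled into named steps. This is only a sketch for inspection; the function name trace_split_chain and the intermediate variable names are made up for illustration.

```python
def trace_split_chain(item):
    """Unroll the split chain from the one-liner and return every intermediate value."""
    before_comma = item.split(',')[0]            # text before the first comma
    after_marker = before_comma.split('/O')[-1]  # text after the last '/O'
    after_colon = after_marker.split(':')[-1]    # text after the last ':' (unchanged if none)
    name = after_colon.strip()                   # remove surrounding whitespace
    return before_comma, after_marker, after_colon, name

first = "Address:S/O: Puran Mal Saini, xxxxxxxxxx,Pxxxxxxxx, Palam Vilxxxxxxiage,Palam"
for step in trace_split_chain(first):
    print(repr(step))

# The entry with a period before the first comma keeps the trailing house number.
sixth = "Address: C/O: Mayenk Jain. 260/18, Axxxxxxxr, Opp. Haxxxx xxxxxr, Gxxxxxa"
print(trace_split_chain(sixth)[-1])
```

For the sample data in the question, the second call prints Mayenk Jain. 260/18 rather than Mayenk Jain, because the period and house number appear before the first comma. If that matters, an extra split on '.' or a regex-based approach is needed for that entry.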
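How does this work?
split(',')[0] extracts the part of the string before the first comma.
split('/O')[-1] splits on /O and keeps the last piece, which looks like ": Puran Mal Saini" or " Radheyshyam Sharma".
split(':')[-1] splits on the colon and keeps the last piece. If there is no colon, the string is returned unchanged.
strip() removes the surrounding whitespace to give the desired result.
Welcome to SO. From the text data provided, I can see that the / and the comma are used consistently. This can be used to extract the names with split() and a re pattern match.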
Input:
L1=["Address:S/O: Puran Mal Saini, xxxxxxxxxx,Pxxxxxxxx, Palam Vilxxxxxxiage,Palam",
"Address:S/O Radheyshyam Sharma, E SECOND",
"Address:S/O: Saroj Shahi, gram-shyampur",
"Address:S/O Birjraj Singh, Cccxxxx, NEW Azzzzzzz,",
"Address:208027 S/O: Naresh Chandra Mishra",
"Address: C/O: Mayenk Jain. 260/18, Axxxxxxxr, Opp. Haxxxx xxxxxr, Gxxxxxa",
"Address:208027S/O: Naresh Chandra Mishra,Wxxx, 127/406",
"Address: C/O Sachin Vasant Shivaji Vidhyalay, Sissssss"]
Code:
import re
def return_names(text, pattern):
    # First split by comma and then / which is consistent throughout the text
    text = text.split(",")[0].split('/')[1]
    # sub will replace all with '' for a match
    name = re.sub(pattern, '', text)
    return name.strip()
# The current replacing pattern, this can be adapted accordingly
pattern = "[0-9O:.]"
for text in L1:
    print(return_names(text, pattern))
Output:
$ python3 pattern.py
Puran Mal Saini
Radheyshyam Sharma
Saroj Shahi
Birjraj Singh
Naresh Chandra Mishra
Mayenk Jain
Naresh Chandra Mishra
Sachin Vasant Shivaji Vidhyalay
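A caveat with the pattern "[0-9O:.]": it deletes every capital O in the extracted text, not only the one left over from the marker, so a name that contains a capital O would be altered. The sketch below contrasts the two behaviours. The function names and the sample address with the name "Om Prakash" are hypothetical and do not come from the original data.

```python
import re

def return_names_all(text, pattern="[0-9O:.]"):
    """Behaviour of the answer above: remove every character matched by the pattern."""
    part = text.split(",")[0].split("/")[1]
    return re.sub(pattern, "", part).strip()

def return_names_prefix(text):
    """Variant that removes only the leading 'O' (and optional colon) of the marker."""
    part = text.split(",")[0].split("/")[1]
    part = re.sub(r"^O:?\s*", "", part)
    # Keep only what comes before the first digit or period (e.g. drop ". 260").
    return re.split(r"[0-9.]", part)[0].strip()

sample = "Address:S/O: Om Prakash, xxxx"  # hypothetical address
print(return_names_all(sample))     # capital O inside the name is lost
print(return_names_prefix(sample))  # name is preserved

sixth = "Address: C/O: Mayenk Jain. 260/18, Axxxxxxxr, Opp. Haxxxx xxxxxr, Gxxxxxa"
print(return_names_prefix(sixth))
```

Both functions still assume that the first / in the text before the first comma belongs to the marker, as in the sample data.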