
Python regex for people names

Hello, I have tried to extract all the names from the following string:

import re
def Find(string):
    url = re.findall(r"[A-Z][a-z]+,?\s+(?:[A-Z][a-z]*\.?\s*)?[A-Z][a-z]+", string)
    return url
string = 'Arnold Schwarzenegger was born in Austria. He and Sylvester Stalone used to run a restaurant with J. Edgar Hoover.'
print(Find(string))

But I have a problem with the output (it doesn't include the "J." of J. Edgar Hoover):

['Arnold Schwarzenegger', 'Sylvester Stalone', 'Edgar Hoover']
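Why the "J." goes missing: the pattern's first token is [A-Z][a-z]+, which needs at least one lowercase letter after the capital. A bare initial like "J" can therefore never start a match, so the scanner skips it and the match begins at "Edgar". A small check of that reasoning, reusing the pattern from the question (the names FIRST_TOKEN and PATTERN are just illustrative):

```python
import re

# Leading token of the pattern above: a capital followed by ONE OR MORE lowercase letters.
FIRST_TOKEN = r"[A-Z][a-z]+"

# "J" has no lowercase letter after the capital, so the token cannot match it...
print(re.fullmatch(FIRST_TOKEN, "J"))                   # None
# ...while an ordinary first name matches.
print(re.fullmatch(FIRST_TOKEN, "Edgar") is not None)   # True

# The full pattern from the question, run on the problem name alone.
PATTERN = r"[A-Z][a-z]+,?\s+(?:[A-Z][a-z]*\.?\s*)?[A-Z][a-z]+"
print(re.findall(PATTERN, "J. Edgar Hoover."))          # ['Edgar Hoover']
```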

Another question for you :) I have tried to also extract the second URL in the string, but I have a problem. I need a regex that also matches URLs written without www, http or https, like the second one in this example:

import re
def Find(string):
    url = re.findall(r'https?://[^\s<>"]+|www\.[^\s<>"]+', string)
    return url
string = 'To learn about pros/cons of data science, go to http://datascience.net. Alternatively, go to datascience.net/2020/'
print(Find(string))

The output is:

['http://datascience.net.']
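Two things explain this output. First, [^\s<>"]+ only stops at whitespace, <, > or ", so the full stop ending the sentence is swallowed into the URL. Second, both alternatives require an "http(s)://" or "www." prefix, so a bare domain such as datascience.net/2020/ is never matched. A minimal check using the question's pattern (the name ASKER is only for illustration):

```python
import re

text = ("To learn about pros/cons of data science, go to http://datascience.net. "
        "Alternatively, go to datascience.net/2020/")

# Pattern from the question: every alternative needs an "http(s)://" or "www." prefix.
ASKER = r'https?://[^\s<>"]+|www\.[^\s<>"]+'

# The character class keeps going until whitespace, so the trailing "." is included.
print(re.findall(ASKER, text))                          # ['http://datascience.net.']

# With neither prefix present, the bare domain produces no match at all.
print(re.findall(ASKER, "go to datascience.net/2020/"))  # []
```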

Thanks

Question 1

Here's a regex that works for this specific case of three names:

((?:[A-Z]\.\s)?[A-Z][a-z]+\s[A-Z][a-z]+)

yields:

Arnold Schwarzenegger
Sylvester Stalone
J. Edgar Hoover
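A sketch of running that pattern with re.findall on the question's string. Because the pattern has exactly one capturing group, re.findall returns that group's text for each hit, which here is the whole name. The names NAME and text are illustrative:

```python
import re

# Optional initial such as "J. ", then two capitalised words, all in one capturing group.
NAME = r"((?:[A-Z]\.\s)?[A-Z][a-z]+\s[A-Z][a-z]+)"

text = ("Arnold Schwarzenegger was born in Austria. He and Sylvester Stalone "
        "used to run a restaurant with J. Edgar Hoover.")

# One group in the pattern, so findall yields the group's text per match.
for name in re.findall(NAME, text):
    print(name)
```

As the answer says, this is tailored to the example: it expects "First Last" with at most one leading initial, so a single-word name or a name with several initials would not be matched as a whole.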

Question 2

(?:http)?s?(?:\:\/\/)?(?:www\.)?([A-Za-z]+\.[A-Za-z]+(?:[\./][A-Za-z0-9]+)*\/?)

yields the following whole matches:

http://datascience.net
datascience.net/2020/
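The output above is the whole match of each hit. Because the pattern wraps the domain part in a capturing group, re.findall would return only that group, so the "http://" disappears from the first result. A sketch contrasting the two, with the character classes written as [A-Za-z] and the dot after www escaped (the names URL, text and full are illustrative):

```python
import re

# Optional "http"/"https" and "://", optional "www.", then a capturing group with the
# domain plus any further "."- or "/"-separated parts and an optional trailing slash.
URL = (r"(?:http)?s?(?:\:\/\/)?(?:www\.)?"
       r"([A-Za-z]+\.[A-Za-z]+(?:[\./][A-Za-z0-9]+)*\/?)")

text = ("To learn about pros/cons of data science, go to http://datascience.net. "
        "Alternatively, go to datascience.net/2020/")

# group(0) is the entire match, so the scheme is kept when present.
full = [m.group(0) for m in re.finditer(URL, text)]
print(full)                   # ['http://datascience.net', 'datascience.net/2020/']

# re.findall returns only the capturing group, so the scheme is dropped.
print(re.findall(URL, text))  # ['datascience.net', 'datascience.net/2020/']
```

If the scheme-free form is what you want, re.findall is the shorter option; if you want each URL exactly as written, iterate with re.finditer and use group(0).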

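Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.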
