How to get a specific part of a link with regex in Python
I'm feeding a lot of LinkedIn profile links into a program that should extract the id of each profile. (The links are just strings; clicking on most of them will take you nowhere.)
Example 1: "https://www.linkedin.com/in/facundo-b-barber%C3%A1-86bb41187/"
Example 2: "https://www.linkedin.com/in/facundo-b-barber%C3%A1-86bb41187/sometext"
If I input either of those examples, the result is "facundo-b-barber%C3%A1-86bb41187".
The problem I run into is when I have something like this:
Example 3: "https://www.linkedin.com/in/facundo-b-barber%C3%A1-86bb41187/sometext/anothertext/"
Here the output is "facundo-b-barber%C3%A1-86bb41187/sometext".
I've tried using the re module in this function:
    import re
    from urllib import parse

    def get_in(url):
        parsed = parse.urlparse(url)
        lin = parsed.path
        # Greedy (.*) runs to the last "/" in the path
        lin = re.search(r'/in/(.*)/', lin).group(1)
        print(lin)
        return lin
I want to get only the id and drop everything before and after it.
This should work, assuming every URL starts with the scheme (https://) so that the id is always the fifth piece:
    url.split('/')[4]
Examples:
>>> url = "https://www.linkedin.com/in/facundo-b-barber%C3%A1-86bb41187/sometext/anothertext/"
>>> url.split('/')[4]
'facundo-b-barber%C3%A1-86bb41187'
>>> url = "https://www.linkedin.com/in/facundo-b-barber%C3%A1-86bb41187/sometext"
>>> url.split('/')[4]
'facundo-b-barber%C3%A1-86bb41187'
>>> url = "https://www.linkedin.com/in/facundo-b-barber%C3%A1-86bb41187/"
>>> url.split('/')[4]
'facundo-b-barber%C3%A1-86bb41187'
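Since the question asks for a regex, here is a minimal sketch of that route. It keeps the asker's urlparse step but replaces the greedy `(.*)/` with `([^/]+)`, which cannot cross a slash and so stops at the end of the first path segment after /in/. The name PROFILE_ID and the choice to return None when nothing matches are assumptions made for illustration, not part of the original code.

```python
import re
from urllib.parse import urlparse

# [^/]+ matches one path segment only, so the capture ends at the next "/"
# (or at the end of the path), however many segments follow.
PROFILE_ID = re.compile(r"/in/([^/]+)")  # hypothetical name, for illustration

def get_in(url):
    """Return the segment right after /in/, or None if there is none."""
    path = urlparse(url.strip()).path
    match = PROFILE_ID.search(path)
    return match.group(1) if match else None

base = "https://www.linkedin.com/in/facundo-b-barber%C3%A1-86bb41187"
for suffix in ("/", "/sometext", "/sometext/anothertext/"):
    print(get_in(base + suffix))
```

All three calls print the same id. Returning None avoids the AttributeError that `.group(1)` raises on a link with no /in/ segment, which may matter when processing a long list of links.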