
How to get a specific part of a link with regex in Python

I'm feeding a lot of LinkedIn profile links into a program that should return the ID of each profile. (The links are plain strings; clicking on most of them will take you nowhere.)

Example 1: "https://www.linkedin.com/in/facundo-b-barber%C3%A1-86bb41187/"

Example 2: "https://www.linkedin.com/in/facundo-b-barber%C3%A1-86bb41187/sometext"

If I input either of those examples, the result is: "facundo-b-barber%C3%A1-86bb41187". The problem appears when I have something like this:

Example 3: "https://www.linkedin.com/in/facundo-b-barber%C3%A1-86bb41187/sometext/anothertext/"

Here the output is: "facundo-b-barber%C3%A1-86bb41187/sometext"

I've tried using the re module in this function:

import re
from urllib import parse

def get_in(url):
    parsed = parse.urlparse(url)
    lin = parsed.path
    # Try to extract the part of the path that follows '/in/'
    lin = re.search(r'/in/(.*)/', lin).group(1)
    print(lin)
    return lin

I want to get only the ID and drop everything before and after it.

This should work:

url.split('/')[4]
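To see why index 4 is the one to pick, it may help to look at the full list that split produces. This is only a small illustration using Example 2 from the question; the variable names `parts` and `profile_id` are made up here. Note that it assumes the URL starts with a scheme such as "https://"; without it, the indices shift.

```python
url = "https://www.linkedin.com/in/facundo-b-barber%C3%A1-86bb41187/sometext"

# Splitting on '/' yields:
#   index 0: 'https:'
#   index 1: ''                  (empty string between the two slashes of '//')
#   index 2: 'www.linkedin.com'
#   index 3: 'in'
#   index 4: the profile id
#   index 5: 'sometext'
parts = url.split('/')
profile_id = parts[4]
print(profile_id)
```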

Examples:

>>> url = "https://www.linkedin.com/in/facundo-b-barber%C3%A1-86bb41187/sometext/anothertext/"
>>> url.split('/')[4]
'facundo-b-barber%C3%A1-86bb41187'

>>> url = "https://www.linkedin.com/in/facundo-b-barber%C3%A1-86bb41187/sometext"
>>> url.split('/')[4]
'facundo-b-barber%C3%A1-86bb41187'

>>> url = "https://www.linkedin.com/in/facundo-b-barber%C3%A1-86bb41187/"
>>> url.split('/')[4]
'facundo-b-barber%C3%A1-86bb41187'
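If you would rather keep the regex approach from the question, the likely culprit is the greedy `.*`, which runs to the last slash in the path instead of the first one after the ID. A minimal sketch of one possible fix is to match `[^/]+` (one or more non-slash characters) instead. Returning None when no match is found is an assumption added here, not something the question asked for.

```python
import re
from urllib import parse

def get_in(url):
    """Return the path segment that follows '/in/', or None if there is none."""
    path = parse.urlparse(url).path
    # [^/]+ stops at the next slash, so trailing segments such as
    # '/sometext/anothertext/' are never captured.
    match = re.search(r'/in/([^/]+)', path)
    return match.group(1) if match else None
```

Because this works on the parsed path rather than on fixed positions in the whole string, it does not depend on how many slashes precede the path.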
