How to get a specific part of a link using regex in Python

Question

I'm feeding a lot of LinkedIn profile links into a program that extracts the id of each profile. (The links are just strings; most of them don't point to a real page.)

Example 1: "https://www.linkedin.com/in/facundo-b-barber%C3%A1-86bb41187/"

Example 2: "https://www.linkedin.com/in/facundo-b-barber%C3%A1-86bb41187/sometext"

If I input either of those examples, the result is "facundo-b-barber%C3%A1-86bb41187". The problem I run into is when I have something like this:

Example 3: "https://www.linkedin.com/in/facundo-b-barber%C3%A1-86bb41187/sometext/anothertext/"

Here the output is "facundo-b-barber%C3%A1-86bb41187/sometext".

I've tried using the re module in this function:

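import re
from urllib import parse

def get_in(url):
    parsed = parse.urlparse(url)
    lin = parsed.path
    # '.*' is greedy: the match runs to the LAST '/' in the path,
    # so any extra path segments end up in group(1)
    lin = re.search(r'/in/(.*)/', lin).group(1)
    print(lin)
    return lin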

I want to get only the id and remove everything before and after it.
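One minimal way to fix the regex in get_in, assuming the id is always the single path segment directly after /in/: replace the greedy (.*)/ with ([^/]+), which cannot cross a slash. This is a sketch, not the asker's code; unlike the original pattern it also works when the URL has no trailing slash, and it returns None instead of raising AttributeError when /in/ is missing.

```python
import re
from urllib import parse


def get_in(url):
    # Work on the path only, e.g. "/in/<id>/sometext/anothertext/"
    path = parse.urlparse(url).path
    # [^/]+ matches one or more characters that are NOT '/', so the
    # capture stops at the end of the first segment after '/in/'
    match = re.search(r'/in/([^/]+)', path)
    # re.search returns None when '/in/' is absent; avoid calling .group on None
    return match.group(1) if match else None


print(get_in("https://www.linkedin.com/in/facundo-b-barber%C3%A1-86bb41187/sometext/anothertext/"))
```

A non-greedy pattern, r'/in/(.*?)/', would also stop at the first slash, but it still requires a slash after the id, so it fails on a URL like ".../in/some-id" with nothing after it.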

Answer 1

This should work:

url.split('/')[4]
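Index 4 works because splitting on '/' turns the "//" after the scheme into an empty string, so the id is always the fifth element for URLs of the form https://host/in/<id>/...:

```python
url = "https://www.linkedin.com/in/facundo-b-barber%C3%A1-86bb41187/sometext"
parts = url.split('/')
# parts[0] == 'https:', parts[1] == '' (between the two slashes),
# parts[2] == host, parts[3] == 'in', parts[4] == the profile id
print(parts)
```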

Examples:

>>> url =  "https://www.linkedin.com/in/facundo-b-barber%C3%A1-86bb41187/sometext/anothertext/"
>>> url.split('/')[4]
'facundo-b-barber%C3%A1-86bb41187'

>>> url = "https://www.linkedin.com/in/facundo-b-barber%C3%A1-86bb41187/sometext"
>>> url.split('/')[4]
'facundo-b-barber%C3%A1-86bb41187'

>>> url = "https://www.linkedin.com/in/facundo-b-barber%C3%A1-86bb41187/"
>>> url.split('/')[4]
'facundo-b-barber%C3%A1-86bb41187'
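Note that split('/')[4] assumes every URL starts with a scheme and host, and it keeps any query string attached (".../in/abc?trk=x" would give 'abc?trk=x'). A hedged alternative sketch, assuming the id is the path segment right after an "in" segment; the helper name get_profile_id is hypothetical:

```python
from typing import Optional
from urllib.parse import urlparse


def get_profile_id(url: str) -> Optional[str]:
    # Hypothetical helper: return the segment following 'in', or None.
    # urlparse strips the query string and fragment from .path
    segments = [s for s in urlparse(url).path.split('/') if s]
    try:
        i = segments.index('in')
    except ValueError:
        return None  # no 'in' segment at all
    # Guard against a URL that ends right after '/in/'
    return segments[i + 1] if i + 1 < len(segments) else None


print(get_profile_id("https://www.linkedin.com/in/abc?trk=x"))
```

Without a scheme, urlparse puts the whole string (host included) into .path, so searching for the 'in' segment still finds the id where a fixed index would not.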

Answer 1 (accepted, 2019-08-09 22:44:18)