
Pull profile name from LinkedIn URL in Python

I'm trying to pull the profile name from the following URL: https://www.linkedin.com/in/zamenajaffer/

Ideally, I want to extract "zamenajaffer" from the URL and convert it to a string.

Here is what I have so far:

#importing packages for web scraping
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
import re
import time

### Opening LinkedIn Account ###
#request user input for LinkedIn credentials
print("Please enter your email address: ")
username_string = str(input())
print("Please enter your password: ")
password_string = str(input())

#create browser-specific web navigation simulator (chrome)
browser = webdriver.Chrome(executable_path= '/Applications/Python 3.8/chromedriver')

#open LinkedIn and log in with given details
browser.get('https://www.linkedin.com/login')
elementID = browser.find_element_by_id('username')
elementID.send_keys(username_string)
elementID = browser.find_element_by_id('password')
elementID.send_keys(password_string)
elementID.submit()

#navigate to recent activity page
browser.get('https://www.linkedin.com/in/')
print(browser.current_url)

It currently prints https://www.linkedin.com/in/. What I want it to print is https://www.linkedin.com/in/zamenajaffer/, which is what the browser's address bar shows when the code runs. [Screenshot: the URL in the Chrome browser]

You have to wait until the page has finished loading, and only then call print(browser.current_url).
For example, you can add an explicit wait:

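from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait up to 30 seconds for an element of the loaded page to become visible
wait = WebDriverWait(browser, 30)
element = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, 'live-video-hero-image')))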
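If the class name above does not exist on the page you land on, another option is to wait for the URL itself to change. The helper below is only a sketch: wait_for_url_change is a hypothetical name, not part of Selenium, and it assumes nothing more than an object with a current_url attribute (such as the browser from the question). Selenium's expected_conditions module also ships URL-based conditions such as url_changes that can be passed to WebDriverWait instead.

```python
import time


def wait_for_url_change(browser, start_url, timeout=30.0, poll=0.5):
    """Poll browser.current_url until it differs from start_url.

    Hypothetical helper (not a Selenium API). `browser` can be any object
    exposing a `current_url` attribute, e.g. a Selenium WebDriver.
    Returns the new URL, or raises TimeoutError if it never changes.
    """
    deadline = time.monotonic() + timeout
    while True:
        url = browser.current_url
        if url != start_url:
            return url
        if time.monotonic() >= deadline:
            raise TimeoutError(f"URL was still {start_url!r} after {timeout} seconds")
        time.sleep(poll)
```

With the question's code this would be called as wait_for_url_change(browser, 'https://www.linkedin.com/in/') right after browser.get(...), assuming LinkedIn redirects that address to the logged-in user's profile.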

And then

print(browser.current_url)
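Once current_url holds the full profile address, the name itself still has to be pulled out of it, which is what the question originally asked for. Here is a minimal sketch using only the standard library; profile_name is a hypothetical helper name, and it assumes the URL has the shape https://www.linkedin.com/in/<name>/ shown in the question.

```python
from urllib.parse import urlparse


def profile_name(url):
    """Return the path segment after '/in/' as a string, or None if absent."""
    # Split the path on '/' and drop empty pieces from leading/trailing slashes
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) >= 2 and parts[0] == "in":
        return parts[1]
    return None


print(profile_name("https://www.linkedin.com/in/zamenajaffer/"))  # zamenajaffer
```

In the script above this would be profile_name(browser.current_url). It returns None for the bare https://www.linkedin.com/in/ address, which makes it easy to tell that the redirect has not happened yet.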

