简体   繁体   中英

Python Selenium: Can't Get HREF Link Off Instagram in <time> tags

PostLinkExtraction = driver.find_element_by_xpath("//article[1]/div[3]/div[1]/div/div[2]/div[1][*[local-name()='a']]").get_attribute('href')
print (PostLinkExtraction)

Im trying to print the href link from the Time Stamp on Instagram under the first post on my Instagram Timeline. The code above returns none for some reason. Below is the code for anyone who wants to run it and see where I may have went wrong, but the overall goal I want to accomplish is to extract the href link from the <-time> tags. Below is an image of where the <-time> tags will be in developer tools

在此处输入图像描述

from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from time import sleep
from selenium.webdriver.common.keys import Keys
from selenium import webdriver

user = 'username'
passw = 'password'



driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get('https://www.instagram.com/')
driver.implicitly_wait(10)

driver.find_element_by_name('username').send_keys(user)
driver.find_element_by_name('password').send_keys(passw)
Login = "//button[@type='submit']"
sleep(2)
driver.find_element_by_xpath(Login).submit()
sleep(1)
# Logs into Instagram
print ('Logged In')

#------------------------ATTENTION

NotNow = "//button[contains(text(),'Not Now')]"
driver.find_element_by_xpath(NotNow).click()
# Clicks Pop Up
print ('Close Pop Up')

# It's weird but the pop up opens once, only after this page.
# If ever a problem delete one, or have the first click be
# directed to your Instagram Profiles timeline

NotNow = "//button[contains(text(),'Not Now')]"
driver.find_element_by_xpath(NotNow).click()
#Clicks Pop Up; Comment out the line above if it causes an error
print ('Close Pop Up')

#-----------------------------------



driver.refresh()
print ('refreshing')
driver.implicitly_wait(10)
PostLinkExtraction = driver.find_element_by_xpath("//article[1]/div[3]/div[1]/div/div[2]/div[1][*[local-name()='a']]").get_attribute('href')
print (PostLinkExtraction)

I find out the issue is because of your xpath. Fix it and you will print out the href of your first post.

PostLinkExtraction = driver.find_element_by_xpath("//article[1]/div[3]/div[1]/div/div[2]/div[1]/a").get_attribute('href')
print (PostLinkExtraction)

The result:

在此处输入图像描述

Short Answer: Stop sticking to xpaths and find the elements you're looking for in this way: 1 - put all the elements with the same tag in an array

2 - search for two-three attributes that renders it unique

3- extract it cycling in the array and use it

Easy, fast and clean.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM