Unable to find an img element using XPath

Question

Can anyone tell me why the code below does not return the emoji attribute...

from selenium.webdriver import Chrome
import time
from selenium.common.exceptions import NoSuchElementException
import re
import pyautogui  # needed for pyautogui.press() in main()

# open webpage and allow time to load entirely
driver = Chrome()
driver.implicitly_wait(15)
driver.get("https://twitter.com")
time.sleep(2)

# start scraping tweets
tickerOptDetails = []
tweet_ids = set()
tweet_ids.clear()
print(tweet_ids)


def main():

    # prevent computer from going to sleep
    pyautogui.press('shift')

    print("--checking for new alert...")
    page_cards = driver.find_elements_by_xpath('//article[@data-testid="tweet"]')

    for card in page_cards:
        try:
            ticker = card.find_element_by_xpath('//span/a[starts-with(text(),"$")]').text.replace('$', '')
            optCriteria = card.find_element_by_xpath('//span/a[starts-with(text(),"$")]'
                                                     '/../following-sibling::span').text.split('\n')[0]\
                .replace('-', '').replace('$', '')
            emoji = card.find_element_by_xpath("//img[contains(@src,'https://abs-0.twimg.com/emoji/v2/svg/1f402.svg')"
                                               " or contains(@src,'https://abs-0.twimg.com/emoji/v2/svg/1f43b.svg')]")\
                .get_attribute("title")
            

            tradeCriteria = str(ticker+optCriteria)
        except NoSuchElementException:
            continue

        if tradeCriteria:
            tweet_id = ' '.join(tradeCriteria)
            if tweet_id not in tweet_ids:
                tweet_ids.add(tweet_id)
                if 13 < len(tradeCriteria) < 22 and re.search(r'\d{8} \D ', tradeCriteria):

                    print(tradeCriteria)
                    print(emoji)

main()

However, the code below does return an emoji attribute...

from selenium.webdriver import Chrome
import time
from selenium.common.exceptions import NoSuchElementException
import re


# open webpage and allow time to load entirely
driver = Chrome()
driver.get("https://twitter.com")
time.sleep(2)

# start scraping tweets
tickerOptDetails = []
emojiSet = []
tweet_ids = set()
last_position = driver.execute_script("return window.pageYOffset;")
scrolling = True
tweet_ids.clear()
print(tweet_ids)
page_cards = driver.find_elements_by_xpath('//article[@data-testid="tweet"]')

while scrolling:
    page_cards = driver.find_elements_by_xpath('//article[@data-testid="tweet"]')
    for card in page_cards:
        try:
            ticker = card.find_element_by_xpath('//span/a[starts-with(text(),"$")]').text.replace('$', '')
            optCriteria = card.find_element_by_xpath('//span/a[starts-with(text(),"$")]'
                                                     '/../following-sibling::span').text.split('\n')[0].replace('-', '').replace('$', '')
            emoji = card.find_element_by_xpath("//img[contains(@src,'https://abs-0.twimg.com/emoji/v2/svg/1f402.svg') or"
                                               " contains(@src,'https://abs-0.twimg.com/emoji/v2/svg/1f43b.svg')]")\
                .get_attribute("title")
            
            tradeCriteria = str(ticker+optCriteria)
        except NoSuchElementException:
            continue

        if tradeCriteria:
            tweet_id = ''.join(tradeCriteria)
            if tweet_id not in tweet_ids:
                tweet_ids.add(tweet_id)
                if 13 < len(tradeCriteria) < 22 and re.search(r'\d{8} \D ', tradeCriteria):

                    print(tradeCriteria)
                    print(emoji)

    scroll_attempt = 0
    while True:
        # check scroll position
        driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
        time.sleep(2)
        curr_position = driver.execute_script("return window.pageYOffset;")
        if last_position == curr_position:
            scroll_attempt += 1

            if scroll_attempt >= 3:
                scrolling = False
                break
            else:
                time.sleep(2)
        else:
            last_position = curr_position
            break

print(tweet_ids)

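I know I added scrolling to the second script, so it looks through the whole page and returns the elements I am looking for. Other than that, the two scripts are more or less the same. I can run the first script every few seconds and it never finds the emoji element. It finds ticker and optCriteria without any trouble and prints them together as tradeCriteria, but it never finds the emoji attribute even when it is there.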

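I have tried both implicit and explicit waits, and neither worked. I also tried moving the emoji XPath line inside the if statement if 13 < len(tradeCriteria) < 22 and re.search(r'\d{8} \D ', tradeCriteria):, but that did not work either.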

Answer 1

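After running both scripts through a diff checker, the two appear to differ by a single space, at line 43 of the first script and line 38 of the second (line numbers as shown by the diff checker):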

43: tweet_id = ' '.join(tradeCriteria)
38: tweet_id = ''.join(tradeCriteria)

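That space makes join() insert a space between every character of tradeCriteria (it is a string, so join() iterates over its characters). For a value such as abc:

43: a b c

38: abc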
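
To see the effect concretely, here is a minimal sketch. The value assigned to tradeCriteria is made up for illustration and is not taken from either script; only the two join() calls mirror the question.

```python
# Hypothetical value; the real scripts build this string from the tweet text.
tradeCriteria = "AAPL20220819 C 150"

# First script: the separator is a single space, inserted between every character.
spaced_id = ' '.join(tradeCriteria)

# Second script: the separator is empty, so the string comes back unchanged.
plain_id = ''.join(tradeCriteria)

print(repr(spaced_id))
print(repr(plain_id))
print(plain_id == tradeCriteria)
```

Since tradeCriteria is already a string, the ''.join(...) call in the second script is effectively a no-op, while the ' '.join(...) call in the first script produces a key nearly twice as long.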

In both files, look at how the print(emoji) statement sits underneath if tweet_id not in tweet_ids:. I think this difference is what causes the problem in the first file.
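
Independently of the join() difference, it may also be worth checking how the XPath expressions are scoped. In Selenium, an expression that starts with // is evaluated from the document root even when it is called on an element such as card, so each lookup returns the first match on the whole page rather than a match inside that tweet. A leading dot (.//) restricts the search to the element's descendants. The helper below is a hypothetical convenience (scoped_xpath is not part of Selenium) that sketches the change; whether it resolves the missing emoji in this case is untested.

```python
def scoped_xpath(xpath: str) -> str:
    """Make a root-anchored XPath relative to the context element.

    Hypothetical helper, not part of Selenium. It only handles expressions
    that begin with '/'; anything else is returned unchanged.
    """
    return "." + xpath if xpath.startswith("/") else xpath


# Same expression as in the question, kept in one place for reuse.
EMOJI_XPATH = (
    "//img[contains(@src,'https://abs-0.twimg.com/emoji/v2/svg/1f402.svg')"
    " or contains(@src,'https://abs-0.twimg.com/emoji/v2/svg/1f43b.svg')]"
)

print(scoped_xpath(EMOJI_XPATH))

# Inside the loop, the lookup would then read (same Selenium call as in the question):
#     emoji = card.find_element_by_xpath(scoped_xpath(EMOJI_XPATH)).get_attribute("title")
```

The same prefix would apply to the ticker and optCriteria expressions, which also start with //.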

Alternatively, if you are scraping data from Twitter, you can try the official Twitter API with a Python wrapper such as Tweepy, which is somewhat easier. You can read more about how to do this here.
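
As a rough sketch of the API route, assuming Tweepy 4.x and a bearer token with access to the recent-search endpoint: through the API the two emoji arrive as ordinary Unicode characters in the tweet text (U+1F402 and U+1F43B, matching the 1f402.svg and 1f43b.svg file names in the XPath), so no img lookup is needed. The function name find_emoji_alerts, the query string, and the token placeholder are illustrative only.

```python
BULL = "\U0001F402"  # ox emoji, code point 1f402
BEAR = "\U0001F43B"  # bear emoji, code point 1f43b


def find_emoji_alerts(client, query, max_results=10):
    """Return (text, emoji) pairs for recent tweets that contain either emoji.

    `client` is expected to be a tweepy.Client, or any object exposing a
    compatible search_recent_tweets method.
    """
    response = client.search_recent_tweets(query=query, max_results=max_results)
    alerts = []
    # response.data is None when the search returns no tweets.
    for tweet in response.data or []:
        for emoji in (BULL, BEAR):
            if emoji in tweet.text:
                alerts.append((tweet.text, emoji))
                break
    return alerts


# Usage (requires `pip install tweepy` and real credentials):
#     import tweepy
#     client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")
#     for text, emoji in find_emoji_alerts(client, "from:some_account"):
#         print(emoji, text)
```

Passing the client in as an argument keeps the filtering logic separate from the network call, so it can be exercised with a stand-in object before any credentials are involved.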

Solution 1
0 2022-08-19 21:14:54
