Python Selenium - Extract text within <br>

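I am currently looping through all labels and extracting data from each page, but I cannot extract the highlighted text below each category (i.e. Founded, Location, etc.). The text appears to sit just before the "<br>" tags; can anyone advise how to extract it?

Website - https://labelsbase.net/knee-deep-in-sound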

                        <div class="line-title-block">
                            <div class="line-title-wrap">
                                <span class="line-title-text">Founded</span>
                            </div>
                        </div>
                        2003
                        <br>


                        <div class="line-title-block">
                            <div class="line-title-wrap">
                                <span class="line-title-text">Location</span>
                            </div>
                        </div>

                        
                        <a href="/?c=United+Kingdom">United Kingdom</a>
                        <br>

I have tried using driver.find_elements_by_xpath and driver.execute_script but could not find a solution.

Error message -

Message: invalid selector: The result of the xpath expression "/html/body/div[3]/div/div[1]/div[2]/div/div[1]/text()[2]" is: [object Text]. It should be an element.

import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By 
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, NoSuchElementException
import pandas as pd
import time
import string

PATH = '/Applications/chromedriver'
driver = webdriver.Chrome(PATH)
wait = WebDriverWait(driver, 10)
links = []
url = 'https://labelsbase.net/knee-deep-in-sound'
driver.get(url)

time.sleep(5)
# -- Title
title = driver.find_element_by_class_name('label-name').text
print(title,'\n')

# -- Image
image = driver.find_element_by_tag_name('img')
src = image.get_attribute('src')
print(src,'\n')

# -- Founded
founded = driver.find_element_by_xpath("/html/body/div[3]/div/div[1]/div[2]/div/div[1]/text()[2]").text
print(founded,'\n')

driver.quit()
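
The error message above points at the cause: `text()[2]` selects a text node, and Selenium's element lookups can only return elements. One possible workaround is to read the text node in the browser with `driver.execute_script`. The sketch below is only an illustration based on the HTML snippet in the question; the helper name `text_after_label` and the JavaScript are assumptions, not something taken from the site or from Selenium itself.

```python
# JavaScript executed in the page. It finds the label span, then collects
# the text of the siblings that follow its "line-title-block" wrapper,
# stopping at the next <br> or the next label block.
# (Hypothetical helper script, written against the snippet in the question.)
_TEXT_AFTER_LABEL_JS = """
const label = arguments[0];
const spans = document.querySelectorAll('span.line-title-text');
for (const span of spans) {
  if (span.textContent.trim() !== label) continue;
  const block = span.closest('.line-title-block');
  if (!block) continue;
  let text = '';
  for (let n = block.nextSibling; n; n = n.nextSibling) {
    if (n.nodeType === Node.ELEMENT_NODE &&
        (n.tagName === 'BR' || n.classList.contains('line-title-block'))) break;
    if (n.nodeType === Node.COMMENT_NODE) continue;
    text += n.textContent;
  }
  return text.trim();
}
return null;
"""


def text_after_label(driver, label):
    """Return the text that follows the given label, or None if not found.

    `driver` is any object exposing Selenium's execute_script(script, *args).
    """
    value = driver.execute_script(_TEXT_AFTER_LABEL_JS, label)
    # Treat a missing label and an empty value the same way.
    return value or None
```

With the driver from the script above, this could be called as `text_after_label(driver, "Founded")` or `text_after_label(driver, "Location")` before `driver.quit()`.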

Can you check whether this works?

founded = driver.find_element_by_xpath("//*[@*='block-content']").get_attribute("innerText")

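You can use an XPath that targets the element with class="block-content". Reading its innerText returns all the text in the block, including text that is not wrapped in an element of its own.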
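
Because innerText returns the whole block as a single string, the individual fields still have to be separated. A minimal sketch of one way to do that follows, assuming each label (Founded, Location, and so on) appears on its own line followed by its value; the function name `parse_labelled_text` and that line layout are assumptions and should be checked against the real page output.

```python
def parse_labelled_text(inner_text, labels):
    """Map each known label to the non-empty lines that follow it.

    inner_text: the string returned by get_attribute("innerText").
    labels: the label names to look for, e.g. {"Founded", "Location"}.
    """
    collected = {}
    current = None
    for raw_line in inner_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines produced by <br> and block elements
        if line in labels:
            current = line  # start collecting values for this label
            collected[current] = []
        elif current is not None:
            collected[current].append(line)
    # Join multi-line values into one string per label.
    return {label: " ".join(parts) for label, parts in collected.items()}
```

For example, the result of the innerText call above could be passed in as `parse_labelled_text(founded, {"Founded", "Location"})`, after which the founding year would be looked up under the "Founded" key.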

