Trouble scraping data from a site using Python
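I am trying to scrape the text from a table row using Python. I can get the class attribute from that row, but not the text. I tried .text and .get_text(), but neither of them works.
What am I missing?
Here is my Python script for getting the text from the row: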
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
import time
import datetime
import csv

class toy(object):
    browser = webdriver.Chrome(ChromeDriverManager().install())
    browser.get('https://continuumgames.com/product/16-tracer-racer-set/')
    time.sleep(2)
    try:
        test = browser.find_element_by_xpath('//*[@id="tab-additional_information"]/table/tbody/tr[3]/td').get_attribute('class')
    except:
        test = 'NA'
    try:
        upcode = browser.find_element_by_xpath('//*[@id="tab-additional_information"]/table/tbody/tr[3]/td').text
    except:
        upcode = 'NA'
    print(test)
    print(upcode)
    browser.close()
Here is the page's HTML:
<div class="woocommerce-Tabs-panel woocommerce-Tabs-panel--additional_information panel entry-content wc-tab" id="tab-additional_information" role="tabpanel" aria-labelledby="tab-title-additional_information" style="display: none;">
<table class="woocommerce-product-attributes shop_attributes">
<tbody>
<tr class="woocommerce-product-attributes-item woocommerce-product-attributes-item--weight">
<th class="woocommerce-product-attributes-item__label">Weight</th>
<td class="woocommerce-product-attributes-item__value">2.5 oz</td>
</tr>
<tr class="woocommerce-product-attributes-item woocommerce-product-attributes-item--dimensions">
<th class="woocommerce-product-attributes-item__label">Dimensions</th>
<td class="woocommerce-product-attributes-item__value">24 × 4 × 2 in</td>
</tr>
<tr class="woocommerce-product-attributes-item woocommerce-product-attributes-item--attribute_product_upc">
<th class="woocommerce-product-attributes-item__label">UPC</th>
<td class="woocommerce-product-attributes-item__value">605444972168</td>
</tr>
</tbody>
</table>
</div>
Here is the output when I run it:
C:\Users\Carre\scrape>python test.py
[WDM] - Current google-chrome version is 83.0.4103
[WDM] - Get LATEST driver version for 83.0.4103
[WDM] - Driver [C:\Users\Carre\.wdm\drivers\chromedriver\win32\83.0.4103.39\chromedriver.exe] found in cache
DevTools listening on ws://127.0.0.1:56807/devtools/browser/03318f43-1d26-44c7-8d90-65233969f03b
woocommerce-product-attributes-item__value
Your selector may be off. Try using an XPath copied from the browser: right-click the tag, then select Copy XPath. Then replace your code with this:
upcode = browser.find_element_by_xpath('paste XPath here').text
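Another possible cause, judging from the posted HTML: the panel with id tab-additional_information carries style="display: none;". Selenium's .text returns only text that is visibly rendered, so it comes back empty for a hidden cell, while get_attribute('class') still works. A minimal sketch of a workaround is to read the DOM textContent property instead. The helper name hidden_text is hypothetical (it does not appear in the original post), and it assumes only that the object passed in exposes Selenium's get_attribute method.

```python
def hidden_text(element):
    """Return an element's text even when the element is not displayed.

    `element` is assumed to be a Selenium WebElement (or any object with a
    compatible `get_attribute` method). `textContent` is a DOM property that
    includes the text of hidden nodes, unlike WebElement.text.
    """
    raw = element.get_attribute('textContent')
    # get_attribute returns None when the property is missing,
    # so fall back to the same 'NA' placeholder the question uses.
    return raw.strip() if raw else 'NA'
```

In the script from the question this would be used as upcode = hidden_text(browser.find_element_by_xpath('//*[@id="tab-additional_information"]/table/tbody/tr[3]/td')).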
I have a solution for you. It is a workaround I often use when Selenium behaves inconsistently: switch to beautifulsoup4 to extract the text.
from selenium import webdriver
import bs4
from webdriver_manager.chrome import ChromeDriverManager
import time
import datetime
import csv

class toy(object):
    browser = webdriver.Chrome(ChromeDriverManager().install())
    browser.get('https://continuumgames.com/product/16-tracer-racer-set/')
    time.sleep(2)
    try:
        test = browser.find_element_by_xpath('//*[@id="tab-additional_information"]/table/tbody/tr[3]/td').get_attribute('class')
    except Exception:
        test = 'NA'
    try:
        # Locate the cell with Selenium, then hand its raw HTML to BeautifulSoup
        upcode = browser.find_element_by_xpath('//*[@id="tab-additional_information"]/table/tbody/tr[3]/td')
        # Name the parser explicitly to avoid the "no parser specified" warning
        upcode = bs4.BeautifulSoup(upcode.get_attribute('outerHTML'), 'html.parser')
        # BeautifulSoup returns the text regardless of whether the element is visible
        upcode = upcode.text
    except Exception:
        upcode = 'NA'
    print(test)
    print(upcode)
    browser.close()
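The same idea (parse the raw HTML rather than rely on rendered text) can also be sketched with only the standard library, in case adding beautifulsoup4 is not an option. This is an illustration under assumptions, not part of the original answer: the names AttributeTableParser and parse_attributes are hypothetical, and the sample string is a trimmed copy of the table from the question. In a real run, the input would be the table's get_attribute('outerHTML') or browser.page_source.

```python
from html.parser import HTMLParser


class AttributeTableParser(HTMLParser):
    """Collect <th>/<td> pairs from a product-attributes table into a dict."""

    def __init__(self):
        super().__init__()
        self.rows = {}
        self._tag = None    # 'th' or 'td' while inside such a cell
        self._label = None  # text of the most recent <th>
        self._buf = []      # text chunks of the current cell

    def handle_starttag(self, tag, attrs):
        if tag in ('th', 'td'):
            self._tag = tag
            self._buf = []

    def handle_data(self, data):
        if self._tag:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return  # ignore closing tags other than the open cell's
        text = ''.join(self._buf).strip()
        if tag == 'th':
            self._label = text
        elif self._label is not None:
            # a <td> following a <th> becomes that label's value
            self.rows[self._label] = text
            self._label = None
        self._tag = None


def parse_attributes(html):
    """Return {label: value} for every <th>/<td> pair found in `html`."""
    parser = AttributeTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Trimmed copy of the table from the question (class attributes omitted).
sample = """
<table class="woocommerce-product-attributes shop_attributes">
<tbody>
<tr><th>Weight</th><td>2.5 oz</td></tr>
<tr><th>Dimensions</th><td>24 × 4 × 2 in</td></tr>
<tr><th>UPC</th><td>605444972168</td></tr>
</tbody>
</table>
"""
attributes = parse_attributes(sample)
print(attributes.get('UPC', 'NA'))
```

Keying the result by label instead of by row position (tr[3]) means the lookup keeps working if the site adds or reorders attribute rows.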