HTML Table Specific Row Scraping

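I want to scrape data from specific rows of this table: only the orange/gold rows. Previously, I used the following code, provided by SIM, to scrape the whole table and then manipulate the result: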

from selenium.webdriver import Chrome
from contextlib import closing
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup

URL = "https://www.n2yo.com/passes/?s=39090&a=1"

chrome_options = Options()  
chrome_options.add_argument("--headless")

with closing(Chrome(options=chrome_options)) as driver:  # `chrome_options=` is the deprecated spelling of this keyword
    driver.get(URL)
    soup = BeautifulSoup(driver.page_source, 'lxml')
    for items in soup.select("#passestable tr"):
        data = [item.text for item in items.select("th,td")]
        print(data)

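I'm not sure how to change this code to get only the orange/gold rows. While parsing, I tried searching for the color codes as tags, but that didn't work. Any and all suggestions are appreciated.

Thank you for your time.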

You can use a regular expression to match the colors:

from selenium import webdriver
from bs4 import BeautifulSoup as soup
import re
d = webdriver.Chrome()
d.get("https://www.n2yo.com/passes/?s=39090&a=1")
s = soup(d.page_source, 'lxml')
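# Note: '#FFFFFF' is white, so this pattern also matches white rows; drop it to keep only the orange/gold rows
data = [i.text for i in s.find_all('tr', {'bgcolor':re.compile('#FFFFFF|#FFFF33|#FFCC00')})]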

Output:

[u'16-Mar 20:34N12\xb020:42W265\xb079\xb020:48SSW199\xb0-Map and details', u'17-Mar 07:51S178\xb007:58W260\xb052\xb008:05NNW341\xb0-Map and details', u'17-Mar 20:00NNE19\xb020:08E102\xb050\xb020:14S180\xb0-Map and details', u'18-Mar 07:17SSE160\xb007:24E83\xb077\xb007:31N349\xb0-Map and details', u'18-Mar 08:58SW217\xb009:04W269\xb013\xb009:09NW323\xb0-Map and details', u'18-Mar 21:06N6\xb021:13WNW295\xb041\xb021:19SW217\xb0-Map and details', u'19-Mar 06:43SE142\xb006:50ENE67\xb038\xb006:57N356\xb0-Map and details', u'19-Mar 08:23SSW196\xb008:30W268\xb027\xb008:36NNW333\xb0-Map and details', u'19-Mar 20:32N12\xb020:39WNW286\xb084\xb020:46SSW198\xb0-Map and details', u'20-Mar 07:48S177\xb007:55WSW254\xb055\xb008:02NNW342\xb0-Map and details', u'20-Mar 19:58NNE20\xb020:05E98\xb047\xb020:12S178\xb0-Map and details', u'21-Mar 07:14SSE159\xb007:22NE58\xb072\xb007:28N349\xb0-Map and details', u'21-Mar 08:55SW216\xb009:01W272\xb014\xb009:07NW325\xb0-Map and details', u'21-Mar 21:03N6\xb021:10WNW288\xb043\xb021:17SW215\xb0-Map and details', u'22-Mar 06:41SE141\xb006:48ENE70\xb036\xb006:54N356\xb0-Map and details', u'22-Mar 08:20S194\xb008:27W265\xb029\xb008:34NNW335\xb0-Map and details', u'22-Mar 20:29N13\xb020:36N348\xb086\xb020:43SSW196\xb0-Map and details', u'23-Mar 07:46S176\xb007:53W265\xb059\xb008:00NNW343\xb0-Map and details', u'23-Mar 19:55NNE20\xb020:02E94\xb045\xb020:09S177\xb0-Map and details', u'24-Mar 07:12SSE157\xb007:19ENE71\xb069\xb007:26N350\xb0-Map and details', u'24-Mar 08:53SW214\xb008:59W270\xb015\xb009:04NW325\xb0-Map and details', u'24-Mar 21:01N7\xb021:08WNW292\xb046\xb021:14SW214\xb0-Map and details', u'25-Mar 06:38SE139\xb006:45ENE65\xb034\xb006:52N357\xb0-Map and details', u'25-Mar 08:18S193\xb008:24W263\xb030\xb008:31NNW335\xb0-Map and details', u'25-Mar 18:49NE39\xb018:54E87\xb010\xb018:59SE134\xb0-Map and details', u'25-Mar 20:27N13\xb020:34SSE161\xb086\xb020:41S195\xb0-Map and details']
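In the output above, each row's cells run together into one string. As a minimal sketch, assuming the rows carry a `bgcolor` attribute as in the answer, the same regex-based match can keep the cells separate and tolerate lowercase hex codes. The sample HTML and the helper name `colored_rows` are hypothetical stand-ins for `d.page_source` and are not part of the original answer:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample standing in for the real page source
SAMPLE_HTML = """
<table id="passestable">
  <tr><th>Date</th><th>Start</th><th>Max</th></tr>
  <tr bgcolor="#FFFFFF"><td>16-Mar</td><td>20:34</td><td>79</td></tr>
  <tr bgcolor="#FFCC00"><td>17-Mar</td><td>07:51</td><td>52</td></tr>
  <tr bgcolor="#ffff33"><td>17-Mar</td><td>20:00</td><td>50</td></tr>
</table>
"""

# Only the two orange/gold codes; re.I also accepts lowercase hex digits
COLOR_RE = re.compile(r"^#(FFCC00|FFFF33)$", re.I)

def colored_rows(html):
    """Return one list of cell strings per orange/gold row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find_all("tr", attrs={"bgcolor": COLOR_RE})
    return [[cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
            for row in rows]

print(colored_rows(SAMPLE_HTML))
```

Anchoring the pattern with `^` and `$` avoids accidental partial matches on longer attribute values.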

Try replacing this line:

for items in soup.select("#passestable tr"):

with this one:

for items in soup.select("#passestable tr[bgcolor='#FFCC00'], #passestable tr[bgcolor='#FFFF33']"):

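This iterates over only the tr nodes with the colors you want.

Note that, depending on the BeautifulSoup version, this may return all the orange rows first and then all the gold rows, rather than the rows in table order.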
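If the rows are needed in table order regardless of how the selector groups are combined, one option is to select every row and filter on the attribute while iterating. This is a sketch under assumptions: the sample HTML and the name `rows_in_table_order` are hypothetical, and the real page is assumed to use the same `bgcolor` values as the selector above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample with orange and gold rows interleaved
SAMPLE_HTML = """
<table id="passestable">
  <tr><th>Date</th><th>Start</th></tr>
  <tr bgcolor="#FFCC00"><td>16-Mar</td><td>20:34</td></tr>
  <tr bgcolor="#FFFF33"><td>17-Mar</td><td>07:51</td></tr>
  <tr><td>17-Mar</td><td>20:00</td></tr>
  <tr bgcolor="#FFCC00"><td>18-Mar</td><td>07:17</td></tr>
</table>
"""

WANTED = {"#FFCC00", "#FFFF33"}  # orange and gold, as in the selector above

def rows_in_table_order(html):
    """Collect the cells of orange/gold rows in the order they appear."""
    soup = BeautifulSoup(html, "html.parser")
    result = []
    for row in soup.select("#passestable tr"):
        # Rows without a bgcolor attribute fall back to "" and are skipped
        if row.get("bgcolor", "").upper() in WANTED:
            result.append([cell.get_text(strip=True) for cell in row.find_all("td")])
    return result

print(rows_in_table_order(SAMPLE_HTML))
```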

Another option you can try, which does not use Selenium:

from lxml.html import fromstring
import requests

URL = "https://www.n2yo.com/passes/?s=39090&a=1"  # same page as in the question
r = requests.get(URL)
html = fromstring((r.content).decode('utf-8'))
# only orange and yellow rows
rows = html.xpath('//tr[@bgcolor="#FFFF33" or @bgcolor="#FFCC00"]')
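The XPath query returns row elements, not text. A minimal sketch of turning those elements into lists of cell strings follows. It runs on a hypothetical inline sample rather than the live page, and `rows_to_cells` is an illustrative helper name. Whether the table is actually present in the HTML that `requests` receives, rather than being filled in by JavaScript, is an assumption to check on the real page.

```python
from lxml.html import fromstring

# Hypothetical sample standing in for r.content
SAMPLE_HTML = """<html><body>
<table id="passestable">
  <tr><th>Date</th><th>Start</th></tr>
  <tr bgcolor="#FFFFFF"><td>16-Mar</td><td>20:34</td></tr>
  <tr bgcolor="#FFFF33"><td>17-Mar</td><td>07:51</td></tr>
  <tr bgcolor="#FFCC00"><td> 17-Mar </td><td>20:00</td></tr>
</table>
</body></html>"""

def rows_to_cells(rows):
    """Return the stripped text of every td in each row element."""
    return [[cell.text_content().strip() for cell in row.xpath("./td")]
            for row in rows]

tree = fromstring(SAMPLE_HTML)
# Same query as above: only orange and yellow rows
rows = tree.xpath('//tr[@bgcolor="#FFFF33" or @bgcolor="#FFCC00"]')
print(rows_to_cells(rows))
```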
