Print specific line (BeautifulSoup)
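Currently, my code parses the page behind a link and prints everything on the site. I only want to print a single line from the site. How can I do that?
Here is my code: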
from bs4 import BeautifulSoup
import urllib.request
r = urllib.request.urlopen("Link goes here").read()
soup = BeautifulSoup(r, "html.parser")
# This is what I want to change. I currently have it printing everything.
# I just want a specific line from the website
print (soup.prettify())
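Don't use the pretty-printed output to try to parse the tds. Select the tag specifically: if the class name is unique, use the class, and if an attribute is unique, use that attribute: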
td = soup.select_one("td.content")
td = soup.select_one('td[colspan="3"]')  # value quoted: a bare 3 is not a valid CSS identifier
If it is the fourth td:
td = soup.select_one("td:nth-of-type(4)")
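The three selectors above can be tried without a network request. The following is a minimal sketch that assumes a small, made-up HTML fragment (the markup and cell texts are invented for illustration and are not from the site in the question):

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for illustration only.
html = """
<table>
  <tr>
    <td>first</td>
    <td class="content">second</td>
    <td colspan="3">third</td>
    <td>fourth</td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Select by class name.
by_class = soup.select_one("td.content").get_text(strip=True)
# Select by attribute value (quoted, since it starts with a digit).
by_attr = soup.select_one('td[colspan="3"]').get_text(strip=True)
# Select the fourth td among its sibling td elements.
by_position = soup.select_one("td:nth-of-type(4)").get_text(strip=True)

print(by_class)     # second
print(by_attr)      # third
print(by_position)  # fourth
```

Note that select_one returns None when nothing matches, so real code should check the result before reading its text.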
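If it is in a specific table, select that table and then find the td inside it. Trying to split the HTML into lines and index into them is actually worse than parsing HTML with a regex.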
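Selecting the table first and then searching inside it might look like the sketch below. The table ids and cell contents are hypothetical, chosen only to show that calling select on a tag limits the search to that tag's descendants:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two tables; ids and texts are invented.
html = """
<table id="summary"><tr><td>ignore me</td></tr></table>
<table id="details">
  <tr><td>Owner</td><td>ACME</td></tr>
  <tr><td>Class</td><td>Office</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Step 1: pick the table of interest.
table = soup.find("table", id="details")

# Step 2: search only inside that table for the cells of its second row.
cells = []
if table is not None:
    cells = [td.get_text(strip=True) for td in table.select("tr:nth-of-type(2) td")]

print(cells)  # ['Class', 'Office']
```

Because the search is anchored to one table, cells in other tables cannot match by accident.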
You can use the text of the bold tag that precedes the td to get that specific td, i.e. "Department of Finance Building Classification:":
In [19]: from bs4 import BeautifulSoup
In [20]: import urllib.request
In [21]: url = "http://a810-bisweb.nyc.gov/bisweb/PropertyProfileOverviewServlet?boro=1&houseno=1&street=park+ave&go2=+GO+&requestid=0"
In [22]: r = urllib.request.urlopen(url).read()
In [23]: soup = BeautifulSoup(r, "html.parser")
In [24]: print(soup.find("b", string="Department of Finance Building Classification:").find_next("td").text)
O6-OFFICE BUILDINGS
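The same label-then-value lookup can be wrapped in a small helper that returns None instead of raising when the label is missing. This is only a sketch: value_after_label is a hypothetical name, and the markup is a simplified stand-in for the real page, whose exact structure is assumed here:

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical stand-in for the page's label/value layout.
html = """
<table>
  <tr><td><b>Owner:</b></td><td>ACME</td></tr>
  <tr><td><b>Building Classification:</b></td><td colspan="3">O6-OFFICE BUILDINGS</td></tr>
</table>
"""


def value_after_label(soup, label):
    """Return the text of the first td after the <b> whose text equals label."""
    bold = soup.find("b", string=label)
    if bold is None:
        return None  # label not present on the page
    cell = bold.find_next("td")
    return cell.get_text(strip=True) if cell is not None else None


soup = BeautifulSoup(html, "html.parser")
found = value_after_label(soup, "Building Classification:")
missing = value_after_label(soup, "No such label:")

print(found)    # O6-OFFICE BUILDINGS
print(missing)  # None
```

find_next walks forward in document order, so it skips the td that contains the label and lands on the following one.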
Selecting the nth table and row:
In [25]: print(soup.select_one("table:nth-of-type(8) tr:nth-of-type(5) td[colspan='3']").text)
O6-OFFICE BUILDINGS
li = soup.prettify().split('\n')
# line_number is the 1-based number of the line you want
print(li[line_number - 1])
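If you do index into the pretty-printed output, a bounds check avoids an IndexError when the page is shorter than expected. The sketch below uses a hypothetical helper, get_line, and made-up HTML. Keep in mind that line positions depend entirely on how prettify lays out the markup, so they can shift whenever the page changes:

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for illustration only.
html = "<html><body><p>one</p><p>two</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# splitlines() avoids a trailing empty entry from a final newline.
lines = soup.prettify().splitlines()


def get_line(lines, line_number):
    """Return the 1-based line, or None if line_number is out of range."""
    if 1 <= line_number <= len(lines):
        return lines[line_number - 1]
    return None


first = get_line(lines, 1)
out_of_range = get_line(lines, len(lines) + 1)

print(first)
print(out_of_range)
```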