Print specific line (Beautifulsoup)

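Currently, my code parses a link and prints all of the information on the website. I only want to print a single line from the website. How would I go about doing that?

Here's my code: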

from bs4 import BeautifulSoup
import urllib.request

r = urllib.request.urlopen("Link goes here").read()
soup = BeautifulSoup(r, "html.parser")

# This is what I want to change. I currently have it printing everything.
# I just want a specific line from the website

print (soup.prettify())

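Don't try to parse the tds out of the pretty-printed output. Select the tag directly: if its class name is unique, use the class; if one of its attributes is unique, use that attribute: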

td = soup.select_one("td.content")
td = soup.select_one("td[colspan='3']")

If it is the fourth td:

td = soup.select_one("td:nth-of-type(4)")
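
As a rough offline illustration of the three selectors above, here is a minimal sketch. The HTML is made up for the example (it is not the markup of the page in the question), and the attribute value is quoted because recent Beautiful Soup releases are stricter about unquoted values that start with a digit:

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example only.
html = """
<table>
  <tr>
    <td>first</td>
    <td class="content">second</td>
    <td colspan="3">third</td>
    <td>fourth</td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Unique class name
by_class = soup.select_one("td.content").get_text(strip=True)
# Unique attribute value (quoted)
by_attr = soup.select_one("td[colspan='3']").get_text(strip=True)
# Fourth td among its sibling tds
by_position = soup.select_one("td:nth-of-type(4)").get_text(strip=True)

print(by_class, by_attr, by_position)
```

Note that select_one returns None when nothing matches, so real code should check the result before reading its text.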

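If it is in a specific table, select that table and then find the td within it. Splitting the HTML into lines and indexing into them is actually worse than parsing HTML with a regex.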
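
A minimal sketch of the "select the table first" idea, assuming the tables can be told apart somehow. The ids and cell contents here are hypothetical; on a real page you would use whatever distinguishes the table you need:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two tables; the ids are assumptions for this example.
html = """
<table id="owner">
  <tr><td>Name</td><td>Example Corp</td></tr>
</table>
<table id="building">
  <tr><td>Floors</td><td>12</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Narrow the search to one table first.
building = soup.select_one("table#building")

# Calling select_one on the table only considers tds inside that table.
floors = building.select_one("td:nth-of-type(2)").get_text(strip=True)

print(floors)
```

Scoping the search this way keeps the selector short and means that tds added to other tables cannot change the result.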

You can get the specific td using the text of the bold tag that precedes it, i.e. Department of Finance Building Classification:

In [19]: from bs4 import BeautifulSoup

In [20]: import urllib.request

In [21]: url = "http://a810-bisweb.nyc.gov/bisweb/PropertyProfileOverviewServlet?boro=1&houseno=1&street=park+ave&go2=+GO+&requestid=0"

In [22]: r = urllib.request.urlopen(url).read()

In [23]: soup = BeautifulSoup(r, "html.parser")

In [24]: print(soup.find("b",text="Department of Finance Building Classification:").find_next("td").text)
O6-OFFICE BUILDINGS
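
The same lookup can be tried without network access. This is only a sketch: the HTML below is a guess at the general shape of the relevant row, not a copy of the real page. It uses string=, which is the name Beautiful Soup 4.4.0 and later use for the text= argument, and it checks for a missing label:

```python
from bs4 import BeautifulSoup

# Assumed structure: a bold label in one td, the value in the next td.
html = """
<table>
  <tr>
    <td><b>Department of Finance Building Classification:</b></td>
    <td colspan="3">O6-OFFICE BUILDINGS</td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Find the <b> tag whose text is exactly the label.
label = soup.find("b", string="Department of Finance Building Classification:")

# find_next walks forward in document order, so it reaches the td after the label.
value = label.find_next("td").get_text(strip=True) if label else None

print(value)
```

The match on string= is exact, so extra whitespace or a changed label on the live page would make find return None.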

To select the nth table and row:

In [25]: print(soup.select_one("table:nth-of-type(8) tr:nth-of-type(5) td[colspan='3']").text)
O6-OFFICE BUILDINGS

# line_number is the 1-based number of the line to print
li = soup.prettify().split('\n')
print(li[line_number - 1])
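
For completeness, a small sketch of the line-based approach with a bounds check. The helper name get_line and the sample HTML are made up for illustration. Keep in mind that the line numbers refer to the output of prettify(), which puts each tag and string on its own line, so they will not match the line numbers of the original page source:

```python
from bs4 import BeautifulSoup


def get_line(soup, line_number):
    """Return the given 1-based line of soup.prettify(), or None if out of range."""
    lines = soup.prettify().splitlines()
    if 1 <= line_number <= len(lines):
        return lines[line_number - 1]
    return None


# Hypothetical markup for the example.
soup = BeautifulSoup("<p>one</p><p>two</p>", "html.parser")

second_line = get_line(soup, 2)
print(second_line.strip())

# Asking for a line past the end returns None instead of raising IndexError.
missing = get_line(soup, 1000)
print(missing)
```

This breaks as soon as the page layout shifts by a line, which is why the selector-based lookups are usually the safer choice.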
