How to parse HTML inside an HTML tag and print a specific output

Question

#!/usr/bin/env python
import requests
import bs4

res = requests.get('https://betaunityapi.webrootcloudav.com/Docs/APIDoc/APIReference')
web_page = bs4.BeautifulSoup(res.text, "lxml")

for d in web_page.find_all("div", {"class": "actionColumnText"}):
    print(d)

Result:

<div class="actionColumnText">
<a href="/Docs/APIDoc/Api/POST-api-console-gsm-gsmKey-sites-siteId-endpoints-reactivate">/service/api/console/gsm/{gsmKey}/sites/{siteId}/endpoints/reactivate</a>
</div>
<div class="actionColumnText">
Reactivates a list of endpoints, or all endpoints on a site.        </div>

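I would like the output to be only the last line (Reactivates a list of endpoints, or all endpoints on a site.), with the opening and closing tags removed. I am not interested in the line with the href.

Any help is greatly appreciated.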

Answer 1

In the simple case, you can get just the text:

for d in web_page.find_all("div", {"class": "actionColumnText"}):
    print(d.get_text())
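A minimal offline sketch of the get_text() approach. It assumes the HTML printed in the question as a hard-coded string (SAMPLE_HTML is a stand-in for res.text, not the live page), uses Python's built-in "html.parser" so lxml is not needed, and passes strip=True so the surrounding newlines and trailing spaces are dropped. Note that it still prints the link text of the first div.

```python
from bs4 import BeautifulSoup

# Assumed stand-in for res.text: the two divs shown in the question's output.
SAMPLE_HTML = """
<div class="actionColumnText">
<a href="/Docs/APIDoc/Api/POST-api-console-gsm-gsmKey-sites-siteId-endpoints-reactivate">/service/api/console/gsm/{gsmKey}/sites/{siteId}/endpoints/reactivate</a>
</div>
<div class="actionColumnText">
Reactivates a list of endpoints, or all endpoints on a site.        </div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# class_ matches any tag whose class list contains "actionColumnText".
# get_text(strip=True) strips each text fragment and drops the empty ones.
texts = [d.get_text(strip=True) for d in soup.find_all("div", class_="actionColumnText")]
for t in texts:
    print(t)
```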

Or, if you only want a single element, you can take the last match by index:

d = web_page.find_all("div", {"class": "actionColumnText"})[-1]
print(d.get_text())
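Indexing with [-1] raises IndexError when nothing matches, for example if the page layout changes. A small guarded variant, sketched with a hypothetical helper name last_action_text and a shortened sample of the question's HTML:

```python
from bs4 import BeautifulSoup


def last_action_text(html):
    """Hypothetical helper: stripped text of the last div.actionColumnText, or None."""
    soup = BeautifulSoup(html, "html.parser")
    matches = soup.find_all("div", class_="actionColumnText")
    if not matches:
        return None  # avoid IndexError from matches[-1] on an empty list
    return matches[-1].get_text(strip=True)


# Shortened sample; the href and link text are placeholders.
sample = (
    '<div class="actionColumnText"><a href="/x">link</a></div>'
    '<div class="actionColumnText">Reactivates a list of endpoints, or all endpoints on a site.  </div>'
)
print(last_action_text(sample))
print(last_action_text("<p>no matching div</p>"))
```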

Alternatively, you can find the div elements with that class that do not contain an a element:

def filter_divs(elm):
    # "class" is multi-valued, so BeautifulSoup stores it as a list of class names
    return elm.name == "div" and "actionColumnText" in elm.get("class", []) and elm.a is None

for d in web_page.find_all(filter_divs):
    print(d.get_text())

Or, for a single element:

print(web_page.find(filter_divs).get_text())
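In BeautifulSoup, class is a multi-valued attribute: it is stored as a list under the key "class". A test like "actionColumnText" in elm.attrs therefore looks at attribute names and never matches, while "actionColumnText" in elm.get("class", []) looks at the class values. A self-contained check on a shortened copy of the question's HTML (href and link text are placeholders):

```python
from bs4 import BeautifulSoup

html = (
    '<div class="actionColumnText"><a href="/x">link</a></div>'
    '<div class="actionColumnText">Reactivates a list of endpoints, or all endpoints on a site.</div>'
)
soup = BeautifulSoup(html, "html.parser")

first = soup.find("div")
print(first.attrs)                                   # {'class': ['actionColumnText']}
print("actionColumnText" in first.attrs)             # False: checks attribute names
print("actionColumnText" in first.get("class", []))  # True: checks class values


def filter_divs(elm):
    # find_all() calls this for every Tag; keep divs of that class that contain no <a>
    return elm.name == "div" and "actionColumnText" in elm.get("class", []) and elm.a is None


texts = [d.get_text(strip=True) for d in soup.find_all(filter_divs)]
print(texts)
```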

Answer 2

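You can select the last one with a CSS selector. :last is jQuery syntax rather than CSS, so select all matches and take the last item: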

d = web_page.select("div.actionColumnText")[-1]  # select() returns a list
print(d.get_text())
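A runnable sketch of the selector approach on a shortened copy of the question's HTML (href and link text are placeholders). Since Beautiful Soup 4.7, select() is backed by soupsieve, which also supports :not() and :has(), so the div without a link can be picked by its content instead of its position:

```python
from bs4 import BeautifulSoup

html = (
    '<div class="actionColumnText"><a href="/x">link</a></div>'
    '<div class="actionColumnText">Reactivates a list of endpoints, or all endpoints on a site.</div>'
)
soup = BeautifulSoup(html, "html.parser")

# By position: the last element of the list returned by select().
by_position = soup.select("div.actionColumnText")[-1].get_text(strip=True)

# By content: select_one() returns the first match or None;
# :not(:has(a)) skips divs that contain a link.
no_link = soup.select_one("div.actionColumnText:not(:has(a))")
by_content = no_link.get_text(strip=True) if no_link is not None else None

print(by_position)
print(by_content)
```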

Answer 3

If this text can change, you can use:

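#!/usr/bin/env python
import requests
import bs4

res = requests.get('https://betaunityapi.webrootcloudav.com/Docs/APIDoc/APIReference')
web_page = bs4.BeautifulSoup(res.text, "lxml")

# Take the text of the last matching div, cut the trailing run of spaces, then trim the leading newline
your_text = web_page.find_all("div", {"class": "actionColumnText"})[-1].get_text()
your_text = your_text.split('  ')[0].strip()
print(your_text)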
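get_text() on that div returns the description with a leading newline and a trailing run of spaces. Splitting on two spaces removes only the trailing part, and it also truncates any description that happens to contain a double space, whereas str.strip() removes whitespace at both ends only. A small check (the second string is made up for illustration):

```python
from bs4 import BeautifulSoup

html = '<div class="actionColumnText">\nReactivates a list of endpoints, or all endpoints on a site.        </div>'
raw = BeautifulSoup(html, "html.parser").div.get_text()

via_split = raw.split("  ")[0]  # trailing spaces gone, leading "\n" kept
via_strip = raw.strip()         # whitespace removed at both ends
print(repr(via_split))
print(repr(via_strip))

# Made-up text with an internal double space: split() cuts it short.
made_up = "First sentence.  Second sentence.   "
print(made_up.split("  ")[0])  # First sentence.
print(made_up.strip())         # First sentence.  Second sentence.
```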


3 answers

Answer 1: 0 votes, accepted, 2016-04-19 15:15:37

Answer 2: 0 votes, 2016-04-19 15:17:28

Answer 3: 0 votes, 2016-04-19 15:19:03
