Remove specific items Python Beautiful Soup
How can I remove the td elements below with Python and Beautiful Soup while keeping all the other items?
<td style="background:#aaccff" width="50"></td>
<td align="left" style="background:#aaccff" width="150">Device Type</td>
<td align="left" style="background:#aaccff" width="115">IP Address</td>
<td align="left" style="background:#aaccff" width="100">Device Name</td>
<td align="left" style="background:#aaccff" width="215">Notes</td>
<td width="50"></td>
This is the complete HTML:
<td style="background:#aaccff" width="50"></td>
<td align="left" style="background:#aaccff" width="150">Device Type</td>
<td align="left" style="background:#aaccff" width="115">IP Address</td>
<td align="left" style="background:#aaccff" width="100">Device Name</td>
<td align="left" style="background:#aaccff" width="215">Notes</td>
<td width="50"></td>
<td align="left" width="150">AudioCodes Gateway</td>
<td align="left" width="115">172.31.31.2</td>
<td align="left" width="100"></td>
<td align="left" width="215">FXO</td>
<td style="background:#aaccff" width="50"></td>
<td align="left" style="background:#aaccff" width="150">Device Type</td>
<td align="left" style="background:#aaccff" width="115">IP Address</td>
<td align="left" style="background:#aaccff" width="100">Device Name</td>
<td align="left" style="background:#aaccff" width="215">Notes</td>
<td width="50"></td>
<td align="left" width="150">IC Server</td>
<td align="left" width="115">172.31.56.151</td>
<td align="left" width="100">IND056GIC151</td>
<td align="left" width="215">NAT'd IP = PENDING MPLS, Voice IP = 172.31.52.151</td>
<td width="50"></td>
<td align="left" width="150">IC Server</td>
<td align="left" width="115">172.31.56.152</td>
<td align="left" width="100">IND056GIC152</td>
<td align="left" width="215">NAT'd IP = PENDING MPLS, Voice IP = 172.31.52.152</td>
<td width="50"></td>
<td align="left" width="150">Media Server</td>
<td align="left" width="115">IND1106HMS07</td>
<td align="left" width="100">IND1106HMS07</td>
<td align="left" width="215"></td>
<td width="50"></td>
<td align="left" width="150">Media Server</td>
<td align="left" width="115">IND1106HMS07</td>
<td align="left" width="100">IND1106HMS07</td>
<td align="left" width="215"></td>
This is the code I have so far:
from ntlm import HTTPNtlmAuthHandler
from bs4 import BeautifulSoup
import requests, os, bleach, urllib2, cookielib
os.system('clear')
user = 'user'
password = "pass"
url = "url"
cookies = cookielib.CookieJar()
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, user, password)
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookies),HTTPNtlmAuthHandler.HTTPNtlmAuthHandler(passman))
pagedata=opener.open(url)
soup=BeautifulSoup(pagedata)
def myfunction(b):
    table = b.find('ul', {'class': 'dfwp-column dfwp-list'})
    for a in table.findAll('a'):
        [a.decompose() for a in table("a")]
    for tr in table.findAll('tr'):
        for td in tr.findAll('td'):
            print td

myfunction(soup)
This is the current output:
Device Type IP Address Device Name Notes
AudioCodes Gateway 172.31.31.2
FXO
Device Type IP Address Device Name Notes
IC Server 172.31.56.151 IND056GIC151 NAT'd IP = PENDING MPLS, Voice IP = 172.31.52.151
IC Server 172.31.56.152 IND056GIC152 NAT'd IP = PENDING MPLS, Voice IP = 172.31.52.152
Media Server IND1106HMS07 IND1106HMS07
Media Server IND1106HMS07 IND1106HMS07
Usually, when people ask how to "remove" something with bs4, they are really just asking how to not include it in a find operation.
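To make that distinction concrete, here is a minimal sketch that contrasts skipping cells during a search with actually deleting them from the tree via decompose(). The three-cell snippet, the HEADERS set, and the is_unwanted helper are made up for illustration, and the "html.parser" backend is an assumption chosen so the sketch does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the question's markup, wrapped in a table row.
html = """
<table><tr>
<td style="background:#aaccff" width="150">Device Type</td>
<td width="50"></td>
<td align="left" width="150">IC Server</td>
</tr></table>
"""

HEADERS = {"Device Type", "IP Address", "Device Name", "Notes"}


def is_unwanted(td):
    """True for empty cells and for the column-header cells."""
    text = td.get_text(strip=True)
    return not text or text in HEADERS


soup = BeautifulSoup(html, "html.parser")

# Option 1: leave the tree alone and simply skip unwanted cells.
kept = [td.get_text(strip=True) for td in soup.find_all("td") if not is_unwanted(td)]
cells_before = len(soup.find_all("td"))  # the soup still holds every cell

# Option 2: really delete the unwanted cells from the tree.
# find_all returns a list, so decomposing while looping over it is safe.
for td in soup.find_all("td"):
    if is_unwanted(td):
        td.decompose()
remaining = [td.get_text(strip=True) for td in soup.find_all("td")]
```

Both options end up with the same cell text; the difference is only whether the soup object is modified, which matters if the document is serialized or searched again later.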
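You want to exclude the empty cells (i.e. tags with tag.text == '') and those four "column header" tags. You could handle the latter with CSS selectors, but the former needs explicit filtering. So it is simplest, and in my view more declarative, to do both at once: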
soup = BeautifulSoup(that_long_html_you_gave)
blacklist = {'Device Type','IP Address','Device Name','Notes'}
table = soup.body # to match your variable name. I think.
table.find_all(lambda tag: tag.text and tag.text not in blacklist)
Out[45]:
[<td align="left" width="150">AudioCodes Gateway</td>,
<td align="left" width="115">172.31.31.2</td>,
<td align="left" width="215">FXO</td>,
<td align="left" width="150">IC Server</td>,
<td align="left" width="115">172.31.56.151</td>,
<td align="left" width="100">IND056GIC151</td>,
<td align="left" width="215">NAT'd IP = PENDING MPLS, Voice IP = 172.31.52.151</td>,
<td align="left" width="150">IC Server</td>,
<td align="left" width="115">172.31.56.152</td>,
<td align="left" width="100">IND056GIC152</td>,
<td align="left" width="215">NAT'd IP = PENDING MPLS, Voice IP = 172.31.52.152</td>,
<td align="left" width="150">Media Server</td>,
<td align="left" width="115">IND1106HMS07</td>,
<td align="left" width="100">IND1106HMS07</td>,
<td align="left" width="150">Media Server</td>,
<td align="left" width="115">IND1106HMS07</td>,
<td align="left" width="100">IND1106HMS07</td>]
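For reference, the same idea can be sketched as a self-contained script. This is an illustration under assumptions rather than the exact code above: the HTML is a shortened excerpt of the question's cells wrapped in a table, the parser is pinned to "html.parser", and the keep helper is a hypothetical name. It also shows the CSS-selector variant mentioned earlier, which relies on the header cells being the only ones with a style attribute in this markup.

```python
from bs4 import BeautifulSoup

# Shortened excerpt of the question's cells, wrapped in <table> for validity.
html = """
<table>
<tr>
<td style="background:#aaccff" width="50"></td>
<td align="left" style="background:#aaccff" width="150">Device Type</td>
<td align="left" style="background:#aaccff" width="115">IP Address</td>
</tr>
<tr>
<td width="50"></td>
<td align="left" width="150">AudioCodes Gateway</td>
<td align="left" width="115">172.31.31.2</td>
<td align="left" width="100"></td>
<td align="left" width="215">FXO</td>
</tr>
</table>
"""

blacklist = {"Device Type", "IP Address", "Device Name", "Notes"}
soup = BeautifulSoup(html, "html.parser")


def keep(tag):
    """Match only non-empty <td> cells whose text is not a column header."""
    text = tag.get_text(strip=True)
    return tag.name == "td" and bool(text) and text not in blacklist


# One pass: the function filter drops empty cells and header cells together.
by_function = [td.get_text(strip=True) for td in soup.find_all(keep)]

# CSS-selector variant: the selector skips the styled header cells,
# but empty cells still have to be filtered out explicitly.
by_selector = [
    td.get_text(strip=True)
    for td in soup.select("td:not([style])")
    if td.get_text(strip=True)
]
```

The tag.name == "td" check matters once the cells sit inside real table and tr elements, because those containers also have non-empty text and would otherwise match the filter.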