
Remove specific items Python Beautiful Soup

How can I remove these cells and keep the rest, using Python and Beautiful Soup? The other td items need to be kept.

<td style="background:#aaccff" width="50"></td>
<td align="left" style="background:#aaccff" width="150">Device Type</td>
<td align="left" style="background:#aaccff" width="115">IP Address</td>
<td align="left" style="background:#aaccff" width="100">Device Name</td>
<td align="left" style="background:#aaccff" width="215">Notes</td>
<td width="50"></td>

This is the full HTML:

<td style="background:#aaccff" width="50"></td>
<td align="left" style="background:#aaccff" width="150">Device Type</td>
<td align="left" style="background:#aaccff" width="115">IP Address</td>
<td align="left" style="background:#aaccff" width="100">Device Name</td>
<td align="left" style="background:#aaccff" width="215">Notes</td>
<td width="50"></td>
<td align="left" width="150">AudioCodes Gateway</td>
<td align="left" width="115">172.31.31.2</td>
<td align="left" width="100"></td>
<td align="left" width="215">FXO</td>
<td style="background:#aaccff" width="50"></td>
<td align="left" style="background:#aaccff" width="150">Device Type</td>
<td align="left" style="background:#aaccff" width="115">IP Address</td>
<td align="left" style="background:#aaccff" width="100">Device Name</td>
<td align="left" style="background:#aaccff" width="215">Notes</td>
<td width="50"></td>
<td align="left" width="150">IC Server</td>
<td align="left" width="115">172.31.56.151</td>
<td align="left" width="100">IND056GIC151</td>
<td align="left" width="215">NAT'd IP = PENDING MPLS, Voice IP = 172.31.52.151</td>
<td width="50"></td>
<td align="left" width="150">IC Server</td>
<td align="left" width="115">172.31.56.152</td>
<td align="left" width="100">IND056GIC152</td>
<td align="left" width="215">NAT'd IP = PENDING MPLS, Voice IP = 172.31.52.152</td>
<td width="50"></td>
<td align="left" width="150">Media Server</td>
<td align="left" width="115">IND1106HMS07</td>
<td align="left" width="100">IND1106HMS07</td>
<td align="left" width="215"></td>
<td width="50"></td>
<td align="left" width="150">Media Server</td>
<td align="left" width="115">IND1106HMS07</td>
<td align="left" width="100">IND1106HMS07</td>
<td align="left" width="215"></td>

This is the code I have so far:

from ntlm import HTTPNtlmAuthHandler
from bs4 import BeautifulSoup
import requests, os, bleach, urllib2, cookielib

os.system('clear')
user = 'user'
password = "pass"
url = "url"

cookies = cookielib.CookieJar()
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, user, password)
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookies),HTTPNtlmAuthHandler.HTTPNtlmAuthHandler(passman))

pagedata=opener.open(url)
soup=BeautifulSoup(pagedata)

def myfunction(b):
    table = b.find('ul', {'class': 'dfwp-column dfwp-list'})

    for a in table.findAll('a'):
        [a.decompose() for a in table("a")]
    for tr in table.findAll('tr'):
        for td in tr.findAll('td'):

            print td

myfunction(soup)

This is the current output:

Device Type IP Address Device Name Notes

AudioCodes Gateway 172.31.31.2

FXO

Device Type IP Address Device Name Notes

IC Server 172.31.56.151 IND056GIC151 NAT'd IP = PENDING MPLS, Voice IP = 172.31.52.151

IC Server 172.31.56.152 IND056GIC152 NAT'd IP = PENDING MPLS, Voice IP = 172.31.52.152

Media Server IND1106HMS07 IND1106HMS07

Media Server IND1106HMS07 IND1106HMS07

Usually, when people ask how to "remove" something with bs4, they are really just asking how to not include it in their find operation.
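To make that distinction concrete, here is a minimal sketch of the two approaches on a hypothetical two-cell fragment (not the asker's full table). It assumes the unwanted cells can be recognised by their style attribute, which holds for the sample markup but may not hold elsewhere. Filtering leaves the parsed tree untouched, while decompose() really deletes the tags.

```python
from bs4 import BeautifulSoup

# Hypothetical two-cell fragment used only for illustration.
html = '<table><tr><td style="background:#aaccff">Notes</td><td>FXO</td></tr></table>'

# Option 1: skip the unwanted cells while searching; the tree is not modified.
soup = BeautifulSoup(html, "html.parser")
kept = [td for td in soup.find_all("td") if not td.has_attr("style")]
kept_text = [td.get_text() for td in kept]
cells_after_filter = len(soup.find_all("td"))  # both cells are still in the tree

# Option 2: actually remove the unwanted cells from the tree.
soup2 = BeautifulSoup(html, "html.parser")
for td in soup2.find_all("td", style=True):  # style=True matches any td that has a style attribute
    td.decompose()
cells_after_decompose = len(soup2.find_all("td"))  # only the unstyled cell remains
```

If the rest of the script never needs the header cells again, either option gives the same printed result; filtering is usually the less invasive choice.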

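You want to exclude the extra whitespace (i.e. tags where tag.text == '') and those four "column header" tags. You could handle the latter with a CSS selector, but the former needs explicit filtering. So it is simplest, and in my opinion more declarative, to do both at once: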

soup = BeautifulSoup(that_long_html_you_gave)

blacklist = {'Device Type','IP Address','Device Name','Notes'}

table = soup.body # to match your variable name.  I think.

table.find_all(lambda tag: tag.text and tag.text not in blacklist)
Out[45]: 
[<td align="left" width="150">AudioCodes Gateway</td>,
 <td align="left" width="115">172.31.31.2</td>,
 <td align="left" width="215">FXO</td>,
 <td align="left" width="150">IC Server</td>,
 <td align="left" width="115">172.31.56.151</td>,
 <td align="left" width="100">IND056GIC151</td>,
 <td align="left" width="215">NAT'd IP = PENDING MPLS, Voice IP = 172.31.52.151</td>,
 <td align="left" width="150">IC Server</td>,
 <td align="left" width="115">172.31.56.152</td>,
 <td align="left" width="100">IND056GIC152</td>,
 <td align="left" width="215">NAT'd IP = PENDING MPLS, Voice IP = 172.31.52.152</td>,
 <td align="left" width="150">Media Server</td>,
 <td align="left" width="115">IND1106HMS07</td>,
 <td align="left" width="100">IND1106HMS07</td>,
 <td align="left" width="150">Media Server</td>,
 <td align="left" width="115">IND1106HMS07</td>,
 <td align="left" width="100">IND1106HMS07</td>]
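For readers who want something they can paste and run, the following is a self-contained sketch of the same idea on a shortened copy of the markup (one header row and one data row). It assumes Python 3 and the built-in "html.parser" backend, and it restricts the match to td tags so that wrapper elements are never returned. The helper name wanted is made up for this example. The second query shows the CSS selector route mentioned above; it relies on the header cells being the only ones with a style attribute, and on a bs4 release recent enough to delegate select() to soupsieve.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question: header cells plus one data row.
html = """
<td style="background:#aaccff" width="50"></td>
<td align="left" style="background:#aaccff" width="150">Device Type</td>
<td align="left" style="background:#aaccff" width="115">IP Address</td>
<td align="left" style="background:#aaccff" width="100">Device Name</td>
<td align="left" style="background:#aaccff" width="215">Notes</td>
<td width="50"></td>
<td align="left" width="150">AudioCodes Gateway</td>
<td align="left" width="115">172.31.31.2</td>
<td align="left" width="100"></td>
<td align="left" width="215">FXO</td>
"""

soup = BeautifulSoup(html, "html.parser")
blacklist = {"Device Type", "IP Address", "Device Name", "Notes"}


def wanted(tag):
    """Hypothetical helper: keep non-empty td cells whose text is not a column header."""
    text = tag.get_text(strip=True)
    return tag.name == "td" and bool(text) and text not in blacklist


# Approach 1: one function filter handles both empty cells and header cells.
by_filter = [td.get_text(strip=True) for td in soup.find_all(wanted)]

# Approach 2: a CSS selector drops the styled header cells; empty cells still
# need an explicit check because CSS cannot test for "has no text" here.
by_css = [
    td.get_text(strip=True)
    for td in soup.select("td:not([style])")
    if td.get_text(strip=True)
]

print(by_filter)
print(by_css)
```

Both lists contain the same three data cells for this sample. The function filter is the more robust of the two if the header styling ever changes, since it depends only on the cell text.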
