
Find a CSS selector for Beautiful Soup

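Hi, I want to scrape the parcel number from the HTML given below. I am trying to use Beautiful Soup, but I get nothing back. I have tried several selectors and none of them worked, so I may be missing some detail. If anyone knows how to select a specific …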

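import bs4
import requests
from lxml import html

s = requests.Session()

# Send the cookie that marks the session as a public login.
r = s.get('http://69.160.37.111/assessor/taxweb/account.jsp?accountNum=R032229', cookies={'isLoggedInAsPublic': 'true'})
tree = html.fromstring(r.content)  # lxml tree, not used below
res = bs4.BeautifulSoup(r.content, 'lxml')
# Attempted selector: every tbody of a table nested inside table.accountSummary.
parsel = res.select('table.accountSummary table tbody')
print(parsel)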

Here is the HTML:

<table class="accountSummary">
   <tbody>
      <tr valign="top">
         <th>
            <a href="account.jsp?accountNum=R032229&amp;doc=R032229.1519706542852">Location</a>
         </th>
         <th>
            <a href="account.jsp?accountNum=R032229&amp;doc=C00044008.1451631600000">Owner Information</a>
         </th>
         <th colspan="1">
            <a href="account.jsp?accountNum=R032229&amp;doc=AccountValue">Assessment History</a>
         </th>
      </tr>
      <tr valign="top">
         <td valign="top" width="40%">
            <!-- BEGIN What happens in the location text stays in the location text -->
            <table width="100%">
               <tbody>
                  <tr style="">
                     <td><strong>Parcel Number</strong> 71200000</td>
                  </tr>
                  <tr style="">
                     <td><strong>Tax Area</strong> 19A - TAX DISTRICT 19A</td>
                  </tr>
                  <tr style="">
                     <td><strong>Situs Address</strong> </td>
                  </tr>
                  <tr style="">
                     <td><strong>Legal Summary</strong> W.H.M.  SECTION A  BLK 1  LOT 1   CONT. 7.14 AC</td>
                  </tr>
               </tbody>
            </table>
            <!-- BEGIN What happens in the location text stays in the location text -->
         </td>
         <td valign="top" width="40%">
            <table>
               <tbody>
                  <tr>
                     <td><b>Owner Name</b> COLOTERRA DEVELOPMET LLC</td>
                  </tr>
                  <tr>
                     <td><b>Owner Address</b> 1711 TUNA CANYON RD <br>TOPANGA, CA 90290-3438</td>
                  </tr>
               </tbody>
            </table>
         </td>
         <td colspan="1" valign="top" width="40%">
            <table width="100%">
               <tbody>
                  <tr>
                     <td align="left"><b>Actual</b> (2017)</td>
                     <td align="right">$2,000</td>
                  </tr>
                  <tr>
                     <td align="left"><b>Primary Taxable</b></td>
                     <td align="right">$580</td>
                  </tr>
               </tbody>
            </table>
            <table width="100%">
               <caption><b>Tax Area:</b> 19A&nbsp;&nbsp;&nbsp; <b>Mill Levy</b>: 52.474000</caption>
               <tbody>
                  <tr>
                     <th align="left">Type</th>
                     <th align="right">Actual</th>
                     <th align="right">Assessed</th>
                     <th align="right">Units</th>
                  </tr>
                  <tr>
                     <td>Land</td>
                     <td align="right">$2,000</td>
                     <td align="right">$580</td>
                     <td align="right">1.000</td>
                  </tr>
               </tbody>
            </table>
            <br>
         </td>
      </tr>
      <tr valign="top">
         <th colspan="3">
            <a href="account.jsp?accountNum=R032229&amp;doc=TRN0098266">Transfers</a>
         </th>
      </tr>
      <tr valign="top">
         <td colspan="3" valign="top">
            <table width="100%">
               <tbody>
                  <tr>
                     <td align="center"><b>Reception Number</b></td>
                     <td align="center"><b>Book Page</b></td>
                     <td align="center"><b>Sale Date</b></td>
                     <td align="right"><b>Sale Price</b></td>
                     <td align="center"><b>Doc Description</b></td>
                  </tr>
                  <tr>
                     <td align="center"><a href="account.jsp?accountNum=R032229&amp;doc=TRN0098266">256118</a></td>
                     <td align="center"><a href="account.jsp?accountNum=R032229&amp;doc=TRN0098266">B: 398 P: 148</a></td>
                     <td align="center"><a href="account.jsp?accountNum=R032229&amp;doc=TRN0098266">05/02/2007</a></td>
                     <td align="right"><a href="account.jsp?accountNum=R032229&amp;doc=TRN0098266">$0</a></td>
                     <td align="center"><a href="account.jsp?accountNum=R032229&amp;doc=TRN0098266">QUIT CLAIM DEED</a></td>
                  </tr>
                  <tr>
                     <td align="center"><a href="account.jsp?accountNum=R032229&amp;doc=TRN0098265">247573</a></td>
                     <td align="center"><a href="account.jsp?accountNum=R032229&amp;doc=TRN0098265">B: 387 P: 376</a></td>
                     <td align="center"><a href="account.jsp?accountNum=R032229&amp;doc=TRN0098265">01/16/2006</a></td>
                     <td align="right"><a href="account.jsp?accountNum=R032229&amp;doc=TRN0098265">$8,000</a></td>
                     <td align="center"><a href="account.jsp?accountNum=R032229&amp;doc=TRN0098265">WARRANTY DEED</a></td>
                  </tr>
                  <tr>
                     <td></td>
                     <td align="center"><a href="account.jsp?accountNum=R032229&amp;doc=TRN0098264">B: 307 P: 117</a></td>
                     <td align="center"><a href="account.jsp?accountNum=R032229&amp;doc=TRN0098264">07/15/1994</a></td>
                     <td align="right"><a href="account.jsp?accountNum=R032229&amp;doc=TRN0098264">$0</a></td>
                     <td align="center"><a href="account.jsp?accountNum=R032229&amp;doc=TRN0098264">WARRANTY DEED</a></td>
                  </tr>
                  <tr>
                     <td></td>
                     <td align="center"><a href="account.jsp?accountNum=R032229&amp;doc=TRN0098263">B: 294 P: 308</a></td>
                     <td align="center"><a href="account.jsp?accountNum=R032229&amp;doc=TRN0098263">12/10/1993</a></td>
                     <td align="right"><a href="account.jsp?accountNum=R032229&amp;doc=TRN0098263">$125</a></td>
                     <td align="center"><a href="account.jsp?accountNum=R032229&amp;doc=TRN0098263">WARRANTY DEED</a></td>
                  </tr>
                  <tr>
                     <td></td>
                     <td align="center"><a href="account.jsp?accountNum=R032229&amp;doc=TRN0098262">B: 254 P: 657</a></td>
                     <td align="center"><a href="account.jsp?accountNum=R032229&amp;doc=TRN0098262">01/01/1800</a></td>
                     <td align="right"><a href="account.jsp?accountNum=R032229&amp;doc=TRN0098262">$0</a></td>
                     <td align="center"><a href="account.jsp?accountNum=R032229&amp;doc=TRN0098262">DEED</a></td>
                  </tr>
               </tbody>
            </table>
         </td>
      </tr>
      <tr valign="top">
         <th colspan="3">
            Images
         </th>
      </tr>
      <tr valign="top">
         <td colspan="3" valign="top">
            <div id="tab_control_12980">
               <ul id="tabs" class="tabs">
                  <li class="active"><a href="#tab_12980_0">GIS</a></li>
               </ul>
               <div id="tabcontentcontainer">
                  <div id="tab_12980_0" class="tab_page">
                     <div class="thumb">
                        <a href="account.jsp?accountNum=R032229&amp;doc=GIS&amp;page=1&amp;viewer=true"><img src="gisPicture.jsp?accountNum=R032229.1519706542852&amp;width=320&amp;height=320"></a>
                     </div>
                  </div>
               </div>
            </div>
            <script type="text/javascript">TabControl('tab_control_12980', { current: 'tab_12980_0' });</script>
         </td>
      </tr>
   </tbody>
</table>

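I have already written Selenium code, but it is very slow when processing this much data. I would be glad if someone could guide me on this.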

res.select('table.accountSummary table tbody td')[0].text
res.select('table.accountSummary table tbody td')[4].text
res.select('table.accountSummary table tbody td')[5].text

Output

'Parcel Number 71200000'
'Owner Name COLOTERRA DEVELOPMET LLC'
'Owner Address 1711 TUNA CANYON RD TOPANGA, CA 90290-3438'
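Positional indexes such as [0] and [4] break as soon as the page gains or loses a row, so it may be safer to look a value up by its label. The sketch below is one way to do that and is not taken from the original answer: value_for_label and SAMPLE_HTML are hypothetical names, and the sample is a trimmed copy of the markup in the question, assumed to match the live page. The selector leaves out tbody on purpose, because html.parser and lxml do not add a tbody that is missing from the raw response, while browser developer tools always display one.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (assumption: the live page matches it).
SAMPLE_HTML = """
<table class="accountSummary">
  <tr>
    <td>
      <table>
        <tr><td><strong>Parcel Number</strong> 71200000</td></tr>
        <tr><td><strong>Tax Area</strong> 19A - TAX DISTRICT 19A</td></tr>
      </table>
    </td>
    <td>
      <table>
        <tr><td><b>Owner Name</b> COLOTERRA DEVELOPMET LLC</td></tr>
      </table>
    </td>
  </tr>
</table>
"""


def value_for_label(soup, label):
    """Return the text that follows a bold label inside a nested cell, or None."""
    selector = (
        "table.accountSummary table td > strong, "
        "table.accountSummary table td > b"
    )
    for tag in soup.select(selector):
        if tag.get_text(strip=True) == label:
            # The value is whatever remains of the cell text after the label.
            cell_text = tag.parent.get_text(" ", strip=True)
            return cell_text[len(label):].strip()
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
parcel_number = value_for_label(soup, "Parcel Number")
owner_name = value_for_label(soup, "Owner Name")
print(parcel_number)
print(owner_name)
```

With the real page, the same function could be called on the res object built from r.content in the question.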

If you only want the data without the label, you can use the find method:

res.select('table.accountSummary table tbody td')[0].find(text=True, recursive=False)
res.select('table.accountSummary table tbody td')[4].find(text=True, recursive=False)
res.select('table.accountSummary table tbody td')[5].find(text=True, recursive=False)

Output

' 71200000'
' COLOTERRA DEVELOPMET LLC'
' 1711 TUNA CANYON RD '
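As the output shows, find with recursive=False returns only the first direct text node, so the values keep their leading space and the owner address loses the part after the br tag. A possible way around both, sketched here under the same assumptions as the question's markup, is to join all of a cell's own text nodes and strip them. The names direct_text, SAMPLE_HTML and details are hypothetical and do not come from the original answer.

```python
from bs4 import BeautifulSoup, NavigableString

# Trimmed copy of the markup from the question (assumption: the live page matches it).
SAMPLE_HTML = """
<table class="accountSummary">
  <tr>
    <td>
      <table>
        <tr><td><strong>Parcel Number</strong> 71200000</td></tr>
      </table>
    </td>
    <td>
      <table>
        <tr><td><b>Owner Name</b> COLOTERRA DEVELOPMET LLC</td></tr>
        <tr><td><b>Owner Address</b> 1711 TUNA CANYON RD <br>TOPANGA, CA 90290-3438</td></tr>
      </table>
    </td>
  </tr>
</table>
"""


def direct_text(cell):
    """Join the cell's own text nodes, skipping child tags such as <b> and <br>."""
    parts = [
        str(node).strip()
        for node in cell.children
        if isinstance(node, NavigableString)
    ]
    return " ".join(part for part in parts if part)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Map each bold label to the cleaned value found in the same cell.
details = {}
for cell in soup.select("table.accountSummary table td"):
    label = cell.find(["strong", "b"], recursive=False)
    if label is not None:
        details[label.get_text(strip=True)] = direct_text(cell)

for key, value in details.items():
    print(f"{key}: {value}")
```

Building a dictionary keyed by label also means one pass over the cells yields every field, which helps when many account pages have to be processed.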

