Finding an href by its anchor text inside a table/list

Question

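I am trying to use Python's bs4 to extract an href with a specific anchor text from a website that I have already logged in to successfully (using requests).

This is pseudo-HTML of the target page: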

<table class="submissions">
   <thead>some thead</thead>
   <tbody><tr class="active">
           <th scope="row">uninterestingtext</th> 
           <td>uninterestingtext</td><td></td>
          </tr>
          <tr class="active">
           <th scope="row">uninteresting</th>   
           <td>uninteresting text</td><td></td></tr>
          <tr class="lastrow active"><th scope="row">uninteresting</th>
           <td>uninteresting text</td>
           <td></td>
          </tr>
          <tr class="lastrow inactive">
           <th scope="row">uninteresting text</th>
           <td>uninterestingtext
              <ul>
                <li><a href="uninteresting_href">someLink</a> </li>
                <li><a href="uninteresting_href">someLink</a> </li>
                <li><a href=**InterestingLink**>**Upload...**</a></li>
              </ul>
           </td>
          </tr></tbody></table>

Now I am trying to extract InterestingLink by looking for the text Upload... between the 'a' tags.

This is what I tried:

landing_page_soup = BeautifulSoup(*responseFromSuccessfulLogin*.text, 'html.parser') 
important_page = landing_page_soup.find('a',{'href':True,'text':'Upload...'}).get('href')

But this always raises the error

AttributeError: 'NoneType' object has no attribute 'get'

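because the find call always returns None.

Note: I have made sure that "responseFromSuccessfulLogin.text" is valid HTML that contains the desired link.

After reading other forum threads on similar problems, I changed the line to use the "select" method with a CSS selector, and also tried the "findAll" method, but neither worked.

I suspect I am getting it wrong because the link sits inside a table.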
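A likely reason for the None result, sketched under assumptions: keys in the dictionary passed to find are matched against HTML attributes, so 'text' is looked up as an attribute named text rather than as the link's contents. The table itself should not matter. The snippet below uses a trimmed, hypothetical stand-in for the page (SAMPLE_HTML is not the real response) and compares the attribute lookup with the string keyword argument.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the logged-in page; only the list matters here.
SAMPLE_HTML = """
<ul>
  <li><a href="uninteresting_href">someLink</a></li>
  <li><a href="InterestingLink">Upload...</a></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Keys in the attrs dict are matched against HTML attributes, so this looks
# for <a text="Upload..."> and finds nothing.
as_attribute = soup.find("a", {"href": True, "text": "Upload..."})

# Passing the text as the `string` keyword (named `text` in older bs4
# releases) matches the tag's contents instead.
as_string = soup.find("a", href=True, string="Upload...")
interesting_href = as_string["href"] if as_string is not None else None
```

An exact string match like this only succeeds when the anchor contains nothing but that text, with no surrounding whitespace or nested tags.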

Answer 1

BeautifulSoup accepts a callable as a filter.

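from bs4 import BeautifulSoup

html = BeautifulSoup(response.content, 'html.parser')
# The callable receives each tag's string, which may be None, so guard before testing.
important_page = html.find_all('a', href=True, string=lambda s: s is not None and 'Upload...' in s)

# find_all returns a list; the first match holds the wanted link.
print(important_page[0]['href'])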
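For anyone who wants to try the callable filter without a live login, here is a minimal self-contained sketch. SAMPLE_HTML is a hypothetical reduction of the pseudo-HTML from the question (with one extra link and some whitespace added for illustration), and mentions_upload is an illustrative helper name, not part of bs4.

```python
from bs4 import BeautifulSoup

# Hypothetical reduction of the page from the question.
SAMPLE_HTML = """
<table class="submissions"><tbody>
<tr class="lastrow inactive"><td>uninterestingtext
  <ul>
    <li><a href="uninteresting_href">someLink</a> </li>
    <li><a href="other_href"><b>nested</b> markup</a></li>
    <li><a href="InterestingLink"> Upload... </a></li>
  </ul>
</td></tr>
</tbody></table>
"""


def mentions_upload(s):
    # The tag's string can be None (for example when the tag has mixed
    # children), so check for that before the substring test.
    return s is not None and "Upload..." in s


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
matches = soup.find_all("a", href=True, string=mentions_upload)
hrefs = [a["href"] for a in matches]
```

A substring test tolerates whitespace around the anchor text, which an exact string comparison would not.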

Answer 2

(Posting the solution on behalf of the OP.)

This:

important_page = landing_page_soup.find('a', title='Upload...')['href']

works perfectly for me. I get only the link I want.
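This lookup matches on a title attribute, which the pseudo-HTML in the question does not show, so the real page presumably carries one on the anchor. A minimal sketch under that assumption follows; SAMPLE_HTML and href_or_none are hypothetical names used only for illustration.

```python
from bs4 import BeautifulSoup

# Assumption: the real anchor has a title attribute, which the pseudo-HTML omits.
SAMPLE_HTML = """
<ul>
  <li><a href="uninteresting_href" title="Details">someLink</a></li>
  <li><a href="InterestingLink" title="Upload...">Upload...</a></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Keyword arguments other than the reserved ones filter on attributes.
by_keyword = soup.find("a", title="Upload...")

# The same lookup written as a CSS attribute selector.
by_selector = soup.select_one('a[title="Upload..."]')


def href_or_none(tag):
    # find and select_one return None when nothing matches, so check first
    # instead of indexing straight into the result.
    return tag.get("href") if tag is not None else None
```

Checking for None before reading href avoids the AttributeError or TypeError that appears when the page layout changes and nothing matches.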

Answer 1
0 Accepted 2016-10-31 07:03:35

Answer 2
0