Beginner issues grabbing HREF from Python BeautifulSoup involving list comprehension

I have narrowed down the HTML with my code, but I am having trouble using a list comprehension to grab the href addresses.

Here is my code (BASE_URL and STEM_URL are fixed addresses):

soup = BeautifulSoup(requests.get(BASE_URL).text)
divyclass = soup.find("div", {"class":"node-inner"}).tbody

Where I run into trouble and get this error (TypeError: 'NoneType' object has no attribute '__getitem__') is when I add this line for the list comprehension:

links = [STEM_URL + row.a["href"] for row in divyclass.findAll("td")]

When I run

for row in divyclass.findAll("td"):
    print row

I get the following output, which shows where I am pulling the a hrefs from:

<td align="center" class="tableheader" colspan="4" valign="middle">NBA Drafts</td>
<td align="center" class="text" valign="middle"> </td>
<td align="center" class="text" valign="middle"> </td>
<td align="center" class="text" valign="middle"> </td>
<td align="center" class="text" valign="middle"><a href="/nba_final_draft/2014">2014</a></td>
<td align="center" class="text" valign="middle"> <a href="/nba_final_draft/2013">2013</a></td>
<td align="center" class="text" valign="middle"> <a href="/nba_final_draft/2012">2012</a></td>
<td align="center" class="text" valign="middle"><a href="/nba_final_draft/2011">2011</a></td>
<td align="center" class="text" valign="middle"><a href="/nba_final_draft/2010">2010</a></td>
<td align="center" class="text" valign="middle" width="25%"><a href="/nba_final_draft/2009">2009</a></td>
<td align="center" class="text" valign="middle" width="25%"><a href="/nba_draft_history/2008.html">2008</a></td>
<td align="center" class="text" valign="middle" width="25%"><a href="/nba_draft_history/2007.html">2007</a></td>
<td align="center" class="text" valign="middle" width="25%"><a href="/nba_draft_history/2006.html">2006</a></td>
<td align="center" class="text" valign="middle" width="25%"><a href="/nba_draft_history/2005.html">2005</a></td>
<td align="center" class="text" valign="middle" width="25%"><a href="/nba_draft_history/2004.html">2004</a></td>
<td align="center" class="text" valign="middle" width="25%"><a href="/nba_draft_history/2003.html">2003</a></td>
<td align="center" class="text" valign="middle" width="25%"><a href="/nba_draft_history/2002.html">2002</a></td>
<td align="center" class="text" valign="middle" width="25%"><a href="/nba_draft_history/2001.html">2001</a></td>

Ah! I am just trying to pull the hrefs! Thanks in advance, everyone!

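For the rows without a elements in them, row.a will be None, and indexing None with ["href"] raises the TypeError. If you change

links = [STEM_URL + row.a["href"] for row in divyclass.findAll("td")]

to

links = [STEM_URL + row.a["href"] for row in divyclass.findAll("td") if row.a]

it will filter out the row elements that have no a element.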
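
As a rough, self-contained sketch of the filter described above (written for Python 3 and bs4, whereas the question appears to use Python 2): the HTML string below is a shortened stand-in for the question's table, and the STEM_URL value is a made-up placeholder, not the real site. The second comprehension is an alternative that searches for the a tags directly, which is not part of the original answer.

```python
from bs4 import BeautifulSoup

# Hypothetical placeholder; the question does not show the real STEM_URL.
STEM_URL = "https://example.com"

# Shortened stand-in for the table in the question: some cells have no <a>.
html = """
<table><tbody>
<tr><td class="tableheader">NBA Drafts</td></tr>
<tr><td class="text"> </td>
    <td class="text"><a href="/nba_final_draft/2014">2014</a></td></tr>
<tr><td class="text"><a href="/nba_draft_history/2008.html">2008</a></td></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")

# td.a is None for cells without an <a>, so the trailing "if td.a"
# skips those cells before td.a["href"] is ever evaluated.
links = [STEM_URL + td.a["href"] for td in soup.find_all("td") if td.a]

# Alternative: ask only for <a> tags that carry an href attribute.
links_direct = [STEM_URL + a["href"] for a in soup.find_all("a", href=True)]

print(links)
```

Both lists come out the same for this sample. The direct search is a little more forgiving if a cell ever contains an a tag without an href, since the filter on td.a alone would still raise a KeyError in that case.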
