Beginner issues grabbing HREF from Python BeautifulSoup involving list comprehension
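I've narrowed down the HTML with my code, but I'm having trouble using a list comprehension to grab the href addresses.
Here is my code (BASE_URL and STEM_URL are fixed addresses):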
soup = BeautifulSoup(requests.get(BASE_URL).text)
divyclass = soup.find("div", {"class":"node-inner"}).tbody
Where I run into problems and get this error (TypeError: 'NoneType' object has no attribute '__getitem__') is when I add this line for the list comprehension:
links = [STEM_URL + row.a["href"] for row in divyclass.findAll("td")]
When I run
for row in divyclass.findAll("td"):
    print row
I get the output below, which shows where I'm trying to pull the a hrefs from:
<td align="center" class="tableheader" colspan="4" valign="middle">NBA Drafts</td>
<td align="center" class="text" valign="middle"> </td>
<td align="center" class="text" valign="middle"> </td>
<td align="center" class="text" valign="middle"> </td>
<td align="center" class="text" valign="middle"><a href="/nba_final_draft/2014">2014</a></td>
<td align="center" class="text" valign="middle"> <a href="/nba_final_draft/2013">2013</a></td>
<td align="center" class="text" valign="middle"> <a href="/nba_final_draft/2012">2012</a></td>
<td align="center" class="text" valign="middle"><a href="/nba_final_draft/2011">2011</a></td>
<td align="center" class="text" valign="middle"><a href="/nba_final_draft/2010">2010</a></td>
<td align="center" class="text" valign="middle" width="25%"><a href="/nba_final_draft/2009">2009</a></td>
<td align="center" class="text" valign="middle" width="25%"><a href="/nba_draft_history/2008.html">2008</a></td>
<td align="center" class="text" valign="middle" width="25%"><a href="/nba_draft_history/2007.html">2007</a></td>
<td align="center" class="text" valign="middle" width="25%"><a href="/nba_draft_history/2006.html">2006</a></td>
<td align="center" class="text" valign="middle" width="25%"><a href="/nba_draft_history/2005.html">2005</a></td>
<td align="center" class="text" valign="middle" width="25%"><a href="/nba_draft_history/2004.html">2004</a></td>
<td align="center" class="text" valign="middle" width="25%"><a href="/nba_draft_history/2003.html">2003</a></td>
<td align="center" class="text" valign="middle" width="25%"><a href="/nba_draft_history/2002.html">2002</a></td>
<td align="center" class="text" valign="middle" width="25%"><a href="/nba_draft_history/2001.html">2001</a></td>
Argh! I'm just trying to pull the hrefs! Thanks in advance, everyone!
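For rows that have no a element in them, row.a will be None, and indexing None with ["href"] is what raises the TypeError. If you change
links = [STEM_URL + row.a["href"] for row in divyclass.findAll("td")]
to
links = [STEM_URL + row.a["href"] for row in divyclass.findAll("td") if row.a]
the comprehension will filter out the rows that have no a element.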
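To see the filter in action without hitting the real site, here is a minimal sketch on a trimmed copy of the table cells. The STEM_URL value, the HTML string, and the names cells and missing are placeholders I made up for illustration, and it assumes the bs4 package with the built-in "html.parser" backend, so treat it as a rough check rather than the asker's exact setup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins: the post does not show STEM_URL or the full page.
STEM_URL = "http://example.com"
HTML = """
<table><tbody><tr>
<td class="tableheader">NBA Drafts</td>
<td class="text"> </td>
<td class="text"><a href="/nba_final_draft/2014">2014</a></td>
<td class="text"><a href="/nba_draft_history/2008.html">2008</a></td>
</tr></tbody></table>
"""

soup = BeautifulSoup(HTML, "html.parser")
cells = soup.find_all("td")  # find_all is the bs4 spelling of findAll

# A cell with no <a> child gives None for .a, so cell.a["href"] would fail.
missing = [cell.a is None for cell in cells]
print(missing)

# Keep only the cells that actually contain a link before indexing "href".
links = [STEM_URL + cell.a["href"] for cell in cells if cell.a]
print(links)
```

The header cell and the blank cell are the ones that trip up the unfiltered comprehension; with the if clause they are simply skipped.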