Extracting links that have an href attribute with Python BeautifulSoup

Question

I have a simple task: extract the links from an HTML page (given its URL). This is what I do:

> #!/usr/bin/python
> 
> import urllib
> import webbrowser
> from bs4 import BeautifulSoup
> 
> URL = "http://54.75.225.110/quiz"
> URL_end = "/question"
> 
> LINK = URL + URL_end
> file = urllib.urlopen("http://54.75.225.110/quiz/question")
> soup = BeautifulSoup(file)
> 
> for item in soup.find_all(href=True):
>     print item
> 
> 
> print 'Hey there!'

This is the HTML:

> <html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
> <script src="./question_files/jquery.min.js"></script>
> <script type="text/javascript">
>        function n(s) {
>               var m = 0;
>               if (s.length == 0) return m;
>               for (i = 0; i < s.length; ++i) {
>                         o = s.charCodeAt(i);
>                         m = ((m<<5)-m)+o;
>                         m = m & m;
>               }
>         return m;
>        };
>        $(document).ready(function() {
>                document.cookie = "client_time=" + (+new Date());
>                $(".x").attr("href", "./answer/"+n($("p[id|='magic_number']").text()));
>        });
> </script>
> </head>
> <body>
> <p> <a class="x" style="pointer-events: none;cursor: default;" href="http://54.75.225.110/quiz/answer/56595">this page</a> (be quick). </p>

Why does my script print nothing but 'Hey there!'? If I change the code to:

for item in soup.find_all('a'): print item

I get:

> <a class="x" style="pointer-events: none;cursor: default;">this page</a>

Where has the href attribute gone?

Answer 1

I tested your HTML with BeautifulSoup 4:

from bs4 import BeautifulSoup

# html is a string holding the HTML source shown in the question
soup = BeautifulSoup(html)

for a in soup.find_all('a'):
    if 'href' in a.attrs:
        print a['href']

Output:

http://54.75.225.110/quiz/answer/56595
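
The same check can be expressed as a filter inside `find_all` itself. The following is a minimal sketch in Python 3 syntax, not the answerer's exact code: it assumes the page source is already available as a string, and the names `HTML` and `extract_hrefs`, as well as the second `<a>` tag without an href, are made up for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the <body> markup from the question. The second <a> tag is
# a hypothetical addition, included only to show that tags lacking an href
# are skipped.
HTML = """
<p> <a class="x" style="pointer-events: none;cursor: default;"
href="http://54.75.225.110/quiz/answer/56595">this page</a> (be quick). </p>
<p> <a class="x">no href here</a> </p>
"""


def extract_hrefs(html):
    """Return the href value of every <a> tag that carries one."""
    # "html.parser" is the parser bundled with Python, so no extra
    # dependency such as lxml is assumed.
    soup = BeautifulSoup(html, "html.parser")
    # href=True matches only tags on which the attribute is present.
    return [a["href"] for a in soup.find_all("a", href=True)]


if __name__ == "__main__":
    for link in extract_hrefs(HTML):
        print(link)
```

Passing `href=True` to `find_all` replaces the explicit `'href' in a.attrs` test, so the loop body only ever sees tags that actually have the attribute.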

Answer 2

You have a typo:

for item in soup.find_all(herf=True):

It should be href:

for item in soup.find_all(href=True):
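
A typo like this is easy to miss because BeautifulSoup treats an unrecognised keyword argument to `find_all` as an attribute filter rather than raising an error. The sketch below, in Python 3 syntax, illustrates that behaviour on a one-tag document; the variable names `misspelled` and `correct` are chosen for this example only.

```python
from bs4 import BeautifulSoup

# A single link, reusing the URL from the question.
soup = BeautifulSoup(
    '<a href="http://54.75.225.110/quiz/answer/56595">this page</a>',
    "html.parser",
)

# "herf" is interpreted as "tags that have a herf attribute": no such tag
# exists, so the result is empty and no exception is raised.
misspelled = soup.find_all(herf=True)

# The correctly spelled filter matches the <a> tag.
correct = soup.find_all(href=True)

print(len(misspelled))  # 0
print(len(correct))     # 1
```

An empty result from `find_all` is therefore a reason to double-check the attribute name first.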

2 answers

Answer 1: 1, accepted, 2014-06-15 14:59:54
Answer 2: 0
