Why doesn't my Python script return the page source correctly?

Question

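I just wrote a script that is meant to loop through the alphabet and find all unclaimed four-letter Twitter names (really just for practice, since I'm new to Python). I've written a few scripts before that use 'urllib2' to fetch a site's HTML from a URL, but this time it doesn't seem to work. Here is my script: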

import urllib2

src=''
url=''
print "finding four-letter @usernames on twitter..."
d_one=''
d_two=''
d_three=''
d_four=''
n_one=0
n_two=0
n_three=0
n_four=0
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

while (n_one > 26):
    while(n_two > 26):
        while (n_three > 26):
            while (n_four > 26):
                d_one=letters[n_one]
                d_two=letters[n_two]
                d_three=letters[n_three]
                d_four=letters[n_four]
                url = "twitter.com/" + d_one + d_two + d_three + d_four

                src=urllib2.urlopen(url)
                src=src.read()
                if (src.find('Sorry, that page doesn’t exist!') >= 0):
                    print "nope"
                    n_four+=1
                else:
                    print url
                    n_four+=1
            n_three+=1
            n_four=0
        n_two+=1
        n_three=0
        n_four=0
    n_one+=1    
    n_two=0
    n_three=0
    n_four=0

Running this code returns the following error:

SyntaxError: Non-ASCII character '\xe2' in file name.py on line 29, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
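The `'\xe2'` byte most likely comes from the curly apostrophe (’) in the string `'Sorry, that page doesn’t exist!'`. In UTF-8 that single character is stored as three bytes, the first of which is 0xE2, and Python 2 rejects non-ASCII bytes in a source file that has no encoding declaration. A quick check, written in Python 3 syntax (where source files default to UTF-8):

```python
# The curly apostrophe U+2019 used in "doesn’t"
apostrophe = "\u2019"

# Encoded as UTF-8 it becomes three bytes; the first one is 0xE2,
# the byte Python 2 complained about.
encoded = apostrophe.encode("utf-8")
print(encoded)          # b'\xe2\x80\x99'
print(hex(encoded[0]))  # 0xe2
```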

After visiting that link and doing some more searching, I added the following line at the top of the file:

# coding: utf-8

Now, although it no longer returns an error, nothing seems to happen. I added the line

print src

which should have printed the HTML of each URL, but nothing happens when I run it. Any suggestions would be greatly appreciated.

Answer 1

You can use itertools.product to get rid of the excessive nesting:

from itertools import product
for d_one, d_two, d_three, d_four in product(letters, repeat=4):
    ...

Instead of defining the list of letters yourself, you can use string.ascii_lowercase.
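A minimal sketch combining both suggestions, in Python 3 syntax. `string.ascii_lowercase` is the standard-library constant `'abcdefghijklmnopqrstuvwxyz'`, and `product(..., repeat=4)` yields every four-letter tuple in order, from `aaaa` to `zzzz`:

```python
from itertools import product
import string

# Every combination of four lowercase letters, in alphabetical order
names = ["".join(combo) for combo in product(string.ascii_lowercase, repeat=4)]

print(len(names))           # 456976 (26 ** 4)
print(names[0], names[-1])  # aaaa zzzz
```

That is 456,976 names, so if each one is fetched over the network it is worth trying the loop on a small slice such as `names[:10]` first.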

You should tell urlopen which protocol (http) you are using:

url = "http://twitter.com/" + d_one + d_two + d_three + d_four
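Without a scheme, urllib cannot tell which protocol to use and raises `ValueError` ("unknown url type") before any network access happens. The check below uses Python 3's `urllib.request.Request`, which parses a string URL the same way `urlopen` does, so it runs offline; the helper name `url_scheme` is only for illustration:

```python
from urllib.request import Request

def url_scheme(url):
    """Illustrative helper: return the scheme urllib would use, or None."""
    try:
        return Request(url).type
    except ValueError:
        # urllib raises ValueError ("unknown url type") when the scheme is missing
        return None

print(url_scheme("twitter.com/abcd"))         # None
print(url_scheme("http://twitter.com/abcd"))  # http
```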

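Also, when you request a page that doesn't exist, urlopen raises an HTTPError with code 404, so you should check for that instead of searching the page text.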
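A minimal Python 3 sketch of that check (Python 2's `urllib2.HTTPError` lives in `urllib.error` in Python 3). The function `is_unclaimed`, its injectable `opener` parameter and the `fake_opener` stub are hypothetical, added so the logic can be tried without touching the network. Twitter's site has also changed a lot since 2012, so a missing profile may no longer be reported with a plain 404.

```python
import io
from urllib.error import HTTPError
from urllib.request import urlopen

def is_unclaimed(name, opener=urlopen):
    """Hypothetical helper: True if fetching the profile page gives a 404."""
    url = "http://twitter.com/" + name
    try:
        opener(url).close()
    except HTTPError as err:
        if err.code == 404:
            return True  # page does not exist, so the name looks free
        raise            # other HTTP errors (e.g. rate limiting) are not answers
    return False         # page loaded, so the name is taken

def fake_opener(url):
    """Stub standing in for urlopen: pretend only 'abcd' is unclaimed."""
    if url.endswith("/abcd"):
        raise HTTPError(url, 404, "Not Found", None, None)
    return io.BytesIO(b"<html>profile</html>")

print(is_unclaimed("abcd", opener=fake_opener))  # True
print(is_unclaimed("abce", opener=fake_opener))  # False
```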

Answer 2

Well, you initialize n_one=0 and then loop with while (n_one > 26). The first time Python reaches that line it evaluates while (0 > 26), which is obviously false, so it skips the entire loop.
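A stripped-down reproduction of the problem next to the corrected condition. `while n < 26` visits indexes 0 through 25, one per letter, which also keeps `letters[n]` in range:

```python
n = 0
visited_wrong = 0
while n > 26:          # original condition: 0 > 26 is False, so the body never runs
    visited_wrong += 1
    n += 1

n = 0
visited = []
while n < 26:          # corrected condition: runs for n = 0 .. 25
    visited.append(n)
    n += 1

print(visited_wrong)            # 0
print(visited[0], visited[-1])  # 0 25
```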

As the other answer shows, there are cleaner ways to write loops like these.


2 answers

Answer 1
5 votes, 2012-08-13 04:13:10

Answer 2
1 vote, accepted, 2012-08-13 04:16:05
