
Parsing data from a live website in Python: enumerate problem!

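The following script is supposed to fetch a specific line number from a live website and parse it. It works for about 30 loops, but then enumerate(f) seems to stop working correctly: the "i" in the for loop seems to stop at line 130 instead of somewhere past 200. Could this be due to the website I'm fetching data from, or something else? Thanks!!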

import sgmllib

class MyParser(sgmllib.SGMLParser):
    "A simple parser class."

    def parse(self, s):
        "Parse the given string 's'."
        self.feed(s)
        self.close()

    def __init__(self, verbose=0):
        "Initialise an object, passing 'verbose' to the superclass."

        sgmllib.SGMLParser.__init__(self, verbose)
        self.divs = []
        self.descriptions = []
        self.inside_div_element = 0

    def start_div(self, attributes):
        "Process a hyperlink and its 'attributes'."

        for name, value in attributes:
            if name == "id":
                self.divs.append(value)
                self.inside_div_element = 1

    def end_div(self):
        "Record the end of a hyperlink."

        self.inside_div_element = 0

    def handle_data(self, data):
        "Handle the textual 'data'."

        if self.inside_div_element:
            self.descriptions.append(data)

    def get_div(self):
        "Return the list of hyperlinks."

        return self.divs

    def get_descriptions(self, check):
        "Return a list of descriptions."
        if check == 1:
            self.descriptions.pop(0)
        return self.descriptions

    def rm_descriptions(self):
        "Remove all descriptions."

        self.descriptions.pop()

import urllib
import linecache
import sgmllib


tempLine = ""
tempStr = " "
tempStr2 = ""
myparser = MyParser()
count = 0
user = ['']
oldUser = ['none']  
oldoldUser = [' ']
array = [" ", 0]
index = 0
found = 0    
k = 0
j = 0
posIndex = 0
a = 0
firstCheck = 0
fCheck = 0
while a < 1000:

print a
f = urllib.urlopen("SITE")
a = a+1

for i, line in enumerate(f):


    if i == 187:
        print i
        tempLine = line
        print line

        myparser.parse(line)
        if fCheck == 1:
            result  = oldUser[0] is oldUser[1]

            u1 = oldUser[0]
            u2 = oldUser[1]
            tempStr = oldUser[1]
            if u1 == u2:
                result = 1
        else:
            result = user is oldUser
        fCheck = 1

        user = myparser.get_descriptions(firstCheck)
        tempStr = user[0]
        firstCheck = 1



        if result:

            array[index+1] = array[index+1] +0

        else:
            j = 0

            for z in array:
                k = j+2

                tempStr2 = user[0]
                if k < len(array) and tempStr2 == array[k]: 

                    array[j+3] = array[j+3] + 1
                    index = j+2
                    found = 1
                    break
                j = j+1
            if found == 0:

                array.append(tempStr)
                array.append(0)


        oldUser = user
        found = 0
        print array


    elif i > 200:
        print "HERE"
        break



print array
f.close()

Maybe that web page has fewer lines than you think? What does this give you?

print max(i for i, _ in enumerate(urllib.urlopen("SITE")))
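To make that check repeatable, here is a small sketch in Python 3 (the post itself uses Python 2). `count_lines` is a hypothetical helper, and `io.BytesIO` objects stand in for the `urllib` response so the example runs without a network connection.

```python
import io


def count_lines(stream):
    """Return how many lines iterating over a file-like object yields."""
    return sum(1 for _ in stream)


# Stand-ins for two fetches of the same page: one full, one shorter.
full_page = io.BytesIO(b"line\n" * 250)
short_page = io.BytesIO(b"line\n" * 131)

print(count_lines(full_page))   # 250
print(count_lines(short_page))  # 131
```

Note that the one-liner above prints the last index, which is one less than this count. If the count varies between fetches, the site is returning pages of different lengths.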

Aside: your indentation is stuffed after the while a < 1000: line. Excessive blank lines and one-letter names don't help anyone understand your code.

enumerate is not broken. Instead of speculating like this, inspect your data. Suggestion: replace

for i, line in enumerate(f):

with

lines = list(f)
print "=== a=%d linecount=%d ===" % (a, len(lines))
for i, line in enumerate(lines):
    print "   a=%d i=%d line=%r" % (a, i, line)

Examine the output carefully.
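If the output shows that some responses are shorter than expected, the loop can guard against them instead of silently skipping the target line. This is a minimal Python 3 sketch, assuming the line of interest is index 187 as in the question. `pick_line` and `TARGET_INDEX` are hypothetical names, and the byte strings stand in for real responses.

```python
import io

TARGET_INDEX = 187  # line index used in the question


def pick_line(stream, index=TARGET_INDEX):
    """Return the line at `index`, or None if the response is too short."""
    lines = list(stream)
    if index >= len(lines):
        return None
    return lines[index]


# Simulated responses: a complete page and a truncated one.
complete = io.BytesIO(b"".join(b"row %d\n" % n for n in range(250)))
truncated = io.BytesIO(b"".join(b"row %d\n" % n for n in range(131)))

print(pick_line(complete))   # b'row 187\n'
print(pick_line(truncated))  # None
```

Returning None makes the short response explicit, so the caller can log it or retry rather than carrying on with stale data.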

