

List comes back empty when retrieving data from a website; Python

I am trying to parse data from a website by inserting it into a list, but the list comes back empty.

import re

import urllib3
from bs4 import BeautifulSoup

list0_v2 = []

url = "http://www.releasechimps.org/resources/publication/whos-there-md-anderson"
http = urllib3.PoolManager()
r = http.request('GET', url)
soup = BeautifulSoup(r.data, "html.parser")
#print(r.data)
loop = re.findall(r'<td>(.*?)</td>', str(r.data))
#print(str(loop))
newLoop = str(loop)
#print(newLoop)
for x in range(1229):
    if "\\n\\t\\t\\t\\t" in loop[x]:
        loop[x] = loop[x].replace("\\n\\t\\t\\t\\t", "")
        list0_v2.append(str(loop[x]))
        print(loop[x])
print(str(list0_v2))
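The `"\\n\\t\\t\\t\\t"` checks work because `str(r.data)` does not decode the bytes; it returns their repr, in which every newline and tab appears as a two-character escape sequence. A minimal demonstration on a hypothetical byte string (not the real page):

```python
# Hypothetical sample bytes standing in for the downloaded page.
raw = b"<td>\n\t\t\t\t42</td>"

as_repr = str(raw)             # repr of the bytes object, not decoded text
as_text = raw.decode("utf-8")  # the actual decoded text

print(as_repr)               # b'<td>\n\t\t\t\t42</td>'
print("\\n\\t" in as_repr)   # True: literal backslash sequences
print("\n\t" in as_text)     # True: real newline and tab characters
print("\n" in as_repr)       # False: the repr holds no real newline
```

Because the repr is a single line, `(.*?)` in the regex can span what were originally several lines of HTML; on decoded text the same pattern would need the `re.DOTALL` flag, and the replacements would have to target real `"\n\t"` characters instead.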

EDIT: There wasn't really anything else going on, so I turned your data into a nice list of dictionaries. Monkey 111 has an odd <td height="26">, so I had to change the regex slightly.

Hope this helps. I did it because I care about the monkeys, man.
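To see why the regex change matters, here is a sketch on a hypothetical table fragment (made-up cells, not taken from the site) whose second cell carries an attribute like the `<td height="26">` cell mentioned above:

```python
import re

# Hypothetical fragment: the second cell has an attribute.
page = '<tr><td>111</td><td height="26">Name</td></tr>'

strict = re.findall(r'<td>(.*?)</td>', page)    # pattern from the question
loose = re.findall(r'<td.*?>(.*?)</td>', page)  # pattern from the answer

print(strict)  # ['111']
print(loose)   # ['111', 'Name']
```

The question's pattern requires `>` immediately after `td`, so any cell with attributes is silently skipped. A pattern such as `<td[^>]*>` is a slightly stricter alternative that cannot run past the end of the opening tag.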

import html
import re
import urllib.request

list0_v2 = []
final_list = []

url = "http://www.releasechimps.org/resources/publication/whos-there-md-anderson"
data = urllib.request.urlopen(url).read()
# str(data) is the repr of the bytes, so newlines/tabs appear as literal
# "\n" and "\t" escapes. <td.*?> also matches cells such as <td height="26">.
loop = re.findall(r'<td.*?>(.*?)</td>', str(data))

for item in loop:
    # Strip the indentation escapes and <em> tags from every cell.
    item = (item.replace("\\n\\t\\t\\t\\t", "")
                .replace("<em>", "")
                .replace("</em>", ""))
    if item == "&nbsp;":
        continue  # skip empty spacer cells
    list0_v2.append(item)

while list0_v2:
    form = {"n": 0, "name": "", "id": "", "gender": "", "birthdate": "", "notes": ""}

    # A row has a sixth (notes) cell only when that cell ends with a period.
    if len(list0_v2) >= 6 and list0_v2[5].endswith("."):
        numb, name, ids, gender, birthdate, notes = list0_v2[0:6]
        form["notes"] = notes
        del list0_v2[0:6]
    else:
        numb, name, ids, gender, birthdate = list0_v2[0:5]
        del list0_v2[0:5]

    form["n"] = int(numb)
    form["name"] = html.unescape(name)
    form["id"] = ids
    form["gender"] = gender
    form["birthdate"] = birthdate

    final_list.append(form)

for li in final_list:
    print("{:3} {:10} {:10} {:3} {:10} {}".format(
        li["n"], li["name"], li["id"], li["gender"], li["birthdate"], li["notes"]))
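The `while` loop relies on a heuristic: a row has a sixth (notes) cell only when that cell ends with a period; otherwise it has five. Pulling that logic into a hypothetical `group_rows` helper makes it testable without fetching the page. This is a sketch with made-up sample cells; it keeps every value as a string, so `int()` and `html.unescape()` would still be applied as in the answer.

```python
from typing import Dict, List


def group_rows(cells: List[str]) -> List[Dict[str, str]]:
    """Split a flat list of table cells into row dicts.

    Hypothetical helper mirroring the answer's heuristic: a row has six
    cells when the sixth one ends with a period (a notes cell), else five.
    """
    keys = ["n", "name", "id", "gender", "birthdate", "notes"]
    rows = []
    i = 0
    while i < len(cells):
        has_notes = i + 5 < len(cells) and cells[i + 5].endswith(".")
        width = 6 if has_notes else 5
        chunk = cells[i:i + width]
        if len(chunk) < 5:
            raise ValueError(f"incomplete row at cell {i}: {chunk!r}")
        row = dict(zip(keys, chunk))  # zip stops at the shorter sequence
        row.setdefault("notes", "")   # five-cell rows get empty notes
        rows.append(row)
        i += width
    return rows


# Made-up sample: one row with notes, one without.
cells = ["1", "Alpha", "A1", "M", "1990", "Retired.",
         "2", "Beta", "B2", "F", "1995"]
for row in group_rows(cells):
    print(row)
```

Raising on a short trailing chunk makes a malformed table fail loudly instead of silently dropping cells.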

