
Python Beautiful Soup

I am learning Beautiful Soup for Python and trying to parse the website https://www.twitteraudit.com/. When I enter a Twitter ID in the search bar, some IDs return results in a fraction of a second, but others take about a minute to process. In that case, how can I parse the HTML only after the page has finished loading and the result is ready? I tried looping over the request, but that doesn't work. What I noticed is that if I first open the link in a browser and let it finish, the result seems to be cached, and the next time I run the script for the same ID it works perfectly.

Can anyone help me out with this? I appreciate the help. I attach the code below:

from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
import re

def HTML(myURL):
    uClient = uReq(myURL)
    pageHTML = uClient.read()
    uClient.close()

    pageSoup = soup(pageHTML, "html.parser")
    return pageSoup

def fakecheck(usr):
    myURLfc = "https://www.twitteraudit.com/" + usr
    pgSoup = HTML(myURLfc)

    foll = pgSoup.findAll("div",{"class":"audit"})

    link = foll[0].div.a["href"]
    real = foll[0].findAll("span",{"class":"real number"})[0]["data-value"]
    fake = foll[0].findAll("span",{"class":"fake number"})[0]["data-value"]
    scr = foll[0].findAll("div",{"class":"score"})[0].div
    scoresent = scr["class"][1]
    score = re.findall(r'\d{1,3}',str(scr))[0]
    return [link, real, fake, scoresent, score]


lis = ["BarackObama","POTUS44","ObamaWhiteHouse","MichelleObama","ObamaFoundation","NSC44","ObamaNews","WhiteHouseCEQ44","IsThatBarrak","obama_barrak","theprezident","barrakubama","BarrakObama","banackkobama","YusssufferObama","barrakisdabomb_","BarrakObmma","fuzzyjellymasta","BarrakObama6","bannalover101","therealbarrak","ObamaBarrak666","barrak_obama"]

for u in lis:
    link, real, fake, scoresent, score = fakecheck(u)

    print ("link : " + link)
    print ("Real : " + real)
    print ("Fake : " + fake)
    print ("Result : " + scoresent)
    print ("Score : " + score)
    print ("=================")

I think the problem is that some of the Twitter IDs have not yet been audited, so foll[0] raised an IndexError. Putting the call to fakecheck(u) in a while True: loop that catches that error will keep checking the website until an audit has been performed for that ID.

I put this code after the lis definition:

import time

for u in lis:
    while True:
        try:
            link, real, fake, scoresent, score = fakecheck(u)
            break
        except IndexError:
            # No audit results for this ID yet; wait a little before
            # retrying so the site isn't hammered with requests.
            time.sleep(5)

I'm not sure whether there is a way to automate the audit request on the website, but while a query was pending, I manually clicked the "Audit" button on the site for that ID; once the audit completed, the script continued as usual until all ID audits were processed.
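One caveat: if an ID is never audited, an unbounded while True loop will hang forever. A bounded retry helper avoids that; the sketch below is my own addition (the name retry_until and its parameters are not part of the original answer), and it can wrap the question's fakecheck unchanged.

```python
import time

def retry_until(fn, exceptions=(IndexError,), max_attempts=30, delay=10.0):
    """Call fn() until it returns without raising one of `exceptions`,
    sleeping `delay` seconds between attempts. Gives up after
    `max_attempts` tries by raising TimeoutError."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except exceptions:
            time.sleep(delay)  # wait before polling the site again
    raise TimeoutError("no result after %d attempts" % max_attempts)

# Usage with the question's fakecheck (network call, so commented out):
# for u in lis:
#     link, real, fake, scoresent, score = retry_until(lambda: fakecheck(u))
```

This keeps the retry policy in one place, so the delay and the attempt limit can be tuned without touching the scraping code.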
