
Extract Tweets From Twitter Using Selenium

Hello all, I have a problem extracting tweets from Twitter. I wrote a script that opens one of the trending pages on Twitter, scrolls down N times, and extracts the tweets after each scroll. That works fine at first, but after a certain number of scrolls the page stops loading new tweets: scrolling stops and no new tweets appear.

For example, when I set N = 1000, it works fine until it reaches about 400 to 600 scrolls; then scrolling stops and no new tweets appear. I'd be very happy if anyone can help. Thanks a lot!
My code is:

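import time

import tqdm
from selenium import webdriver
from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By


def scrap_tweets_without(url, no_scroll):
    # Selenium 4 takes the chromedriver path through a Service object
    drive = webdriver.Chrome(service=Service(r'C:\selinum\chromedriver.exe'))
    drive.get(url)
    texts = []
    SCROLL_PAUSE_TIME = 0.3

    try:
        # Give the first batch of tweets time to render
        time.sleep(3)

        # Scroll no_scroll times, collecting tweets after each step
        for _ in tqdm.tqdm(range(no_scroll)):
            # Scroll down by 200 pixels
            drive.execute_script("window.scrollBy(0, 200)")

            # Wait for new tweets to load
            time.sleep(SCROLL_PAUSE_TIME)
            try:
                # Text blocks of the Arabic tweets currently in the page
                # (find_elements_by_xpath was removed in Selenium 4.3)
                tweets = drive.find_elements(
                    By.XPATH, '//div[@data-testid="tweetText" and @lang="ar"]')

                # Keep each tweet text only once
                for tx in tweets:
                    if tx.text not in texts:
                        texts.append(tx.text)
            except StaleElementReferenceException:
                # A tweet was removed from the page while reading it; skip this batch
                pass
    finally:
        drive.quit()  # close the browser even if something fails
    return texts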



url = 'https://twitter.com/search?q="جمال علام"&src=trend_click&pt=1535911024460718080&vertical=trends'
data = scrap_tweets_without(url, 1000)
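The loop checks `tx.text not in texts` against a list, which scans every stored tweet on each check, so the total work grows quadratically with the number of tweets collected. A minimal sketch (the helper name `add_new_texts` is hypothetical, not part of the original code) keeps a set alongside the list so each lookup is constant time while the list preserves first-seen order. It does not affect the stalling problem, only the cost of deduplication:

```python
def add_new_texts(texts, seen, batch):
    """Append each string in batch to texts if it has not been seen before.

    texts keeps first-seen order; seen is a set mirroring texts, so each
    membership check is O(1) instead of scanning the whole list.
    Returns the number of new texts added.
    """
    added = 0
    for text in batch:
        if text not in seen:
            seen.add(text)
            texts.append(text)
            added += 1
    return added


# Plain strings stand in for [tx.text for tx in tweets]
texts, seen = [], set()
add_new_texts(texts, seen, ["a", "b", "a"])
add_new_texts(texts, seen, ["b", "c"])
print(texts)  # ['a', 'b', 'c']
```

The return value (number of new tweets per scroll) is also a convenient signal for noticing when scrolling has stopped producing anything.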


After about 600 scrolls, the Selenium browser looks like the screenshot below: the page won't scroll any further, and I only get around 450 tweets. I believe a hashtag or search page has more tweets than that, so can anyone explain why the page doesn't load more?

[Screenshot: the Selenium browser window after about 600 scrolls]
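Whatever causes the stall, the loop keeps scrolling for all remaining iterations after the page stops producing tweets. A minimal sketch (the helper `scroll_until_stalled` is hypothetical, and the default `patience` of 20 is an arbitrary assumption) stops once several consecutive scrolls add nothing new, so you can distinguish a real stall from slow loading and stop, or restart the browser, instead of waiting out the remaining scrolls. It does not make Twitter load more tweets; it only detects the stall. The scroll-and-collect step is passed in as a callback, so the logic can be exercised here without a browser:

```python
def scroll_until_stalled(scroll_once, max_scrolls, patience=20):
    """Call scroll_once() up to max_scrolls times, stopping early once
    `patience` consecutive calls report no new tweets.

    scroll_once (hypothetical callback) should scroll the page, collect
    tweets, and return how many *new* tweets it found. In the scraper it
    would wrap the scrollBy / sleep / find_elements steps of the loop.

    Returns (scrolls_done, stalled).
    """
    idle = 0  # consecutive scrolls that found nothing new
    for i in range(1, max_scrolls + 1):
        if scroll_once() > 0:
            idle = 0
        else:
            idle += 1
            if idle >= patience:
                return i, True
    return max_scrolls, False


# Fake timeline standing in for Twitter: new tweets on the first
# five scrolls, nothing afterwards.
counts = iter([5, 4, 3, 2, 1])


def fake_scroll():
    return next(counts, 0)


print(scroll_until_stalled(fake_scroll, max_scrolls=1000, patience=10))  # (15, True)
```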

After searching through a lot of sources, I found that the problem is that Twitter detects that I'm a Selenium bot rather than a real user, so it stops loading new tweets when I scroll down. Adding this function helped me:

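from fake_headers import Headers
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager


def initilaize_driver():
    options = webdriver.ChromeOptions()
    # Random, realistic User-Agent string from the fake-headers package
    header = Headers().generate()['User-Agent']
    options.add_argument('--headless')  # runs browser in headless mode
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')
    options.add_argument('--ignore-certificate-errors')
    options.add_argument('--disable-gpu')
    options.add_argument('--log-level=3')
    options.add_argument('--disable-notifications')
    options.add_argument('--disable-popup-blocking')
    options.add_argument('--user-agent={}'.format(header))  # replace the default User-Agent

    # webdriver-manager downloads a chromedriver matching the installed Chrome.
    # Create only this one driver: a second webdriver.Chrome(...) call would
    # open another browser without the options above.
    drive = webdriver.Chrome(service=Service(ChromeDriverManager().install()),
                             options=options)
    return drive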
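The function overrides Chrome's User-Agent because, in headless mode, Chrome's default User-Agent typically contains `HeadlessChrome`, which is easy for a site to spot. As a stdlib-only sketch of the same idea (the `USER_AGENTS` pool and both helper names are illustrative assumptions; the answer itself uses the fake-headers package), the flags can be built separately from the driver, which makes them easy to check without launching Chrome:

```python
import random

# Illustrative desktop User-Agent strings; an assumption, not a maintained list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]


def pick_user_agent(rng=random):
    """Return one User-Agent from the pool (pass random.Random(seed) for repeatable picks)."""
    return rng.choice(USER_AGENTS)


def chrome_arguments(user_agent, headless=True):
    """Build the same Chrome command-line flags that initilaize_driver() adds."""
    args = [
        "--no-sandbox",
        "--disable-dev-shm-usage",
        "--ignore-certificate-errors",
        "--disable-gpu",
        "--log-level=3",
        "--disable-notifications",
        "--disable-popup-blocking",
        f"--user-agent={user_agent}",
    ]
    if headless:
        args.insert(0, "--headless")
    return args


ua = pick_user_agent(random.Random(42))
for arg in chrome_arguments(ua):
    print(arg)
```

With Selenium, each flag would then be passed to the driver with `options.add_argument(arg)`.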
