Instagram web scraping with Selenium (Python) problem

Question

I have a problem scraping all pictures from an Instagram profile. I scroll the page to the bottom, then find all "a" tags, but I always get only the last 30 links to pictures. I think the driver doesn't see the full content of the page.

import time

from selenium.webdriver.common.by import By

# Assumes `driver` is a WebDriver already showing the profile page.
SCROLL_JS = ("window.scrollTo(0, document.body.scrollHeight);"
             "return document.body.scrollHeight;")

# scroll until the page height stops growing
scrolldown = driver.execute_script(SCROLL_JS)
match = False
while not match:
    last_count = scrolldown
    time.sleep(2)  # wait for the next batch of posts to load
    scrolldown = driver.execute_script(SCROLL_JS)
    if last_count == scrolldown:
        match = True

# posts: collect post links only after scrolling has finished
posts = []
time.sleep(2)
links = driver.find_elements(By.TAG_NAME, 'a')  # find_elements_by_tag_name is gone in Selenium 4
for link in links:
    post = link.get_attribute('href')
    if post and '/p/' in post:  # get_attribute returns None when there is no href
        posts.append(post)
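The symptom can be reproduced with a toy model, assuming Instagram keeps only a sliding window of recently loaded posts in the page. `FakeFeed`, `WINDOW` and `BATCH` are hypothetical stand-ins, not Selenium or Instagram APIs, and the window and batch sizes are guesses; the loop mirrors the scroll-then-collect order of the code above.

```python
# Toy model, NOT Selenium: a hypothetical feed where the page only ever
# holds the WINDOW most recently loaded posts.
WINDOW = 30  # posts kept in the DOM (assumed)
BATCH = 12   # posts loaded per scroll (assumed)


class FakeFeed:
    def __init__(self, total: int) -> None:
        self.total = total
        self.loaded = BATCH  # posts loaded on first page render

    def scroll_to_bottom(self) -> int:
        """Load the next batch and return the new 'page height' (posts loaded)."""
        self.loaded = min(self.total, self.loaded + BATCH)
        return self.loaded

    def links_in_dom(self) -> list:
        """Only the last WINDOW loaded posts are still present."""
        start = max(0, self.loaded - WINDOW)
        return [f"https://www.instagram.com/p/{i}/" for i in range(start, self.loaded)]


# Same order as the question: scroll until the height stops changing, then collect.
feed = FakeFeed(total=100)
scrolldown = feed.scroll_to_bottom()
while True:
    last_count = scrolldown
    scrolldown = feed.scroll_to_bottom()
    if last_count == scrolldown:
        break

posts = [link for link in feed.links_in_dom() if "/p/" in link]
print(len(posts))  # 30: the first 70 posts were already dropped
```

Under this assumption, collecting once at the end can only ever see the last `WINDOW` posts, no matter how far the loop scrolled.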

Answer 1

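It looks like you first scroll to the bottom of the page and only then collect the links, instead of collecting them inside the scrolling loop. Instagram appears to keep only the most recently loaded posts in the DOM and to remove older ones as you scroll, which would explain why you end up with only the last 30 links.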
So, if you want to get all the links, you should run

links = driver.find_elements(By.TAG_NAME, 'a')
for link in links:
    post = link.get_attribute('href')
    if post and '/p/' in post:  # skip <a> elements without an href
        posts.append(post)

inside the scrolling loop, and also once before the first scroll.
Something like this:

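import time

from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.common.by import By

# Assumes `driver` is a WebDriver already showing the profile page.
SCROLL_JS = ("window.scrollTo(0, document.body.scrollHeight);"
             "return document.body.scrollHeight;")

def get_links():
    """Add every post link currently in the DOM to the global `posts` set."""
    time.sleep(2)  # let newly loaded posts render
    for link in driver.find_elements(By.TAG_NAME, 'a'):
        try:
            post = link.get_attribute('href')
        except StaleElementReferenceException:
            continue  # element was removed from the DOM while we iterated
        if post and '/p/' in post:
            posts.add(post)  # a set drops links already collected on earlier passes

posts = set()
get_links()  # posts visible before the first scroll
scrolldown = driver.execute_script(SCROLL_JS)
match = False
while not match:
    get_links()  # collect before older posts can leave the DOM
    last_count = scrolldown
    time.sleep(2)
    scrolldown = driver.execute_script(SCROLL_JS)
    if last_count == scrolldown:
        match = True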
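The same collect-while-scrolling idea can be sketched as a reusable function that does not depend on Selenium directly. `collect_while_scrolling`, `fake_scroll` and `fake_links` are hypothetical names, the `max_rounds` cap is an added safeguard not in the answer, and the demo feed (100 posts, 12 loaded per scroll, last 30 kept) is an assumed model rather than measured Instagram behaviour.

```python
from typing import Callable, Iterable, Optional, Set


def collect_while_scrolling(
    scroll_to_bottom: Callable[[], int],
    visible_links: Callable[[], Iterable[Optional[str]]],
    max_rounds: int = 500,
) -> Set[str]:
    """Collect post links after every scroll step until the page height stops changing."""
    posts: Set[str] = set()

    def harvest() -> None:
        # Keep only links that look like post URLs; hrefs may be None.
        for href in visible_links():
            if href and "/p/" in href:
                posts.add(href)

    harvest()  # posts visible before the first scroll
    height = scroll_to_bottom()
    for _ in range(max_rounds):  # cap in case the height never settles
        harvest()  # grab posts before they can leave the DOM
        last = height
        height = scroll_to_bottom()
        if height == last:
            break
    harvest()  # final pass after the last scroll
    return posts


# Demo: hypothetical feed of 100 posts, 12 loaded per scroll, only the last 30 kept "in the DOM".
state = {"loaded": 12}


def fake_scroll() -> int:
    state["loaded"] = min(100, state["loaded"] + 12)
    return state["loaded"]


def fake_links() -> list:
    start = max(0, state["loaded"] - 30)
    return [f"/p/{i}/" for i in range(start, state["loaded"])]


result = collect_while_scrolling(fake_scroll, fake_links)
print(len(result))  # 100
```

With Selenium, `scroll_to_bottom` would run the scroll script through `driver.execute_script` and then `time.sleep(2)`, and `visible_links` would return the `href` of every element from `driver.find_elements(By.TAG_NAME, 'a')`. This only works if each scroll loads fewer posts than the DOM keeps; otherwise some posts are dropped before a harvest sees them.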

Answer 1: 0 votes, accepted, 2021-05-28 08:25:09