Instagram web scraping with Selenium (Python) problem

Question

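I'm having trouble scraping all the images from an Instagram profile. I scroll the page to the bottom and then find all the "a" tags, but I always end up with only the last 30 image links. I think the driver doesn't see the full content of the page.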

# scroll
import time

# `driver` is an already-initialised WebDriver on the profile page
scroll_js = "window.scrollTo(0, document.body.scrollHeight); return document.body.scrollHeight;"
scrolldown = driver.execute_script(scroll_js)
match = False
while not match:
    last_count = scrolldown
    time.sleep(2)  # wait for the next batch of posts to load
    scrolldown = driver.execute_script(scroll_js)
    if last_count == scrolldown:  # height unchanged: bottom reached
        match = True

# posts
from selenium.webdriver.common.by import By

posts = []
time.sleep(2)
links = driver.find_elements(By.TAG_NAME, 'a')
time.sleep(2)
for link in links:
    post = link.get_attribute('href')
    if post and '/p/' in post:  # skip anchors without an href
        posts.append(post)

Answer 1

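It looks like you scroll to the bottom of the page first and only then collect the links, instead of collecting and processing them inside the scrolling loop. Judging by the symptom, Instagram seems to keep only the most recently loaded posts in the DOM, which would explain why only the last 30 links are left once you reach the bottom.
So, if you want to get all the links, you should run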

links = driver.find_elements(By.TAG_NAME, 'a')  # needs: from selenium.webdriver.common.by import By
time.sleep(2)
for link in links:
    post = link.get_attribute('href')
    if post and '/p/' in post:  # skip anchors without an href
        posts.append(post)

inside the scrolling loop, and also once before the first scroll.
Something like this:

import time
from selenium.webdriver.common.by import By

# `driver` is an already-initialised WebDriver on the profile page

def get_links():
    """Add every post link currently in the DOM to the global `posts` set."""
    time.sleep(2)  # let newly loaded posts render
    for link in driver.find_elements(By.TAG_NAME, 'a'):
        post = link.get_attribute('href')
        if post and '/p/' in post:  # skip anchors without an href
            posts.add(post)

scroll_js = "window.scrollTo(0, document.body.scrollHeight); return document.body.scrollHeight;"

posts = set()  # a set drops links that are seen in more than one pass
get_links()  # collect the links visible before the first scroll
scrolldown = driver.execute_script(scroll_js)
match = False
while not match:
    get_links()
    last_count = scrolldown
    time.sleep(2)
    scrolldown = driver.execute_script(scroll_js)
    if last_count == scrolldown:  # height unchanged: bottom reached
        match = True
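
The same idea can also be packaged as a single function, which makes it easier to reuse and to try out without a browser. The following is only a sketch under some assumptions: the name `collect_post_links` and the parameters `pause` and `max_rounds` are made up for illustration, `driver` is assumed to be a Selenium WebDriver that already has the profile page open, and the hrefs are read with one `execute_script` call instead of `find_elements`, which is a substitution on my part to avoid holding element references while the page re-renders.

```python
import time


def collect_post_links(driver, pause=2.0, max_rounds=200):
    """Scroll down step by step and harvest post links after every step."""
    posts = {}  # dict keys keep insertion order and drop duplicates

    def harvest():
        # Read every href in one JavaScript call, so no WebElement
        # reference has to survive a re-render of the page.
        hrefs = driver.execute_script(
            "return Array.from(document.querySelectorAll('a'), a => a.href);"
        )
        for href in hrefs or []:
            if href and "/p/" in href:
                posts.setdefault(href, None)

    harvest()  # links present before the first scroll
    last_height = driver.execute_script("return document.body.scrollHeight;")
    for _ in range(max_rounds):  # hypothetical safety cap against endless feeds
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the next batch of posts time to load
        harvest()
        new_height = driver.execute_script("return document.body.scrollHeight;")
        if new_height == last_height:
            break  # the page stopped growing, so the bottom was reached
        last_height = new_height
    return list(posts)
```

With a real session this would be called as `posts = collect_post_links(driver)`. If the connection is slow, a fixed pause may be too short and the loop can stop early, so a larger `pause` is worth trying before concluding that links are missing.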

Answer 1
0 Accepted 2021-05-28 08:25:09
