Python Instagram Web Scraper Issue

Question

I am trying to build a web scraper that tells me the number of times a hashtag is used on Instagram, but I keep getting either an error code on different iterations or "None" for the current response. Here is my code and the HTML.

Python

import requests
from bs4 import BeautifulSoup
url = 'https://www.instagram.com/explore/tags/savethekids/'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
tag = soup.find("span", {"class": "g47SY "})
print(tag)

That's the code I made.

HTML

<span class="-nal3 ">
  <span class="g47SY ">22,922</span> 
   " posts"
</span>

That is the HTML from Instagram.

If anyone who actually knows what they are doing could point out what I'm doing wrong and how to fix it, that would be great.

Answer 1

Try this:

import requests

url = 'https://www.instagram.com/explore/tags/savethekids/?__a=1'

response = requests.get(url)

count = response.json().get('graphql', {}).get('hashtag', {}).get('edge_hashtag_to_media', {}).get('count')

print(count)

Output:

See it in action here
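The chained `.get()` calls above assume the endpoint returns JSON with exactly that nesting. If the body is not JSON (for example, an HTML page), `response.json()` raises an error, and if any intermediate value is `null` the chain fails too. Below is a minimal sketch of a more defensive version of the same lookup. The helper names `extract_hashtag_count` and `count_from_body` and the sample body are hypothetical and for illustration only; in practice you would pass in `response.text`.

```python
import json
from typing import Any, Optional


def extract_hashtag_count(payload: Any) -> Optional[int]:
    """Walk graphql -> hashtag -> edge_hashtag_to_media -> count, or return None."""
    node = payload
    for key in ("graphql", "hashtag", "edge_hashtag_to_media", "count"):
        # Stop as soon as the structure is not a dict (missing key, null, etc.).
        if not isinstance(node, dict):
            return None
        node = node.get(key)
    return node if isinstance(node, int) else None


def count_from_body(body: str) -> Optional[int]:
    """Parse a response body as JSON; return None if the body is not valid JSON."""
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    return extract_hashtag_count(payload)


# Hypothetical sample body, shaped like the keys used in the answer above.
sample = '{"graphql": {"hashtag": {"edge_hashtag_to_media": {"count": 22922}}}}'
print(count_from_body(sample))
print(count_from_body("<html>login</html>"))
```

A `None` result then means "the response did not have the expected shape", which is easier to debug than an exception raised in the middle of the chain.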

Answer 2

The issue when using requests is that the HTML has not been rendered yet: requests only downloads the initial page source and does not run the JavaScript that fills in the post count. Try following a tutorial on scraping Instagram.

This uses a tool called Selenium to get the actual HTML from Instagram.

The following code should get the element you are looking for once you have the Selenium webdriver working.

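from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By

browser = Chrome()
url = 'https://www.instagram.com/explore/tags/savethekids/'
browser.get(url)
# Selenium 4 removed find_element_by_class_name; use find_element with By.CLASS_NAME
element = browser.find_element(By.CLASS_NAME, 'g47SY')
print(element.text)  # print the span's text rather than the WebElement object
browser.quit()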
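However the span's text is retrieved, it arrives as a display string such as "22,922" (the value shown in the question's HTML), not as a number. Here is a small sketch of converting it to an integer, assuming the count uses commas as thousands separators. The helper name `parse_post_count` is hypothetical, and other formats are rejected rather than guessed at.

```python
def parse_post_count(text: str) -> int:
    """Convert a display string such as '22,922' into an integer."""
    cleaned = text.strip().replace(",", "")
    if not cleaned.isdigit():
        # Assumption: anything other than digits and commas is unexpected.
        raise ValueError(f"unexpected count format: {text!r}")
    return int(cleaned)


print(parse_post_count("22,922"))
```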

Solution 1
1 2020-03-03 14:50:57

Solution 2
-1 2020-03-03 03:36:46
