
Python Instagram Web scraper troubles

I am trying to build a web scraper that tells me the number of times a hashtag is used on Instagram, but I keep getting either an error code on different iterations or "None" for the current response. Here is my code and the HTML.

Python

import requests
from bs4 import BeautifulSoup
url = 'https://www.instagram.com/explore/tags/savethekids/'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
tag = soup.find("span", {"class": "g47SY "})
print(tag)

That's the code I wrote.

HTML

<span class="-nal3 ">
  <span class="g47SY ">22,922</span> 
   " posts"
</span>

That is the HTML from Instagram.

If anyone who actually knows what they are doing could point out what I'm doing wrong and how to fix it, that would be great.

Try this:

import requests

url = 'https://www.instagram.com/explore/tags/savethekids/?__a=1'

response = requests.get(url)

count = response.json().get('graphql', {}).get('hashtag', {}).get('edge_hashtag_to_media', {}).get('count')

print(count)

Output:

22924

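The chained .get() calls above raise an AttributeError if any intermediate value is not a dict, for example if the response does not have the expected shape. Below is a minimal sketch of a more defensive lookup over the same keys. The helper name extract_hashtag_count and the sample payload are hypothetical, and the key path is taken from the snippet above rather than from any documented schema.

```python
from typing import Any, Optional


def extract_hashtag_count(payload: Any) -> Optional[int]:
    """Return the post count from a parsed JSON payload, or None if absent."""
    node = payload
    # Walk the same key path used in the answer above, one level at a time.
    for key in ("graphql", "hashtag", "edge_hashtag_to_media", "count"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(node, int) and not isinstance(node, bool):
        return node
    return None


# Hypothetical sample shaped like the keys used in the answer above.
sample = {"graphql": {"hashtag": {"edge_hashtag_to_media": {"count": 22924}}}}
print(extract_hashtag_count(sample))             # 22924
print(extract_hashtag_count({"graphql": None}))  # None
```

With requests, this would be called as extract_hashtag_count(response.json()), ideally after checking response.status_code and catching the error that response.json() raises when the body is not JSON.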

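The issue when using requests is that the HTML is not rendered yet: requests only downloads the initial page source and does not run the JavaScript that fills in the post count, so the span you are looking for is not in the response. Try following a tutorial on scraping Instagram.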

This uses a tool called Selenium to get the actual HTML from Instagram.
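To illustrate why the rendered HTML matters, here is a small sketch using only the standard library. It searches two strings for the span: static_html is a made-up stand-in for an unrendered page shell (not Instagram's real source), and rendered_html is the fragment from the question. The class SpanTextFinder and the function find_span_text are hypothetical names.

```python
from html.parser import HTMLParser


class SpanTextFinder(HTMLParser):
    """Collect the text of <span> elements carrying a given class token."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self._depth = 0      # greater than 0 while inside a matching span
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._depth:
            self._depth += 1  # nested span inside a match
            return
        # split() drops the trailing space in class="g47SY "
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self._depth = 1
            self.matches.append("")

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.matches[-1] += data


def find_span_text(html, class_name):
    finder = SpanTextFinder(class_name)
    finder.feed(html)
    finder.close()
    return finder.matches


# Assumed shape of an unrendered shell: no span exists until scripts run.
static_html = '<html><body><div id="react-root"></div><script src="app.js"></script></body></html>'
# Fragment from the question, as seen in the browser after rendering.
rendered_html = '<span class="-nal3 "><span class="g47SY ">22,922</span> posts</span>'

print(find_span_text(static_html, "g47SY"))    # []
print(find_span_text(rendered_html, "g47SY"))  # ['22,922']
```

The empty result for the first string corresponds to the "None" that soup.find returns in the question.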

The following code should get the element you are looking for once you have the Selenium webdriver working.

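from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By

browser = Chrome()
url = 'https://www.instagram.com/explore/tags/savethekids/'
browser.get(url)
# The find_element_by_* helpers were removed in newer Selenium 4 releases; use find_element with By.
print(browser.find_element(By.CLASS_NAME, 'g47SY').text)  # .text gives the visible count, not the element object
browser.quit()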
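The element text comes back as a string such as "22,922", so it needs converting before it can be used as a number. A minimal sketch, assuming the count uses commas as thousands separators as in the HTML from the question; the helper name parse_count is hypothetical, and abbreviated counts (if the page ever shows them) are rejected rather than guessed at.

```python
def parse_count(text):
    """Convert a displayed count like '22,922' to an int."""
    digits = text.strip().replace(",", "")
    if not digits.isdigit():
        # Anything other than plain digits and commas is rejected.
        raise ValueError(f"unexpected count format: {text!r}")
    return int(digits)


print(parse_count("22,922"))  # 22922
```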

