
Scraping a certain number of posts from Instagram

I'm using the method from the post linked below to scrape Instagram profiles. Can I change the number of images I retrieve? In the JSON response I saw the 'has_next_page' parameter, but I'm not sure how to use it. Thanks in advance. Post link: What is the new instagram json endpoint?

Used code:

import json
import re
import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.instagram.com/' + profile + '/')
soup = BeautifulSoup(r.content, 'html.parser')
scripts = soup.find_all('script', type="text/javascript",
                        string=re.compile('window._sharedData'))
# Drop the assignment prefix and the trailing semicolon, leaving plain JSON.
stringified_json = scripts[0].get_text().replace('window._sharedData = ', '')[:-1]
data = json.loads(stringified_json)['entry_data']['ProfilePage'][0]

You can find the Instagram API at https://www.instagram.com/developer/. I think the documentation is quite concise; you only need to register to get an access token.

The problem is this: your code scrapes data from the profile page, which means you only get the images that have already been loaded. That's why you can't simply ask for a larger number and get more images.
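To make the role of 'has_next_page' concrete, here is a minimal sketch of how one page of results could be trimmed to a wanted count while checking whether another request would be needed. Only 'has_next_page' comes from the question; the `nodes`, `page_info` and `end_cursor` key names, the `take_posts` helper, and the sample dictionary are assumptions made for illustration, so check them against the JSON you actually receive.

```python
def take_posts(media, limit):
    """Return up to `limit` posts from one page, plus paging info for a follow-up request."""
    nodes = media.get("nodes", [])           # assumed key: posts loaded on this page
    page_info = media.get("page_info", {})   # assumed key: paging metadata
    selected = nodes[:limit]
    # Another request only makes sense if we are still short AND more posts exist.
    need_more = len(selected) < limit and bool(page_info.get("has_next_page"))
    return selected, need_more, page_info.get("end_cursor")


# Hand-written sample standing in for one page of a response (not real data).
sample_media = {
    "nodes": [{"id": "1"}, {"id": "2"}, {"id": "3"}],
    "page_info": {"has_next_page": True, "end_cursor": "CURSOR_PLACEHOLDER"},
}

posts, need_more, cursor = take_posts(sample_media, 5)
print(len(posts), need_more, cursor)
```

If `need_more` is true, the profile page alone cannot supply the remaining posts: they have to come from a further request (or from scrolling in a real browser), which is the limitation described above.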

I'd recommend one of the following:

1. Use Instagram's API, which comes with ready-made methods to do exactly what you seem to want (don't reinvent the wheel).

2. If you would rather do most of the work yourself (say, as an exercise), I'd recommend Selenium, a browser automation tool. Your code uses BeautifulSoup, which is great for retrieving data from HTML files, but you need to do something more: scroll, so that more pictures get loaded. This way you can get as many pictures as you like.
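The scrolling idea in option 2 could be sketched roughly as follows. This is not a tested Instagram scraper: the `scroll_and_collect` helper is hypothetical, and the CSS selector `a[href*='/p/']` is an assumption about how post links appear in the page markup. The function only relies on the driver's `find_elements` and `execute_script` methods, so it takes an already-created Selenium driver as an argument.

```python
import time


def scroll_and_collect(driver, wanted, pause=2.0, max_scrolls=50):
    """Scroll down until `wanted` post links are collected or no new content loads."""
    links, seen = [], set()
    for _ in range(max_scrolls):
        # "css selector" is the string value behind Selenium's By.CSS_SELECTOR.
        # The selector itself is an assumption about the page markup.
        for anchor in driver.find_elements("css selector", "a[href*='/p/']"):
            href = anchor.get_attribute("href")
            if href and href not in seen:
                seen.add(href)
                links.append(href)
        if len(links) >= wanted:
            break
        before = driver.execute_script("return document.body.scrollHeight")
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to load the next batch
        after = driver.execute_script("return document.body.scrollHeight")
        if after == before:
            break  # page height did not grow, so nothing more was loaded
    return links[:wanted]
```

With a real browser you would create a driver (for example `webdriver.Chrome()` from the `selenium` package), open the profile with `driver.get(...)`, and then call `scroll_and_collect(driver, 30)`. Links are collected on every pass rather than once at the end, in case the page drops earlier posts from the DOM as you scroll.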

If you need an example, you can check out something similar I wrote for Twitter here.


 