Force Instagram profile page's source to load remotely with JavaScript

I'm building a web-based live total like counter for Instagram users. Since Instagram's API does not expose the total number of likes on a profile, I'm scraping like counts from the target user's profile page (https://instagram.com/USERNAME) by retrieving the HTML source and extracting the data I need from it. This all works, but only 12 posts are present in the source, because more posts are loaded only as you scroll down (to see what I mean, go to https://instagram.com/selenagomez and scroll down; there is a brief load before more posts appear). My goal is to load all of the posts and then extract the data I need from that source.

The number of posts loaded is fairly unpredictable. It seems that verified users get 24 posts while unverified users get 12, which doesn't make much sense to me. I've looked through Instagram's HTML source, but there doesn't seem to be an easy way to load additional posts without scrolling in a browser yourself, and that won't work because I want to do all of this remotely in code.

To load the source file I'm using the following code:

var name = "selenagomez";
var url = "http://instagram.com/" + name;

// Fetch the profile page's HTML source; the callback receives it as a string.
$.get(url, function(response) {
    // ... regex ...
});

In the source, Instagram has like counts attached to posts in the following form:

edge_liked_by':{'count':1234}

After retrieving the source, I use a regex to strip everything except the numbers in these edge_liked_by':{'count':1234} entries. The numbers are then put into an array like the following:

[1, 2, 3, 4, 5, ...]

The array is then summed to get the total number of likes, which is displayed on the web page. All of this code works fine.
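To make the extract-and-sum step described above concrete, here is a minimal sketch. The function names and the sample string are made up for illustration, and the pattern accepts either quote style around the key names because the exact quoting in the real page source is an assumption.

```javascript
// Collect every like count that follows an edge_liked_by key in the source.
// Assumption: counts appear as edge_liked_by':{'count':N} (either quote style).
function extractLikeCounts(source) {
  const pattern = /edge_liked_by['"]\s*:\s*\{\s*['"]count['"]\s*:\s*(\d+)/g;
  const counts = [];
  for (const match of source.matchAll(pattern)) {
    counts.push(Number(match[1])); // match[1] is the captured digit run
  }
  return counts;
}

// Sum the extracted counts to get the total number of likes.
function totalLikes(source) {
  return extractLikeCounts(source).reduce((sum, n) => sum + n, 0);
}

// Made-up sample fragment, not real Instagram output.
const sample = "{'edge_liked_by':{'count':1234}} ... {'edge_liked_by':{'count':5}}";
console.log(extractLikeCounts(sample)); // [ 1234, 5 ]
console.log(totalLikes(sample));        // 1239
```

Inside the $.get callback, the same call would be totalLikes(response).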

Ultimately, I'm just looking for a way to force the Instagram profile page to load all posts remotely so I can extract the like counts from the source.

Thanks in advance for any help with this.

I found another way to do this, using the END_CURSOR value provided by https://instagram.com/graphql/query for pagination.

For anyone wondering, the link for retrieving a user's posts as JSON is as follows: https://www.instagram.com/graphql/query/?query_hash=42323d64886122307be10013ad2dcc44&variables={"id":"PROFILE ID","first":"INT","after":"END_CURSOR"}

Here PROFILE ID is the profile's numeric id, which can be retrieved from another JSON link: https://www.instagram.com/USERNAME?__a=1

and INT is the number of posts to fetch. It can be anywhere between 1 and 50 per request.

The trick to getting past 50 is to put the END_CURSOR string from the response into the next link, which advances to the next page of posts, where you can get another 50.

Notes:

  • You don't have to provide an END_CURSOR value in the link if you're only getting the most recent 1-50 posts from a user. The end cursor is really only useful if you're looking to fetch beyond the 50 most recent posts.

  • As of now the query_hash is static and can be left at 42323d64886122307be10013ad2dcc44
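The pagination described above could be sketched roughly as follows. The URL format and query_hash come from this answer; everything else is an assumption for illustration: the function names, the injected fetchJson helper, and especially the response field paths (data.user.edge_owner_to_timeline_media, page_info.has_next_page, page_info.end_cursor, node.edge_liked_by.count), which should be checked against what the endpoint actually returns.

```javascript
const QUERY_HASH = "42323d64886122307be10013ad2dcc44"; // value given in the answer

// Build the query link; "after" is left out when fetching the first page.
function buildQueryUrl(profileId, first, after) {
  const variables = { id: String(profileId), first: first };
  if (after) variables.after = after;
  return "https://www.instagram.com/graphql/query/?query_hash=" + QUERY_HASH +
    "&variables=" + encodeURIComponent(JSON.stringify(variables));
}

// fetchJson is any function that takes a URL and returns a Promise of parsed
// JSON (hypothetical helper, e.g. a thin wrapper around $.getJSON or fetch).
async function sumAllLikes(profileId, fetchJson, pageSize = 50) {
  let total = 0;
  let cursor = null;
  do {
    const json = await fetchJson(buildQueryUrl(profileId, pageSize, cursor));
    // ASSUMED response shape; adjust these paths to the real JSON.
    const media = json.data.user.edge_owner_to_timeline_media;
    for (const edge of media.edges) {
      total += edge.node.edge_liked_by.count;
    }
    // Keep going only while the response reports another page.
    cursor = media.page_info.has_next_page ? media.page_info.end_cursor : null;
  } while (cursor);
  return total;
}
```

Passing fetchJson in as a parameter keeps the loop independent of how the request is made, so the same code can run against $.getJSON in the browser or a stub during testing.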
