Force Instagram profile page's source to load remotely with JavaScript

I'm creating a web-based live total like counter for Instagram users. Since Instagram's API does not expose the total number of likes on a profile, I'm scraping like counts from the target user's profile page (https://instagram.com/USERNAME) by retrieving the HTML source and extracting the data I need from it. This has all worked fine. However, only 12 posts are present in the source, since you have to scroll down for more posts to be loaded (you can see what I mean by going to https://instagram.com/selenagomez and scrolling down: there is a brief load before more posts are displayed). My goal is to load all of the posts and then extract the data I need from that source.

The number of posts that are loaded is pretty unpredictable. It seems that 24 posts are loaded for verified users and 12 for unverified users, which doesn't make much sense to me. I've looked around in Instagram's HTML source but there doesn't seem to be any easy way to load additional posts without scrolling the page yourself in a browser (and that won't work here, because I'm looking to accomplish all of this remotely via code).

To load the source I'm using the following code:

var name = "selenagomez";
var url = "http://instagram.com/" + name;

// Fetch the profile page's HTML source (requires jQuery).
$.get(url, function(response) {
    // ... regex ...
});

In the source, Instagram attaches like counts to posts in the following form:

edge_liked_by':{'count':1234}

After the source is retrieved, I use a regex to strip everything except the numbers in these edge_liked_by':{'count':1234} entries. The numbers are then put into an array like the following:

[1, 2, 3, 4, 5 etc, etc]

After that, the array's values are added together to get the total number of likes, which is displayed on the web page. All of this code is working fine.
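For readers who want to see the extract-and-sum step spelled out, here is a minimal sketch. It is not the asker's actual code (the original regex is elided), and the function names extractLikeCounts and sumLikes are hypothetical. It assumes the counts appear in the source in the edge_liked_by form shown above, with either single or double quotes around the keys.

```javascript
// Hypothetical helper: pull every like count out of a profile page's source.
function extractLikeCounts(source) {
  // Tolerates single or double quotes around the keys; the exact quoting
  // used in the real page source is an assumption.
  const pattern = /edge_liked_by['"]?:\{['"]?count['"]?:(\d+)\}/g;
  return Array.from(source.matchAll(pattern), (match) => Number(match[1]));
}

// Hypothetical helper: add the extracted counts together.
function sumLikes(source) {
  return extractLikeCounts(source).reduce((total, count) => total + count, 0);
}
```

Inside the $.get callback, something like sumLikes(response) would then give the total for whatever posts are present in the source.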

Ultimately, I'm looking for a way to force the Instagram profile page to load all posts remotely so I can extract the like counts from the source.

Thanks in advance for any help with this.

I found another way of doing this: use the END_CURSOR value provided by https://instagram.com/graphql/query for pagination.

For anyone wondering, the link for retrieving a user's posts as JSON is as follows: https://www.instagram.com/graphql/query/?query_hash=42323d64886122307be10013ad2dcc44&variables={"id":"PROFILE ID","first":"INT","after":"END_CURSOR"}

Here PROFILE ID is the profile's numeric id, which can be retrieved from another JSON link: https://www.instagram.com/USERNAME?__a=1

INT is the number of posts to fetch. It can be anywhere between 1 and 50 per request.
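Assembling that link by hand is error-prone because the variables value is JSON inside a query string. The sketch below shows one way to build it. The function name buildQueryUrl is hypothetical, the values are passed as strings to mirror the link form given above, and URL-encoding the JSON with encodeURIComponent is an assumption about what the endpoint accepts.

```javascript
// query_hash value quoted in the answer; it may change over time.
const QUERY_HASH = "42323d64886122307be10013ad2dcc44";

// Hypothetical helper: build the graphql/query link for one page of posts.
function buildQueryUrl(profileId, first, endCursor) {
  if (!Number.isInteger(first) || first < 1 || first > 50) {
    throw new RangeError("first must be an integer between 1 and 50");
  }
  // Values are strings here to match the link form shown in the answer.
  const variables = { id: String(profileId), first: String(first) };
  if (endCursor) {
    // Only needed when fetching past the first page.
    variables.after = endCursor;
  }
  return (
    "https://www.instagram.com/graphql/query/?query_hash=" + QUERY_HASH +
    "&variables=" + encodeURIComponent(JSON.stringify(variables))
  );
}
```

Calling buildQueryUrl with only a profile id and a count produces the first-page link. Passing a cursor as the third argument adds the "after" field.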

The trick to getting past 50 is to add the END_CURSOR string returned with each response to the next link, which advances to the next page of posts, where you can get another 50.
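The cursor loop described here can be sketched as follows. Because the layout of the JSON response is not given in this post, the sketch takes a caller-supplied fetchPage function (hypothetical) that requests one page for a given cursor and resolves to an object of the assumed shape { counts, endCursor, hasNextPage }. Mapping the real response onto that shape is left to the caller.

```javascript
// Hypothetical sketch: follow end cursors page by page and total the likes.
// fetchPage(cursor) must resolve to { counts: number[], endCursor, hasNextPage };
// that shape is an assumption made for this example, not Instagram's format.
async function sumAllLikes(fetchPage, maxPages = 100) {
  let total = 0;
  let cursor = null; // no cursor is needed for the first page

  for (let page = 0; page < maxPages; page++) {
    const result = await fetchPage(cursor);
    for (const count of result.counts) {
      total += count;
    }
    // Stop when there is no further page to request.
    if (!result.hasNextPage || !result.endCursor) {
      break;
    }
    cursor = result.endCursor;
  }
  return total;
}
```

The maxPages cap is a safety limit so that a response which keeps returning a cursor cannot loop forever.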

Notes:

  • You don't have to provide an END_CURSOR value in the link if you're only getting the most recent 1-50 posts from a user. The end cursor is only useful if you want to fetch beyond the 50 most recent posts.

  • As of now the query_hash is static and can be left at 42323d64886122307be10013ad2dcc44
