How to get the URLs of the most recent posts of an Instagram user? (with Python)

I want to get the URLs of the most recent posts of an Instagram user (not me, and I don't have an IG account, so I can't use the API). The URLs should be in the style of https://www.instagram.com/p/BpnlsmWgqon/

I've tried making a request with response = requests.get(profile_url) and then parsing the HTML with soup = BeautifulSoup(html, 'html.parser').

After these and some other functions I get a big JSON file with data of the most recent pics (but not their URLs).

How can I get the URLs and extract just those?

Edit: This is what I've coded now. It's a mess; I've tried many approaches but none has worked.

#from subprocess import call
#from instagram.client import InstagramAPI
import requests
import json
from bs4 import BeautifulSoup
#from InstagramAPI.InstagramAPI import InstagramAPI
from instagram.client import InstagramAPI
from config import login, password
userid = "6194091573"
#url = "https://www.instagram.com/mercadona.novedades/?__a=1"
#pic_url =
#call('instalooter user mercadona.novedades ./pics -n 2')
#r = requests.get("https://www.instagram.com/mercadona.novedades")
#print(r.text)
def request_pic_url(profile_url):
    response = requests.get(profile_url)
    return response.text

def extract_json(html):
    soup = BeautifulSoup(html, 'html.parser')
    body = soup.find('body')
    script_tag = body.find('script')
    # Strip the JS assignment and only the trailing ';' (semicolons inside JSON strings must stay)
    raw_string = script_tag.text.strip().replace('window._sharedData =', '').rstrip(';')
    return json.loads(raw_string)

def get_recent_pics(profile_url):
    results = []
    response = request_pic_url(profile_url)
    json_data = extract_json(response)
    metrics = json_data['entry_data']['ProfilePage'][0]['graphql']['user']['edge_owner_to_timeline_media']["edges"]
    for node in metrics:
        node = node.get('node')
        if node and isinstance(node, dict):
            results.append(node)
    return results

def api_thing():
    api = InstagramAPI(login, password)
    recent_media, next_ = api.user_recent_media(userid, 2)
    for media in recent_media:
        print(media.caption.text)

def main():
    userid = "6194091573"
    api_thing()

if __name__ == "__main__":
    main()

def get_large_pic(url):
    return url + "/media/?size=l"

def get_media_id(url):
    req = requests.get('https://api.instagram.com/oembed/?url={}'.format(url))
    media_id = req.json()['media_id']
    return media_id
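
For the JSON route above, the nodes collected by get_recent_pics() usually carry the piece that is missing. This is a minimal sketch, assuming each node in edge_owner_to_timeline_media has a 'shortcode' key (that was the layout of window._sharedData at the time, but Instagram may change it). The helper name post_urls_from_nodes and the sample data are hypothetical.

```python
def post_urls_from_nodes(nodes):
    """Build post URLs from timeline nodes that carry a 'shortcode' key."""
    base = "https://www.instagram.com/p/{}/"
    # Assumption: the shortcode is the last path segment of the post URL
    return [base.format(node["shortcode"]) for node in nodes if node.get("shortcode")]


# Hypothetical sample shaped like the list returned by get_recent_pics()
sample_nodes = [
    {"id": "1", "shortcode": "BpnlsmWgqon"},
    {"id": "2"},  # no shortcode, so this node is skipped
]
print(post_urls_from_nodes(sample_nodes))
```

With real data this would be called as post_urls_from_nodes(get_recent_pics(profile_url)).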

I suggest using the following library: https://github.com/LevPasha/Instagram-API-python

from InstagramAPI import InstagramAPI  # import path used by the library linked above

api = InstagramAPI("username", "password")
api.login()

def get_lastposts(us_id):
    """Return the media IDs of the user's most recent captioned posts."""
    posts = []
    api.getUserFeed(us_id)
    # LastJson holds the parsed response of the last API call
    for media in api.LastJson.get('items', []):
        if media['caption'] is not None:
            posts.append(media['caption']['media_id'])
    return posts

print(get_lastposts('user_id'))
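
The function above returns media IDs rather than URLs. A commonly described trick is that the shortcode in a post URL is the numeric media ID written in base 64 with a URL-safe alphabet. The sketch below assumes that mapping holds for the IDs you get back. It is not part of the library, and the function names are hypothetical.

```python
# URL-safe base64 alphabet, assumed to be the one Instagram uses for shortcodes
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"


def media_id_to_shortcode(media_id):
    """Convert a numeric media ID (int or str) to a shortcode."""
    # Some IDs look like "<media>_<owner>"; only the first part is encoded
    number = int(str(media_id).split("_")[0])
    if number == 0:
        return ALPHABET[0]
    chars = []
    while number > 0:
        number, remainder = divmod(number, 64)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def shortcode_to_media_id(shortcode):
    """Inverse of media_id_to_shortcode, handy for sanity checks."""
    number = 0
    for ch in shortcode:
        number = number * 64 + ALPHABET.index(ch)
    return number


def post_url(media_id):
    return "https://www.instagram.com/p/{}/".format(media_id_to_shortcode(media_id))
```

A list of URLs would then be [post_url(m) for m in get_lastposts('user_id')]. It is worth opening one generated URL by hand to confirm that the mapping holds.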

