
Unable to scrape the timestamp attached to a post's text on a webpage

I'm trying to scrape the timestamp attached to a post's text on a webpage. I can grab the text flawlessly, but I can't find the timestamp. I can, however, scrape the other timestamps on that page, the ones attached to the comments: they appear in the script tag as the value of created_at. The one I'm after isn't among them, and I can't find where it is stored.

Website URL: https://www.instagram.com/p/CEuX_8iH95S/

I've tried the following:

import re
import json
import requests

url = 'https://www.instagram.com/p/CEuX_8iH95S/'

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1; ) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36'
    r = s.get(url)
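    # The page embeds its data as a JSON object assigned to window._sharedData
    script_tag = json.loads(re.findall(r"window\._sharedData = (.*?});", r.text)[0])
    # The caption text sits under edge_media_to_caption
    post_content = script_tag['entry_data']['PostPage'][0]['graphql']['shortcode_media']['edge_media_to_caption']['edges'][0]['node']['text']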
    print(post_content)

How can I parse the timestamp attached to the text from the site above?
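When a value isn't where you expect it in a large nested JSON object, one way to locate it is to walk the structure recursively and print every path whose key looks like a timestamp. The sketch below assumes the data has already been parsed with json.loads, as in the code above. The function name find_timestamp_keys, the matching heuristic, and the sample dict (including the comment key edge_media_to_comment and both numeric values) are hypothetical and only mimic the shape of window._sharedData.

```python
from typing import Any, Iterator, Tuple


def find_timestamp_keys(obj: Any, path: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (path, value) for every key that looks like a timestamp.

    Heuristic (an assumption): a key matches if it is exactly 'created_at'
    or contains the substring 'timestamp'. JSON object keys are always strings.
    """
    if isinstance(obj, dict):
        for key, value in obj.items():
            child = f"{path}[{key!r}]"
            if key == "created_at" or "timestamp" in key:
                yield child, value
            # Keep descending: timestamps can sit at any depth.
            yield from find_timestamp_keys(value, child)
    elif isinstance(obj, list):
        for index, item in enumerate(obj):
            yield from find_timestamp_keys(item, f"{path}[{index}]")


# Hypothetical, heavily trimmed stand-in for the parsed window._sharedData.
sample = {
    "entry_data": {
        "PostPage": [
            {
                "graphql": {
                    "shortcode_media": {
                        "taken_at_timestamp": 1600000000,
                        "edge_media_to_comment": {
                            "edges": [{"node": {"created_at": 1600000500}}]
                        },
                    }
                }
            }
        ]
    }
}

for found_path, found_value in find_timestamp_keys(sample):
    print(found_path, found_value)
```

Passing the real json.loads result instead of sample should list every candidate key, so the post's own date can be told apart from the comments' created_at values.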

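The post's own timestamp isn't stored as created_at. It's the taken_at_timestamp value inside shortcode_media, given as a Unix timestamp (seconds since the epoch). You can convert it to a readable date with the .fromtimestamp() method of datetime.datetime from the datetime module.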

Here's how to do it:

import datetime
import re
import json
import requests

url = 'https://www.instagram.com/p/CEuX_8iH95S/'

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1; ) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36'
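    r = s.get(url)
    # Pull out the JSON object the page assigns to window._sharedData
    script_tag = json.loads(re.findall(r'window\._sharedData = (.*?});', r.text)[0])
    # The post's own timestamp (Unix seconds) sits directly in shortcode_media
    post_date = script_tag['entry_data']['PostPage'][0]['graphql']['shortcode_media']['taken_at_timestamp']

    # fromtimestamp() converts to the local time zone of the machine running the script
    posted = datetime.datetime.fromtimestamp(post_date)
    print(posted.isoformat())
    print(posted.strftime("%b %d %Y %H:%M:%S"))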

This prints (the exact time depends on the local time zone of the machine that runs it):

2020-09-04T20:25:49
Sep 04 2020 20:25:49
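The same extraction can be wrapped in a function and checked against a saved copy of the page without making a network request. This is a sketch that assumes the page still embeds window._sharedData with the layout used in the answer (Instagram's markup may have changed since, so treat the key path as an assumption). The function name post_timestamp and the sample HTML, including its timestamp value, are hypothetical.

```python
import json
import re


def post_timestamp(html: str) -> int:
    """Return the post's Unix timestamp from a page's embedded window._sharedData.

    Assumes the layout used in the answer; raises ValueError if the
    window._sharedData assignment is missing from the page.
    """
    match = re.search(r"window\._sharedData = (.*?});", html)
    if match is None:
        raise ValueError("window._sharedData not found in page")
    data = json.loads(match.group(1))
    media = data["entry_data"]["PostPage"][0]["graphql"]["shortcode_media"]
    return media["taken_at_timestamp"]


# Hypothetical, trimmed page source used instead of a live request.
sample_html = (
    '<script type="text/javascript">window._sharedData = '
    '{"entry_data": {"PostPage": [{"graphql": {"shortcode_media": '
    '{"taken_at_timestamp": 1600000000}}}]}};</script>'
)

ts = post_timestamp(sample_html)
print(ts)
```

Using re.search with an explicit None check gives a clear error when the script tag is absent, instead of the IndexError that re.findall(...)[0] raises.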

If you want to learn more about date formatting, see the strftime() and strptime() format codes in the Python datetime documentation.
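Because datetime.datetime.fromtimestamp() without a tz argument converts to the local time zone, the printed time changes depending on where the script runs. A minimal sketch that pins the result to UTC instead; the helper name format_unix_timestamp and the value 1600000000 are illustrative only, not taken from the post.

```python
import datetime
from typing import Tuple


def format_unix_timestamp(ts: int) -> Tuple[str, str]:
    """Convert a Unix timestamp to UTC and return (ISO 8601, human-readable)."""
    # Passing tz makes the result timezone-aware and independent of the local zone.
    moment = datetime.datetime.fromtimestamp(ts, tz=datetime.timezone.utc)
    return moment.isoformat(), moment.strftime("%b %d %Y %H:%M:%S %Z")


iso, human = format_unix_timestamp(1600000000)  # example value, not from the post
print(iso)    # 2020-09-13T12:26:40+00:00
print(human)  # Sep 13 2020 12:26:40 UTC
```

An aware datetime also carries its offset in isoformat(), so the output is unambiguous when stored or compared later.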
