How to scrape dates from a news site

Question

I am trying to scrape a news website for articles published on a certain date. The output of my function is:

<li class="meta-data"><time data-datetime="relative" datetime="2022-01-30T08:56:09Z" title="2022-01-30T08:56:09Z">January 30, 2022 08:56</time></li>

How can I print only the date and time? Printing i.text doesn't seem to work.

Below is the code.

import requests
from bs4 import BeautifulSoup
import datetime as datetime
from datetime import timedelta
import pandas as pd

pd.set_option('display.max_columns',None)
pd.set_option('max_colwidth',None)

def okx_scrap():
    b = []
    url = 'https://www.okex.com/support/hc/en-us/sections/360000030652-Latest-Announcements'
    page = requests.get(url)
    soup = BeautifulSoup(page.content,'html.parser')
    small_soup = soup.find_all(class_ = "article-list-link")
    url_1st = 'https://www.okex.com/support'

    #Getting Yesterday's Date
    for i in small_soup:
        full_url = url_1st +(i['href'])
        page2 = requests.get(full_url)
        soup2 = BeautifulSoup(page2.content,'html.parser')
        small_soup2 = soup2.find_all('li', {'class': 'meta-data'})
        #print(small_soup2)
        for i in small_soup2:
            print(i)

okx_scrap()

Answer 1

Treat i as a string (if it is not one already, typecast the variable i with the built-in str, i = str(i)), then split it step by step:

i = str(i)
i = i.split("><")[1]
i = i.split("datetime=")[2]
i = i.split("\"")[1]

print(i)
# 2022-01-30T08:56:09Z
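
The split chain above depends on the exact order of the attributes in the markup. As a minimal sketch of reading the attribute by name instead, the standard library's html.parser can collect it. This assumes the markup matches the sample in the question. The class name TimeAttrParser is hypothetical and not part of the original answer.

```python
from html.parser import HTMLParser


class TimeAttrParser(HTMLParser):
    """Collects the datetime attribute of every <time> tag it sees."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        if tag == "time":
            for name, value in attrs:
                if name == "datetime":
                    self.values.append(value)


# Sample markup copied from the question
html = ('<li class="meta-data"><time data-datetime="relative" '
        'datetime="2022-01-30T08:56:09Z" title="2022-01-30T08:56:09Z">'
        'January 30, 2022 08:56</time></li>')

parser = TimeAttrParser()
parser.feed(html)
print(parser.values)
```

In the question's loop, str(i) could be fed to the parser in place of the sample string.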

Answer 2

You can use a regex:

import re

string = '<li class="meta-data"><time data-datetime="relative" datetime="2022-01-30T08:56:09Z" title="2022-01-30T08:56:09Z">January 30, 2022 08:56</time></li>'

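# Named pattern rather than datetime, so it does not shadow the datetime module
pattern = r"(\d{1,4}-\d{1,2}-\d{1,2}T\d{1,2}:\d{1,2}:\d{1,2}Z)"

output = re.findall(pattern, string)

# output (the value appears twice: once from datetime, once from title):
# ['2022-01-30T08:56:09Z', '2022-01-30T08:56:09Z']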
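
Because the timestamp appears in both the datetime and title attributes, the result list holds the same value twice. Here is a sketch that targets only the datetime attribute and converts the match into a timezone-aware datetime object. It assumes the timestamp always has the form shown in the question. The function name extract_timestamp is hypothetical.

```python
import re
from datetime import datetime, timezone

# \s before "datetime" keeps the pattern from matching data-datetime="relative"
DATETIME_ATTR = re.compile(r'\sdatetime="([^"]+)"')


def extract_timestamp(html):
    """Return the datetime attribute as an aware datetime, or None if absent."""
    match = DATETIME_ATTR.search(html)
    if match is None:
        return None
    # The trailing Z means UTC; strptime treats it as a literal character here
    parsed = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


# Sample markup copied from the question
string = ('<li class="meta-data"><time data-datetime="relative" '
          'datetime="2022-01-30T08:56:09Z" title="2022-01-30T08:56:09Z">'
          'January 30, 2022 08:56</time></li>')

print(extract_timestamp(string))
```

A datetime object can then be compared against a target date, which a plain string cannot do reliably.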

Answer 3

Use find instead of find_all, because there is only one entry on each page, and extract the time element rather than the li:

import requests
from bs4 import BeautifulSoup


def okx_scrap():
    url = 'https://www.okex.com/support/hc/en-us/sections/360000030652-Latest-Announcements'
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    small_soup = soup.find_all(class_="article-list-link")
    url_1st = 'https://www.okex.com/support'

    for i in small_soup:
        # Build the article URL and fetch the article page
        full_url = url_1st + i['href']
        page2 = requests.get(full_url)
        soup2 = BeautifulSoup(page2.content, 'html.parser')
        # The first <time> element carries the ISO timestamp in its datetime attribute
        print(soup2.find('time')['datetime'])


okx_scrap()
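
The question's goal is to keep only the articles from a certain date, such as yesterday. Here is a minimal sketch of that filtering step. It assumes the timestamps printed by the function are collected into a list and are in UTC. The helper name published_on is hypothetical, and only the first sample value comes from the question.

```python
from datetime import datetime, timedelta, timezone


def published_on(timestamps, target_date):
    """Keep the ISO timestamps whose UTC calendar date equals target_date."""
    kept = []
    for ts in timestamps:
        parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
        if parsed.date() == target_date:
            kept.append(ts)
    return kept


# Sample values; only the first one comes from the question
timestamps = [
    "2022-01-30T08:56:09Z",
    "2022-01-29T23:59:59Z",
    "2022-01-30T00:00:00Z",
]

# Yesterday relative to the current UTC time
yesterday = (datetime.now(timezone.utc) - timedelta(days=1)).date()
print(published_on(timestamps, yesterday))

# A fixed date, for a reproducible check
print(published_on(timestamps, datetime(2022, 1, 30).date()))
```

Comparing calendar dates rather than raw strings avoids depending on the time-of-day part of each timestamp.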
