I am trying to scrape a news website for announcements published on a certain date. For each article, the function prints output like this:
<li class="meta-data"><time data-datetime="relative" datetime="2022-01-30T08:56:09Z" title="2022-01-30T08:56:09Z">January 30, 2022 08:56</time></li>
How can I print only the datetime? Printing i.text doesn't seem to work.
Below is the code.
import requests
from bs4 import BeautifulSoup
import datetime as datetime
from datetime import timedelta
import pandas as pd

pd.set_option('display.max_columns', None)
pd.set_option('max_colwidth', None)

def okx_scrap():
    b = []
    url = 'https://www.okex.com/support/hc/en-us/sections/360000030652-Latest-Announcements'
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    small_soup = soup.find_all(class_="article-list-link")
    url_1st = 'https://www.okex.com/support'
    #Getting Yesterday's Date
    for i in small_soup:
        full_url = url_1st + (i['href'])
        page2 = requests.get(full_url)
        soup2 = BeautifulSoup(page2.content, 'html.parser')
        small_soup2 = soup2.find_all('li', {'class': 'meta-data'})
        #print(small_soup2)
        for i in small_soup2:
            print(i)

okx_scrap()
Treat `i` as a string (if it is not one, convert it first with the built-in `str()`, i.e. `i = str(i)`) and split out the value:
i = str(i)
i = i.split("><")[1]
i = i.split("datetime=")[2]
i = i.split("\"")[1]
print(i)
# 2022-01-30T08:56:09Z
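Since the goal is to keep only announcements from a certain date, the extracted string can be turned into a `datetime` object and compared against a target day. This is a minimal sketch that assumes the attribute always has the exact form `YYYY-MM-DDTHH:MM:SSZ` (UTC); `parse_timestamp` and `is_on_date` are hypothetical helper names, not part of any library.

```python
from datetime import date, datetime, timedelta, timezone

def parse_timestamp(ts: str) -> datetime:
    # Assumes the fixed format "2022-01-30T08:56:09Z"; the literal "Z" means UTC,
    # so the parsed value is tagged with timezone.utc explicitly.
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

def is_on_date(ts: str, target: date) -> bool:
    """Return True if the UTC timestamp falls on the target calendar date."""
    return parse_timestamp(ts).date() == target

# Yesterday's date in UTC, matching the "#Getting Yesterday's Date" intent in the question
yesterday = datetime.now(timezone.utc).date() - timedelta(days=1)

print(is_on_date("2022-01-30T08:56:09Z", date(2022, 1, 30)))  # True
print(is_on_date("2022-01-30T08:56:09Z", yesterday))
```

Comparing in UTC avoids surprises near midnight; convert with `astimezone()` first if "a certain date" should mean a local calendar day.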
You can use a regular expression:
import re
string = '<li class="meta-data"><time data-datetime="relative" datetime="2022-01-30T08:56:09Z" title="2022-01-30T08:56:09Z">January 30, 2022 08:56</time></li>'
pattern = r"(\d{1,4}-\d{1,2}-\d{1,2}T\d{1,2}:\d{1,2}:\d{1,2}Z)"
output = re.findall(pattern, string)
#output:
['2022-01-30T08:56:09Z', '2022-01-30T08:56:09Z']
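The pattern above matches twice because the same timestamp appears in both the `datetime` and the `title` attributes. A sketch that anchors the regex on the `datetime="` attribute and uses `re.search` to return a single value, assuming the markup keeps the layout shown in the question:

```python
import re

html = '<li class="meta-data"><time data-datetime="relative" datetime="2022-01-30T08:56:09Z" title="2022-01-30T08:56:09Z">January 30, 2022 08:56</time></li>'

# Require whitespace before "datetime=" so only the datetime attribute is considered
# (data-datetime is preceded by "-"), and the title attribute is never matched.
pattern = re.compile(r'\sdatetime="(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)"')

match = pattern.search(html)
if match:
    print(match.group(1))  # 2022-01-30T08:56:09Z
```

Regexes over HTML are brittle (attribute order or quoting can change), so a real parser is usually the safer choice.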
Don't use `find_all`; use `find` instead, because there is only one entry on each page, and extract the `time` tag rather than the `li` tag:
import requests
from bs4 import BeautifulSoup

def okx_scrap():
    url = 'https://www.okex.com/support/hc/en-us/sections/360000030652-Latest-Announcements'
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    small_soup = soup.find_all(class_="article-list-link")
    url_1st = 'https://www.okex.com/support'
    for i in small_soup:
        full_url = url_1st + i['href']
        page2 = requests.get(full_url)
        soup2 = BeautifulSoup(page2.content, 'html.parser')
        # Each article page has a single <time> tag; its datetime attribute holds the ISO timestamp
        time_tag = soup2.find('time')
        if time_tag is not None:
            print(time_tag['datetime'])

okx_scrap()
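This also explains why i.text did not help: `.text` returns the element's visible text ("January 30, 2022 08:56"), while the ISO timestamp lives in the `datetime` attribute, which is why the answer indexes the tag with `['datetime']`. The distinction can be shown offline with the standard library's `html.parser.HTMLParser`; this is a swapped-in, dependency-free technique for illustration, not what the answer uses, and `TimeTagParser` is a hypothetical class name.

```python
from html.parser import HTMLParser

class TimeTagParser(HTMLParser):
    """Collect the datetime attribute and the visible text of the first <time> tag."""

    def __init__(self):
        super().__init__()
        self.datetime_attr = None
        self.text = ""
        self._in_time = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "time" and not self._done:
            self.datetime_attr = dict(attrs).get("datetime")
            self._in_time = True

    def handle_endtag(self, tag):
        if tag == "time" and self._in_time:
            self._in_time = False
            self._done = True

    def handle_data(self, data):
        # Only text between <time> and </time> is collected
        if self._in_time:
            self.text += data

html = '<li class="meta-data"><time data-datetime="relative" datetime="2022-01-30T08:56:09Z" title="2022-01-30T08:56:09Z">January 30, 2022 08:56</time></li>'
parser = TimeTagParser()
parser.feed(html)
print(parser.datetime_attr)  # 2022-01-30T08:56:09Z
print(parser.text)           # January 30, 2022 08:56
```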