I am trying to scrape https://finansial.bisnis.com/read/20210506/90/1391096/laba-bank-mega-tumbuh-dua-digit-kuartal-i-2021-ini-penopangnya, specifically the date of the news article. Here's my code:
news['tanggal'] = newsScrape['date']
dates = []
for x in news['tanggal']:
    x = listToString(x)
    x = x.strip()
    x = x.replace('\r', '').replace('\n', '').replace(' \xa0|\xa0', ',').replace('|', ', ')
    dates.append(x)
dates = listToString(dates)
dates = dates[0:20]
if len(dates) == 0:
    continue
news['tanggal'] = dt.datetime.strptime(dates, '%d %B %Y, %H:%M')
but I got this error:
ValueError: time data '06 Mei 2021, 11:32 ' does not match format '%d %B %Y, %H:%M'
My assumption is that this happens because Mei is Indonesian, while the format expects May, which is English. How can I change Mei to May? I have tried dates = dates.replace('Mei', 'May'), but it doesn't work for me. When I tried it, I got this error:
ValueError: unconverted data remains:
dates is a string. Thanks!
Your assumption about the Mei -> May change is correct. The reason you are likely still facing a problem after the replacement is the trailing whitespace in your string, which your format does not account for. You can use str.rstrip() to remove it.
import datetime as dt
dates = "06 Mei 2021, 11:32 "
dates = dates.replace("Mei", "May") # The replacement will have to be handled for all months, this is only an example
dates = dates.rstrip()
date = dt.datetime.strptime(dates, "%d %B %Y, %H:%M")
print(date) # 2021-05-06 11:32:00
While this fixes the problem here, it is messy to have to trim the string again like this after dates = dates[0:20]. Consider using a regular expression to extract the date in the right format in one step.
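A minimal sketch of the regex idea, assuming the scraped text contains the day, the month name, the year and an HH:MM time in that order (as in '06 Mei 2021, 11:32 '). The pattern and the helper name parse_news_date are illustrative, not from the original answer, and the month mapping only covers Mei here.

```python
import datetime as dt
import re

# Illustrative pattern (an assumption about the scraped text): day, month name,
# year, then any non-digit separator, then HH:MM. Anything after the time is ignored.
DATE_RE = re.compile(r"(\d{1,2})\s+([A-Za-z]+)\s+(\d{4})\D+?(\d{1,2}):(\d{2})")

# Only the month from the question; add the other Indonesian months as needed.
ID_EN = {"Mei": "May"}


def parse_news_date(text):
    """Hypothetical helper: return a datetime, or None if no date is found."""
    match = DATE_RE.search(text)
    if match is None:
        return None
    day, month, year, hour, minute = match.groups()
    # Rebuild a clean string from the captured pieces only
    cleaned = f"{day} {ID_EN.get(month, month)} {year}, {hour}:{minute}"
    return dt.datetime.strptime(cleaned, "%d %B %Y, %H:%M")


print(parse_news_date("06 Mei 2021, 11:32 "))  # 2021-05-06 11:32:00
```

Because only the captured groups are used to rebuild the string, trailing spaces or extra text after the time no longer have to be sliced or stripped away.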
The problem seems to be just the trailing whitespace, which explains the error
ValueError: unconverted data remains:
strptime is complaining that it cannot convert the data left over after the match (the whitespace).
from datetime import datetime
s = '06 Mei 2021, 11:32 '.replace('Mei', 'May').strip()
datetime.strptime(s, '%d %B %Y, %H:%M')
# Returns datetime.datetime(2021, 5, 6, 11, 32)
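To see that the leftover really is just the whitespace, you can catch the exception and inspect its message. A small sketch using the already-translated string, before stripping:

```python
from datetime import datetime

FORMAT = "%d %B %Y, %H:%M"
s = "06 May 2021, 11:32 "  # month already translated, trailing space still there

message = None
try:
    datetime.strptime(s, FORMAT)
except ValueError as err:
    # The format matched "06 May 2021, 11:32"; the error reports what was left over
    message = str(err)

print(repr(message))  # the message ends with the leftover space
```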
Also, to convert all the Indonesian months to English, you can use a dictionary:
id_en_dict = {
    ...,
    'Mei': 'May',
    ...
}
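A fuller version of that dictionary, as a sketch: the keys are the standard Indonesian month names, and the hypothetical id_to_en helper replaces whole words only, so the rest of the text is left untouched. It assumes the month names appear capitalised, as in the string from the question.

```python
import datetime as dt
import re

ID_EN_MONTHS = {
    "Januari": "January",
    "Februari": "February",
    "Maret": "March",
    "April": "April",
    "Mei": "May",
    "Juni": "June",
    "Juli": "July",
    "Agustus": "August",
    "September": "September",
    "Oktober": "October",
    "November": "November",
    "Desember": "December",
}

# \b restricts matches to whole words, so only complete month names are replaced
MONTH_RE = re.compile(r"\b(" + "|".join(ID_EN_MONTHS) + r")\b")


def id_to_en(text):
    """Hypothetical helper: replace Indonesian month names with English ones."""
    return MONTH_RE.sub(lambda m: ID_EN_MONTHS[m.group(1)], text)


s = id_to_en("06 Mei 2021, 11:32 ").strip()
print(dt.datetime.strptime(s, "%d %B %Y, %H:%M"))  # 2021-05-06 11:32:00
```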
You can try the following:
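import datetime as dt
import requests
from bs4 import BeautifulSoup
url = "https://finansial.bisnis.com/read/20210506/90/1391096/laba-bank-mega-tumbuh-dua-digit-kuartal-i-2021-ini-penopangnya"
# verify=False disables TLS certificate verification
r = requests.get(url, verify=False)
soup = BeautifulSoup(r.content, 'html.parser')
# The date is taken from the first <span> inside the element with class "new-description"
info_soup = soup.find(class_="new-description")
x = info_soup.find('span').get_text(strip=True)  # strip=True already trims the ends
# Normalise the separators between the date and the time
x = x.replace('\r', '').replace('\n', '').replace(' \xa0|\xa0', ',').replace('|', ', ')
# Keep the "DD Mon YYYY, HH:MM" part and drop the trailing space
x = x[0:20].rstrip()
# Translate the Indonesian month name before parsing with %B
date = dt.datetime.strptime(x.replace('Mei', 'May'), '%d %B %Y, %H:%M')
print(date)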
result:
2021-05-06 11:45:00
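One caveat with x[0:20]: it only fits short month names. '06 Mei 2021, 11:32' is 18 characters, but '06 September 2021, 11:32' is 24, so the slice would cut off part of the time. Below is a sketch that avoids both the fixed-width slice and %B (whose month names depend on the current locale) by mapping the Indonesian month names straight to numbers. It assumes the cleaned text starts with 'DD Month YYYY, HH:MM', and parse_id_datetime is a hypothetical helper name.

```python
import datetime as dt

ID_MONTHS = [
    "Januari", "Februari", "Maret", "April", "Mei", "Juni",
    "Juli", "Agustus", "September", "Oktober", "November", "Desember",
]
# Map each Indonesian month name to its number, e.g. "Mei" -> 5
MONTH_NUMBER = {name: number for number, name in enumerate(ID_MONTHS, start=1)}


def parse_id_datetime(text):
    """Hypothetical helper for text like '06 Mei 2021, 11:32'.

    Anything after the time, separated by whitespace, is ignored.
    """
    date_part, time_part = text.split(",", 1)
    day, month_name, year = date_part.split()
    hour, minute = time_part.split()[0].split(":")
    return dt.datetime(int(year), MONTH_NUMBER[month_name], int(day), int(hour), int(minute))


print(parse_id_datetime("06 Mei 2021, 11:32 "))       # 2021-05-06 11:32:00
print(parse_id_datetime("06 September 2021, 11:32"))  # 2021-09-06 11:32:00
```

Since no month name goes through %B, the result does not depend on the locale the script runs under.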