I have a Korean date string that looks like this:
월요일, 2019년 08월 05일 09:33:39
and I'm trying to parse it with datetime.strptime after setting the locale to kor (on Windows). The format is %A, %Y년 %m월 %d일 %H:%M:%S.
import locale
from datetime import datetime
locale.setlocale(locale.LC_TIME, 'kor')
date_string = '월요일, 2019년 08월 05일 09:33:39'
fmt = '%A, %Y년 %m월 %d일 %H:%M:%S'
time = datetime.strptime(date_string, fmt)
print(time)
This works fine with other languages (e.g. English, German, French), using a correspondingly different format string. However, the code above raises a ValueError:
ValueError: time data '월요일, 2019년 08월 05일 09:33:39' does not match format '%A, %Y년 %m월 %d일 %H:%M:%S'
I also tried generating a date string with datetime.strftime:
import locale
from datetime import datetime
locale.setlocale(locale.LC_TIME, 'kor')
print(datetime.now().strftime('%A'))
# Prints '¿ù¿äÀÏ'
However, ¿ù¿äÀÏ does not match the expected weekday name, which would be 월요일 (Monday).
I've also tried decoding and encoding with UTF-8 or unicode-escape, but neither of them really works.
All of the above code runs fine on Mac/Linux using the ko_KR locale. However, ko_KR does not work on Windows either.
Does anyone have a clue what is going on here? Somehow the locale and language support does not work properly with non-ASCII characters.
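Use locale.setlocale(locale.LC_ALL, 'kor') instead of locale.setlocale(locale.LC_TIME, 'kor'). Setting only LC_TIME leaves LC_CTYPE unchanged, so the Korean day names appear to be decoded with the wrong code page. For example: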
d:\bat> python -q
>>>
>>> import locale
>>> from datetime import datetime
>>>
>>> ### generate a date string with datetime.strftime
...
>>> locale.setlocale(locale.LC_ALL, 'kor') ### crucial point ###
'Korean_Korea.949'
>>> locale.getlocale(locale.LC_TIME)
('Korean_Korea', '949')
>>> print(datetime.now().strftime('%A')) # Prints 월요일 (right!)
월요일
>>>
>>> ### parsing korean date string
...
>>> date_string = '월요일, 2019년 08월 05일 09:33:39'
>>> fmt = '%A, %Y년 %m월 %d일 %H:%M:%S'
>>>
>>> time = datetime.strptime(date_string, fmt)
>>> print(time)
2019-08-05 09:33:39
>>>
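Because locale.setlocale(locale.LC_ALL, ...) changes the locale for the whole process, it may be worth restoring the previous setting once parsing is done. The following is only a minimal sketch of that idea, not part of the session above: the helper names temporary_locale and parse_in_locale are hypothetical, and the locale name 'kor' is assumed to be available only on Windows.

```python
import locale
from contextlib import contextmanager
from datetime import datetime


@contextmanager
def temporary_locale(name, category=locale.LC_ALL):
    """Switch the process locale for the duration of a with-block.

    Hypothetical helper. setlocale() affects the whole process and is
    not thread-safe, so this is only suitable for single-threaded use.
    """
    saved = locale.setlocale(category)  # one-argument form queries the current setting
    try:
        locale.setlocale(category, name)  # raises locale.Error if the name is unknown
        yield
    finally:
        locale.setlocale(category, saved)  # always restore the previous setting


def parse_in_locale(date_string, fmt, locale_name):
    """Parse date_string with strptime while locale_name is active (hypothetical helper)."""
    with temporary_locale(locale_name):
        return datetime.strptime(date_string, fmt)
```

On Windows, parse_in_locale('월요일, 2019년 08월 05일 09:33:39', '%A, %Y년 %m월 %d일 %H:%M:%S', 'kor') would be the call that corresponds to the session above; on Mac/Linux the locale name would have to be something like ko_KR.UTF-8 instead.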
FYI, here are some other test cases (Python 3.5.1 64 bit (AMD64) on win32):
import locale
from datetime import datetime
locale.getdefaultlocale() # Echoes ('cs_CZ', 'cp65001')
locale.getlocale(locale.LC_TIME) # Echoes (None, None)
print(datetime.now().strftime('%A')) # Prints Monday (wrong?)
# user’s default setting for LC_TIME category
locale.setlocale(locale.LC_TIME, '') # Echoes 'Czech_Czechia.utf8'
locale.getlocale(locale.LC_TIME) # Echoes ('Czech_Czechia', 'utf8')
print(datetime.now().strftime('%A')) # Prints pondÄÃ (wrong!)
# user’s default setting for all categories
locale.setlocale(locale.LC_ALL, '') # Echoes 'Czech_Czechia.utf8'
locale.getlocale(locale.LC_TIME) # Echoes ('Czech_Czechia', 'utf8')
print(datetime.now().strftime('%A')) # Prints pondělí (right!)
################################################
locale.setlocale(locale.LC_TIME, 'kor')
locale.getlocale(locale.LC_TIME)
print(datetime.now().strftime('%A')) # Prints ¿ù¿äÀÏ (wrong!)
################################################
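The garbled ¿ù¿äÀÏ in the last test case is consistent with code page 949 bytes being decoded as a Western single-byte code page. The sketch below reproduces and reverses that effect in pure Python. It assumes the wrong decoder behaves like latin-1 for these bytes; the function names garble and repair are hypothetical and only illustrate the mechanism, they are not a fix for the locale setup itself.

```python
def garble(text, actual='cp949', assumed='latin-1'):
    """Encode text with its real code page, then decode it with the wrong one."""
    return text.encode(actual).decode(assumed)


def repair(garbled, actual='cp949', assumed='latin-1'):
    """Undo garble(): recover the original bytes, then decode them correctly."""
    return garbled.encode(assumed).decode(actual)
```

With these definitions, garble('월요일') returns '¿ù¿äÀÏ' and repair('¿ù¿äÀÏ') returns '월요일', which would explain why round-tripping through UTF-8 or unicode-escape does not help: neither of those is the encoding involved.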
Maybe a solution would be one of the following:
rename the month to its number:
u'january'.replace('january', '1')
select the month by the occurrence of a substring:
ddrs = {1: u'янв', 2: u'фев', 3: u'мар', 4: u'апр', 5: u'ма', 6: u'июн', 7: u'июл', 8: u'авг', 9: u'сент', 10: u'окт', 11: u'ноя', 12: u'дек'}
numMonth = next((x for x in ddrs if u'январь'.find(ddrs[x]) > -1), None)
The second idea is implemented as a class: https://gist.github.com/mrbannyjo/f83b1a2ab302b0afee49d976de365aae
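For the Korean string from the question, a locale-free approach in the same spirit can be even simpler, because year, month and day are already numeric and only the weekday name depends on the locale. The following is a rough sketch under that assumption; parse_korean_timestamp is a hypothetical name, and the function simply discards the weekday instead of validating it.

```python
from datetime import datetime


def parse_korean_timestamp(date_string):
    """Parse '<weekday>, YYYY년 MM월 DD일 HH:MM:SS' without touching the locale.

    Hypothetical helper: the weekday before the first comma is ignored,
    so it is not checked against the actual date.
    """
    head, sep, rest = date_string.partition(',')
    if not sep:
        # No weekday prefix present; treat the whole string as the timestamp.
        rest = head
    # Only numeric directives remain, so the result does not depend on LC_TIME.
    return datetime.strptime(rest.strip(), '%Y년 %m월 %d일 %H:%M:%S')


print(parse_korean_timestamp('월요일, 2019년 08월 05일 09:33:39'))
# 2019-08-05 09:33:39
```

The trade-off is that a wrong weekday in the input goes unnoticed, which may or may not matter depending on where the strings come from.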