
Python datetime.strptime with Korean locale on Windows

I have a Korean date string that looks like this:

월요일, 2019년 08월 05일 09:33:39

I'm trying to parse it with datetime.strptime after setting the locale to kor (on Windows). The format is %A, %Y년 %m월 %d일 %H:%M:%S.

import locale
from datetime import datetime

locale.setlocale(locale.LC_TIME, 'kor')

date_string = '월요일, 2019년 08월 05일 09:33:39'
fmt = '%A, %Y년 %m월 %d일 %H:%M:%S'

time = datetime.strptime(date_string, fmt)

print(time)

This works fine with other languages (e.g. English, German, French), with a correspondingly different format string. However, the code above raises a ValueError:

ValueError: time data '월요일, 2019년 08월 05일 09:33:39' does not match format '%A, %Y년 %m월 %d일 %H:%M:%S'

I also tried to generate a date string with datetime.strftime :

import locale
from datetime import datetime

locale.setlocale(locale.LC_TIME, 'kor')

print(datetime.now().strftime('%A'))
# Prints '¿ù¿äÀÏ'

However, ¿ù¿äÀÏ is not the expected weekday name, which would be 월요일 (Monday).

I've also tried decoding and encoding the result with UTF-8 and unicode-escape, but neither really works.

All the above code runs well on Mac/Linux using the ko_KR locale. However, ko_KR does not work on Windows either.

Does anyone have a clue what is going on here? Somehow the locale and language support does not work properly with foreign characters.

Apply locale.setlocale(locale.LC_ALL, 'kor') instead of locale.setlocale(locale.LC_TIME, 'kor') as follows:

d:\bat> python -q
>>>
>>> import locale
>>> from datetime import datetime
>>>
>>> ### generate a date string with datetime.strftime
...
>>> locale.setlocale(locale.LC_ALL, 'kor')  ### crucial point ###
'Korean_Korea.949'
>>> locale.getlocale(locale.LC_TIME)
('Korean_Korea', '949')
>>> print(datetime.now().strftime('%A')) # Prints 월요일  (right!)
월요일
>>>
>>> ### parsing korean date string
...
>>> date_string = '월요일, 2019년 08월 05일 09:33:39'
>>> fmt = '%A, %Y년 %m월 %d일 %H:%M:%S'
>>>
>>> time = datetime.strptime(date_string, fmt)
>>> print(time)
2019-08-05 09:33:39
>>>
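
Since LC_ALL is process-wide, a script may want to switch it only while parsing and put the old setting back afterwards. A minimal sketch, assuming the 'kor' locale name from the session above is available on the machine; temporary_locale and parse_korean are hypothetical helper names, not part of the locale module:

```python
import locale
from contextlib import contextmanager
from datetime import datetime


@contextmanager
def temporary_locale(name, category=locale.LC_ALL):
    """Switch `category` to `name` inside the with-block, then restore it."""
    saved = locale.setlocale(category)  # query only, changes nothing
    try:
        # setlocale raises locale.Error if `name` is unknown on this system
        yield locale.setlocale(category, name)
    finally:
        locale.setlocale(category, saved)


def parse_korean(date_string, fmt='%A, %Y년 %m월 %d일 %H:%M:%S'):
    # 'kor' is the Windows locale name used in the session above (assumption:
    # it is installed); Mac/Linux would need something like 'ko_KR.UTF-8'.
    with temporary_locale('kor'):
        return datetime.strptime(date_string, fmt)
```

On a Windows machine with the Korean locale, parse_korean('월요일, 2019년 08월 05일 09:33:39') should behave like the strptime call in the session above. Note that setlocale is not thread-safe, so this is only a reasonable fit for single-threaded scripts.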

FYI, here are some other test cases (Python 3.5.1 64 bit (AMD64) on win32):

import locale
from datetime import datetime

locale.getdefaultlocale()            # Echoes ('cs_CZ', 'cp65001')
locale.getlocale(locale.LC_TIME)     # Echoes (None, None) 
print(datetime.now().strftime('%A')) # Prints Monday            (wrong?)

# user’s default setting for LC_TIME category
locale.setlocale(locale.LC_TIME, '') # Echoes 'Czech_Czechia.utf8' 
locale.getlocale(locale.LC_TIME)     # Echoes ('Czech_Czechia', 'utf8')
print(datetime.now().strftime('%A')) # Prints pondÄí            (wrong!)

# user’s default setting for all categories
locale.setlocale(locale.LC_ALL, '')  # Echoes 'Czech_Czechia.utf8'
locale.getlocale(locale.LC_TIME)     # Echoes ('Czech_Czechia', 'utf8')
print(datetime.now().strftime('%A')) # Prints pondělí            (right!)

################################################

locale.setlocale(locale.LC_TIME, 'kor')
locale.getlocale(locale.LC_TIME)
print(datetime.now().strftime('%A')) # Prints ¿ù¿äÀÏ             (wrong!)

################################################
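
The garbled weekday looks like the CP949 bytes of 월요일 read with a single-byte Western codec. That would fit LC_TIME producing Korean text while the encoding used to decode it still comes from an unchanged LC_CTYPE, which is why setting LC_ALL helps. This reading is an assumption, not something confirmed above; repair_mojibake is a hypothetical helper and Latin-1 is a guess at the codec involved:

```python
def repair_mojibake(text, actual='cp949', assumed='latin-1'):
    """Undo a wrong decode: recover the raw bytes, then decode them properly."""
    raw = text.encode(assumed)   # back to the bytes the C runtime produced
    return raw.decode(actual)    # decode with the codec the locale really used


print(repair_mojibake('¿ù¿äÀÏ'))  # 월요일
```

This is a diagnostic rather than a fix: it shows the weekday name itself was right and only the decoding step went wrong.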

Maybe a solution would be:

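  1. replace the month name with its number

    u'january'.replace(u'january', u'1')
  2. select the month by which abbreviation occurs in the month name

    # month number -> Russian month-name prefix
    ddrs = {1: u'янв', 2: u'фев', 3: u'мар', 4: u'апр', 5: u'ма', 6: u'июн', 7: u'июл', 8: u'авг', 9: u'сент', 10: u'окт', 11: u'ноя', 12: u'дек'}
    # first month whose prefix occurs in the given name, or None if nothing matches
    numMonth = next((x for x in ddrs if u'январь'.find(ddrs[x]) > -1), None)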

The second idea is implemented as a class in this gist: https://gist.github.com/mrbannyjo/f83b1a2ab302b0afee49d976de365aae
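
For the Korean string from the question, the month is already numeric, so only the weekday name depends on the locale. A sketch in the same locale-free spirit, assuming the input always follows the layout shown in the question; parse_korean_date and the regular expression are hypothetical, and the weekday is simply ignored rather than validated:

```python
import re
from datetime import datetime

# year년 month월 day일 HH:MM:SS, with whatever precedes it (the weekday) skipped
_KOREAN_DATE = re.compile(
    r'(\d{4})년\s*(\d{1,2})월\s*(\d{1,2})일\s+(\d{1,2}):(\d{2}):(\d{2})'
)


def parse_korean_date(date_string):
    """Parse a '요일, YYYY년 MM월 DD일 HH:MM:SS' string without touching the locale."""
    match = _KOREAN_DATE.search(date_string)
    if match is None:
        raise ValueError('unrecognised date string: %r' % date_string)
    # the six captured groups map directly onto datetime's positional arguments
    return datetime(*map(int, match.groups()))


print(parse_korean_date('월요일, 2019년 08월 05일 09:33:39'))  # 2019-08-05 09:33:39
```

Because no locale is involved, this behaves the same on Windows, Mac and Linux, at the cost of not checking that the weekday matches the date.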
