
pandas read_csv parse foreign dates

I am trying to use read_csv on a .csv file that contains a date column. The problem is that the date column is in a foreign language (Romanian), with entries like:

'26 septembrie 2017'

'13 iulie 2017'

etc. How can I parse this nicely into a pandas DataFrame that has a US date format?

You can pass a converter for that column:

df = pd.read_csv(myfile, converters={'date_column': foreign_date_converter})

But first you have to define the converter to do what you want. This approach uses locale manipulation:

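import datetime
import locale

def foreign_date_converter(text):
    # Temporarily sets LC_TIME to "ro_RO" to parse Romanian dates properly
    # (not thread-safe; the locale must be installed, and some systems
    # name it 'ro_RO.UTF-8')
    loc = locale.getlocale(locale.LC_TIME)
    locale.setlocale(locale.LC_TIME, 'ro_RO')
    try:
        # %B matches the full month name; %b would expect an abbreviation
        return datetime.datetime.strptime(text, '%d %B %Y').date()
    finally:
        locale.setlocale(locale.LC_TIME, loc)  # restores locale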
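
A slightly more defensive variant of the same idea, as a sketch only: a context manager that switches the locale and always restores it, even if parsing fails. The names temporary_locale and parse_localized_date are made up for this illustration. The demo uses the always-available 'C' locale (English month names); for the Romanian data you would pass 'ro_RO' or 'ro_RO.UTF-8', assuming that locale is installed on your system.

```python
import contextlib
import datetime
import locale


@contextlib.contextmanager
def temporary_locale(category, name):
    """Switch `category` to locale `name`, restoring the old value on exit."""
    saved = locale.setlocale(category)  # one-argument form only queries
    locale.setlocale(category, name)
    try:
        yield
    finally:
        locale.setlocale(category, saved)


def parse_localized_date(text, locale_name, fmt="%d %B %Y"):
    """Parse `text` using month names from `locale_name` (hypothetical helper)."""
    with temporary_locale(locale.LC_TIME, locale_name):
        return datetime.datetime.strptime(text, fmt).date()


# Demo with the "C" locale, which uses English month names.
print(parse_localized_date("26 September 2017", "C"))
```

Like the converter above, this changes process-wide state, so it is still not safe to call from several threads at once.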

Use the dateparser module.

import dateparser
import pandas as pd
# Note: the date_parser argument is deprecated as of pandas 2.0
df = pd.read_csv('yourfile.csv', parse_dates=['date'], date_parser=dateparser.parse)

Put the name of your date column in the parse_dates parameter; I'm just assuming it is called date.

You should get output like this:

      date
0   2017-09-26    
1   2017-07-13      

If you want to change the format, use strftime:

df['date'] = df.date.dt.strftime(date_format = '%d %B %Y')

Output:

      date
0   26 September 2017
1        13 July 2017
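
Since the question asks for a US-style date, the same format codes can produce month/day/year order. A small sketch with plain datetime objects (the helper name to_us_format is hypothetical); the format string '%m/%d/%Y' should work the same way in the strftime call above.

```python
import datetime


def to_us_format(d):
    """Format a date as MM/DD/YYYY using numeric codes only."""
    return d.strftime("%m/%d/%Y")


print(to_us_format(datetime.date(2017, 9, 26)))  # 09/26/2017
print(to_us_format(datetime.date(2017, 7, 13)))  # 07/13/2017
```

Unlike %B, the numeric codes do not depend on the active locale.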

The easiest solution would be to simply call str.replace(old, new) twelve times, once per month name.

It is not pretty, but you can wrap it in a function:

def translater(date_string_with_exactly_one_date):
    date_str = date_string_with_exactly_one_date
    date_str = date_str.replace("iulie", "july")
    date_str = date_str.replace("septembrie", "september")
    # do this 10 more times with the right translation
    return date_str

Now you just have to call it for every entry. After that you can handle it like a US date string. This is not very efficient, but it will get the job done and you do not have to search for special libraries.
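
A related sketch that avoids both the locale and the chain of replace calls: look the month name up in a dictionary and build the date directly. The names RO_MONTHS and parse_romanian_date are made up here, and the code assumes every entry has exactly the shape 'day month year' with the full Romanian month name.

```python
import datetime

# Romanian month names mapped to month numbers.
RO_MONTHS = {
    "ianuarie": 1, "februarie": 2, "martie": 3, "aprilie": 4,
    "mai": 5, "iunie": 6, "iulie": 7, "august": 8,
    "septembrie": 9, "octombrie": 10, "noiembrie": 11, "decembrie": 12,
}


def parse_romanian_date(text):
    """Parse strings shaped like '26 septembrie 2017' into a datetime.date."""
    day, month_name, year = text.split()
    return datetime.date(int(year), RO_MONTHS[month_name.lower()], int(day))


print(parse_romanian_date("26 septembrie 2017"))  # 2017-09-26
print(parse_romanian_date("13 iulie 2017"))       # 2017-07-13
```

A function like this could presumably be passed to read_csv through the converters argument, in the same way as the converter in the first answer.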
