Pandas DataFrame / HDFStore: passing multiple date formats via CSV

Question

I am doing the following to parse dates held in several columns. However, the second column (time) doesn't match this format string, so parsing fails. How do I achieve this?

 from datetime import datetime
 import pandas as pd

 # Fails on the 'time' column: values like 00:05:01 do not match this format
 dateparse = lambda x: datetime.strptime(x, '%d/%m/%Y %H:%M:%S')

 for chunk in pd.read_csv(file, chunksize=500000, parse_dates=['date','time'], date_parser=dateparse, names=col_names, index_col=index_cols, header=0, dtype=dtype):
     store.append('df', chunk)

Sample data:

 Date                     Time
19/10/2016 00:00:00      00:05:01

Answer 1

There is no need to specify the datetime format if you have a standard format like '19/10/2016 00:00:00'. Pandas will parse it automatically, so you don't need the date_parser parameter.

Option 1: convert the Time column to datetime64[ns] dtype:

for chunk in pd.read_csv(file, chunksize=500000, parse_dates=['Date'], names=col_names, index_col=index_cols, dtype = dtype):
    chunk['Time'] = chunk['Date'].dt.normalize() + pd.to_timedelta(chunk['Time'])
    store.append('df',chunk)
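
As a rough illustration of Option 1, here is a minimal sketch that runs on the sample row from the question plus one invented row. The in-memory CSV text, the chunk size of 1, and the explicit format string are assumptions made for the demo, and the HDFStore write is replaced by collecting chunks in a list so the sketch needs only pandas.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for the real file
csv_text = (
    "Date,Time\n"
    "19/10/2016 00:00:00,00:05:01\n"
    "20/10/2016 00:00:00,13:45:30\n"
)

frames = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=1):
    # Parse the full timestamp; the explicit format avoids day/month ambiguity
    chunk["Date"] = pd.to_datetime(chunk["Date"], format="%d/%m/%Y %H:%M:%S")
    # Midnight of each date plus the time of day gives a full timestamp
    chunk["Time"] = chunk["Date"].dt.normalize() + pd.to_timedelta(chunk["Time"])
    frames.append(chunk)  # the original code calls store.append('df', chunk) here

result = pd.concat(frames, ignore_index=True)
print(result)
```

The normalize() call resets the time part to midnight, so the Time column ends up holding the date combined with the time of day.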

Option 2: convert the Time column to timedelta64[ns] dtype:

for chunk in pd.read_csv(file, chunksize=500000, parse_dates=['Date'], names=col_names, index_col=index_cols, dtype = dtype):
    chunk['Time'] = pd.to_timedelta(chunk['Time'])
    store.append('df',chunk)

PS: both of the dtypes mentioned are supported by HDFStore.
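
To make Option 2 more concrete, the following small sketch shows what pd.to_timedelta produces for strings shaped like the Time column. The second value is invented for illustration.

```python
import pandas as pd

# Time-of-day strings in the same shape as the sample data
times = pd.Series(["00:05:01", "13:45:30"])

# Each string becomes an offset from midnight
deltas = pd.to_timedelta(times)

# Timedeltas support arithmetic, e.g. conversion to seconds
seconds = deltas.dt.total_seconds()
print(deltas)
print(seconds)
```

Keeping the column as a timedelta leaves the date and the time of day separate; they can still be added together later if a full timestamp is needed.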

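Option 3: read the columns as they are and convert them afterwards, turning values that cannot be parsed into NaT: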

for chunk in pd.read_csv(file, chunksize=500000, names=col_names, index_col=index_cols, dtype = dtype):
    chunk['Date'] = pd.to_datetime(chunk['Date'], errors='coerce')
    chunk['Time'] = pd.to_timedelta(chunk['Time'], errors='coerce')
    store.append('df',chunk)
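
The effect of errors='coerce' in Option 3 can be seen in a short sketch. The malformed values and the explicit format string are assumptions added for the demo.

```python
import pandas as pd

# One valid value and one deliberately malformed value per column
dates = pd.Series(["19/10/2016 00:00:00", "not a date"])
times = pd.Series(["00:05:01", "bad"])

# Unparseable entries become NaT instead of raising an exception
parsed_dates = pd.to_datetime(dates, format="%d/%m/%Y %H:%M:%S", errors="coerce")
parsed_times = pd.to_timedelta(times, errors="coerce")

# Count the rows that failed to parse so they can be inspected
print(parsed_dates.isna().sum(), parsed_times.isna().sum())
```

Coercing keeps a long chunked import running when a few rows are dirty, at the cost of silently producing missing values, so checking the NaT count per chunk is probably worthwhile.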

Answer 2

You can tell Pandas to combine the date and time columns into one column by passing a list of lists, instead of just a list, in parse_dates, as specified in the documentation:

parse_dates : boolean or list of ints or names or list of lists or dict, default False

boolean. If True -> try parsing the index.

list of ints or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.

list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.

dict, e.g. {'foo' : [1, 3]} -> parse columns 1, 3 as date and call result 'foo'

You'll also want to specify dayfirst=True given your date format.

That means your code becomes:

for chunk in pd.read_csv(file, chunksize=500000, 
                         parse_dates=[['date', 'time']],  # note the extra []
                         dayfirst=True,
                         names=col_names, index_col=index_cols, 
                         header=0, dtype=dtype):
    store.append('df',chunk)
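
To show why dayfirst=True matters for this date format, here is a small sketch using pd.to_datetime on a single ambiguous string. The example date is invented; 19/10/2016 from the sample data is unambiguous because 19 cannot be a month, but a value such as 05/10/2016 is not.

```python
import pandas as pd

# Day and month are both <= 12, so the string is ambiguous
ambiguous = "05/10/2016 00:00:00"

# Default behaviour reads the first number as the month
month_first = pd.to_datetime(ambiguous)

# dayfirst=True reads the first number as the day
day_first = pd.to_datetime(ambiguous, dayfirst=True)

print(month_first, day_first)
```

Note that the list-of-lists form of parse_dates has been deprecated in recent pandas releases, so on newer versions it may be safer to read the columns as strings and combine them with pd.to_datetime afterwards.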

Answer 1: score 2, accepted, 2017-01-10 15:37:41
Answer 2: score 1, 2017-01-10 16:28:49
