
Pandas DataFrame / HDFStore: Pass Multiple Date Formats through CSV

I am using the following code to parse dates in two different columns. However, the second column (time) does not match this format string, so parsing fails with errors. How can I parse both columns?

import pandas as pd
from datetime import datetime  # pd.datetime was removed in pandas 2.0

dateparse = lambda x: datetime.strptime(x, '%d/%m/%Y %H:%M:%S')

for chunk in pd.read_csv(file, chunksize=500000, parse_dates=['date', 'time'], date_parser=dateparse, names=col_names, index_col=index_cols, header=0, dtype=dtype):
    store.append('df', chunk)

Sample data:

Date                 Time
19/10/2016 00:00:00  00:05:01
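To see why a single date parser breaks, the sketch below applies the question's format string to one sample value from each column using only the standard library. The helper name `parses` is hypothetical.

```python
from datetime import datetime

FMT = '%d/%m/%Y %H:%M:%S'  # the format string used in the question


def parses(value, fmt=FMT):
    """Return True if value matches fmt exactly, False otherwise."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


print(parses('19/10/2016 00:00:00'))  # Date value: matches the format
print(parses('00:05:01'))             # Time value: no date part, so it fails
```

The Time values carry no day/month/year, so any parser built around `%d/%m/%Y` will reject them.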

There is no need to specify a datetime format if you have a standard layout like '19/10/2016 00:00:00': Pandas can parse it automatically, so you don't need the date_parser parameter. Because the day comes first, pass dayfirst=True so that ambiguous dates such as 01/02/2016 are read as 1 February rather than 2 January.
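A minimal sketch of letting Pandas parse the Date column on its own, assuming the file is comma-separated with a `Date,Time` header (the two-row sample below is made up from the question's data):

```python
import io

import pandas as pd

# Hypothetical sample in CSV form; the real file layout is assumed.
csv_text = (
    "Date,Time\n"
    "19/10/2016 00:00:00,00:05:01\n"
    "20/10/2016 00:00:00,13:45:30\n"
)

# No custom parser: parse_dates plus dayfirst=True is enough for the Date column.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'], dayfirst=True)
print(df.dtypes)
print(df['Date'].iloc[0])
```

The Time column is left as plain strings here; the options below show what to do with it.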

Option 1: convert the Time column to datetime64[ns] dtype:

for chunk in pd.read_csv(file, chunksize=500000, parse_dates=['Date'], dayfirst=True,
                         names=col_names, index_col=index_cols, header=0, dtype=dtype):
    # midnight of each Date plus the HH:MM:SS offset in Time -> full timestamp
    chunk['Time'] = chunk['Date'].dt.normalize() + pd.to_timedelta(chunk['Time'])
    store.append('df', chunk)
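Here is Option 1 on a single in-memory chunk, as a sketch that runs without a file or an HDFStore. The sample rows are hypothetical, and the store.append call is left out.

```python
import io

import pandas as pd

# Hypothetical rows; in the answer they come from the chunked read_csv loop.
csv_text = (
    "Date,Time\n"
    "19/10/2016 00:00:00,00:05:01\n"
    "20/10/2016 00:00:00,13:45:30\n"
)
chunk = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'], dayfirst=True)

# normalize() drops any time-of-day from Date; adding the Time offset
# turns each row into one complete timestamp.
chunk['Time'] = chunk['Date'].dt.normalize() + pd.to_timedelta(chunk['Time'])
print(chunk['Time'].tolist())
```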

Option 2: convert the Time column to timedelta64[ns] dtype:

for chunk in pd.read_csv(file, chunksize=500000, parse_dates=['Date'], dayfirst=True,
                         names=col_names, index_col=index_cols, header=0, dtype=dtype):
    # keep Time as a duration ('00:05:01' -> 5 minutes 1 second)
    chunk['Time'] = pd.to_timedelta(chunk['Time'])
    store.append('df', chunk)
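What Option 2 does to the raw strings, shown on two hypothetical Time values:

```python
import pandas as pd

# Hypothetical Time values as they arrive from the CSV (plain strings).
time_strings = pd.Series(['00:05:01', '13:45:30'])

durations = pd.to_timedelta(time_strings)       # timedelta64 dtype
print(durations.dt.total_seconds().tolist())    # [301.0, 49530.0]
```

Storing a duration rather than a timestamp keeps Time independent of Date, which may be preferable if the two columns mean different things.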

PS: both of these dtypes are supported by HDFStore.
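If you want to try the chunked loop before involving PyTables, here is a sketch that collects the chunks in a plain list as a stand-in for HDFStore (the list is an assumption for illustration only) and checks the dtypes that would be appended. A tiny chunksize forces several chunks, just as chunksize=500000 would on a large file.

```python
import io

import pandas as pd

# Hypothetical 5-row file: dates 19..23 October with matching second values.
csv_text = "Date,Time\n" + "".join(
    f"{day}/10/2016 00:00:00,00:05:{day:02d}\n" for day in range(19, 24)
)

pieces = []  # stand-in for HDFStore so the sketch runs without PyTables
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2,
                         parse_dates=['Date'], dayfirst=True):
    chunk['Time'] = pd.to_timedelta(chunk['Time'])
    pieces.append(chunk)  # the answer calls store.append('df', chunk) here

result = pd.concat(pieces, ignore_index=True)
print(len(pieces), result.dtypes.to_dict())
```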

Option 3: convert both columns after reading, turning unparseable values into NaT instead of raising an error:

for chunk in pd.read_csv(file, chunksize=500000, names=col_names, index_col=index_cols,
                         header=0, dtype=dtype):
    # errors='coerce' replaces values that cannot be parsed with NaT
    chunk['Date'] = pd.to_datetime(chunk['Date'], errors='coerce', dayfirst=True)
    chunk['Time'] = pd.to_timedelta(chunk['Time'], errors='coerce')
    store.append('df', chunk)
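A small sketch of what errors='coerce' buys you in Option 3. The malformed values below are made up to show that bad rows become NaT rather than aborting the whole chunk.

```python
import pandas as pd

# Hypothetical raw column values, each with one malformed entry.
dates = pd.Series(['19/10/2016 00:00:00', 'not a date'])
times = pd.Series(['00:05:01', 'bad'])

parsed_dates = pd.to_datetime(dates, errors='coerce', dayfirst=True)
parsed_times = pd.to_timedelta(times, errors='coerce')
print(parsed_dates.isna().tolist(), parsed_times.isna().tolist())
```

You can then filter or count the NaT rows per chunk before appending them to the store.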

You can tell Pandas to combine the date and time columns into one column by passing a list of lists in parse_dates instead of a plain list, as specified in the documentation:

parse_dates : boolean or list of ints or names or list of lists or dict, default False

  • boolean. If True -> try parsing the index.
  • list of ints or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
  • list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
  • dict, e.g. {'foo' : [1, 3]} -> parse columns 1, 3 as date and call result 'foo'
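Two caveats worth checking against your pandas version: pandas 2.2 deprecated combining columns through parse_dates, and if, as in the sample data, the Date column already carries a time part, a plain join of the two columns yields strings like '19/10/2016 00:00:00 00:05:01'. The sketch below builds the combined column explicitly instead. It assumes every Date value is exactly 'DD/MM/YYYY 00:00:00' (so the first 10 characters are the date), and the column name `date_time` is chosen here for illustration.

```python
import io

import pandas as pd

# Hypothetical sample in CSV form.
csv_text = (
    "Date,Time\n"
    "19/10/2016 00:00:00,00:05:01\n"
    "20/10/2016 00:00:00,13:45:30\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Keep only the 'DD/MM/YYYY' part of Date so that joining it with Time
# gives one well-formed 'DD/MM/YYYY HH:MM:SS' string per row.
combined = df['Date'].str[:10] + ' ' + df['Time']
df['date_time'] = pd.to_datetime(combined, format='%d/%m/%Y %H:%M:%S')
df = df.drop(columns=['Date', 'Time'])
print(df)
```

An explicit format string also sidesteps the day-first ambiguity, because nothing is inferred.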

You'll also want to specify dayfirst=True, given your date format.
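Why dayfirst matters: a value like 19/10/2016 can only be day-first, but a value like 01/02/2016 is ambiguous, and Pandas reads it month-first unless told otherwise. The value below is a made-up example.

```python
import pandas as pd

ambiguous = '01/02/2016 00:00:00'
print(pd.to_datetime(ambiguous))                 # month first by default: 2 January 2016
print(pd.to_datetime(ambiguous, dayfirst=True))  # day first: 1 February 2016
```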

That means your code becomes:

for chunk in pd.read_csv(file, chunksize=500000,
                         parse_dates=[['date', 'time']],  # note the extra []
                         dayfirst=True,
                         names=col_names, index_col=index_cols,
                         header=0, dtype=dtype):
    store.append('df', chunk)
