
Bad date format change from string to date in BigQuery

Original post: 2021-07-12 16:44:21 3 2 sql / google-bigquery

I have been struggling with some datasets I want to use that have a problem with the date format. BigQuery could not load the files and returned the following error:

Could not parse '4/12/2016 2:47:30 AM' as TIMESTAMP for field date (position 1) starting at location 21 with message 'Invalid time zone: AM'

I have been able to upload the file manually, but only with the columns as strings, and now I would like to set the fields back to the proper format. However, I could not find a way to change the date column from string to a proper DateTime format.

I would love to know if this is possible, as the file is just too long to be reformatted in Excel or Sheets (as I have done with the smaller files from this dataset).

2 answers

now would like to set the fields back to the proper format ... from string to proper DateTime format

Use parse_datetime('%m/%d/%Y %r', string_col) to parse a datetime out of the string.

Applied to the sample string in your question, '4/12/2016 2:47:30 AM' parses to April 12, 2016 at 02:47:30.
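The format string can be tried on a literal before it is run against a whole table. The query below is a minimal sketch, not part of the original answer: it parses the sample value from the question, and it also shows the SAFE. prefix, which returns NULL instead of failing when a value does not match the format. The string 'not a date' is a made-up input for illustration.

```sql
-- Parse the sample value from the question with the suggested format string.
-- %m/%d/%Y matches the date part; %r matches a 12-hour time with AM/PM.
SELECT
  PARSE_DATETIME('%m/%d/%Y %r', '4/12/2016 2:47:30 AM') AS parsed_datetime,
  -- SAFE. makes the function return NULL instead of raising an error.
  -- 'not a date' is an illustrative bad input, not data from the question.
  SAFE.PARSE_DATETIME('%m/%d/%Y %r', 'not a date') AS parsed_or_null;
```

Running the SAFE. variant over the real column and counting NULL results is one way to find rows that do not follow the expected format before converting everything.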

As @Mikhail Berlyant rightly said, the parse_datetime('%m/%d/%Y %r', string_col) function will convert your badly formatted dates to the standard ISO 8601 format accepted by Google BigQuery. The best option is then to save the query results to a new table in your BigQuery project.

I had a similar issue. Below is an image of my table, which I uploaded with all columns in String format.

Next, I applied the following settings to the query below.

The settings below store the query output in a new table called heartrateSeconds_clean in the same dataset.

Write if empty is a good choice here: it avoids overwriting the existing raw data, and it avoids writing the output to an arbitrary temporary table, unless you are sure that is what you want. Save the settings and run your query.
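For readers who prefer SQL over the console settings, roughly the same result can be sketched with a CREATE TABLE ... AS SELECT statement. This is an assumption-laden illustration rather than the query used in the answer: only heartrateSeconds_clean comes from the text, while my_dataset, heartrateSeconds_raw, and the columns id, time_string, and value are hypothetical names.

```sql
-- Hypothetical names: my_dataset, heartrateSeconds_raw, id, time_string, value.
-- Only the destination table name heartrateSeconds_clean comes from the answer.
-- CREATE TABLE without OR REPLACE fails if the table already exists, which
-- protects existing data in a similar spirit to the "Write if empty" setting.
CREATE TABLE my_dataset.heartrateSeconds_clean AS
SELECT
  id,
  -- Convert the string column to a DATETIME using the format from the answer.
  PARSE_DATETIME('%m/%d/%Y %r', time_string) AS activity_time,
  value
FROM my_dataset.heartrateSeconds_raw;
```

The raw table is left untouched, so the conversion can be repeated with a different format string if some rows turn out not to match.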

As seen below, the output schema of the new table is automatically updated.
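The schema can also be checked with a query instead of the console. The sketch below reads the dataset's INFORMATION_SCHEMA.COLUMNS view; my_dataset is a hypothetical dataset name, and heartrateSeconds_clean is the table name from the answer.

```sql
-- List each column of the new table with its data type.
-- my_dataset is a hypothetical dataset name.
SELECT column_name, data_type
FROM my_dataset.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'heartrateSeconds_clean'
ORDER BY ordinal_position;
```

If the conversion worked, the parsed column should be reported as DATETIME rather than STRING.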

Below is the new preview of the resulting table.

NB: I did not apply an ORDER BY clause to the results, so the data is not ordered by any specific column in either version of the table. This dataset has over 2M rows.
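If a stable order or a quick sanity check is wanted, something like the following could be run afterwards. It is only a sketch: my_dataset, heartrateSeconds_raw, and activity_time are hypothetical names standing in for the real dataset, raw table, and parsed column.

```sql
-- Hypothetical names: my_dataset, heartrateSeconds_raw, activity_time.
-- Compare row counts to confirm no rows were lost in the conversion.
SELECT
  (SELECT COUNT(*) FROM my_dataset.heartrateSeconds_raw) AS raw_rows,
  (SELECT COUNT(*) FROM my_dataset.heartrateSeconds_clean) AS clean_rows;

-- Preview the converted table in chronological order.
SELECT *
FROM my_dataset.heartrateSeconds_clean
ORDER BY activity_time
LIMIT 100;
```

Once the column is a DATETIME, ORDER BY sorts chronologically, whereas sorting the original strings would be lexicographic.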