How can I query a struct<$oid:string> column in AWS Athena?

I want to query data that was stored in MongoDB and exported to a number of JSON files, which are now stored in S3.

I am using AWS Glue to read the files into Athena; however, the data type for the id on each table is imported as struct<$oid:string>.

I have tried every variation of adding quotation marks around the fields, with no luck. Everything I try results in the error: name expected at the position 7 of 'struct<$oid:string>' but '$' is found.

Is there any way I can read these tables in their current form, or do I need to declare their type in Glue?

Glue Crawlers create schemas that match what they find, without considering whether those schemas will work with, for example, Athena. In Athena you can't have a struct property whose name starts with $, but Glue doesn't take that into account, partly because you might be using the table with something else where that is not a problem, and partly because there is little else it can do: that is the name of the property.

There are two ways around it, but neither will work if you continue to use a crawler. You will need to modify the table schema, and if you keep running the crawler it will just revert your change.

The first, and probably simplest, option is to change the type of the column to STRING and then use a JSON function at query time to extract the value using JSONPath ($ is a special character in JSONPath, but you should be able to escape it).
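As a rough sketch of the first option, the snippet below builds the kind of query you could run once the column has been changed to STRING. The table name `my_table` and the column name `_id` are placeholders, not taken from the question, and quoting the key in bracket notation (`$["$oid"]`) is my assumption about how to keep the leading $ from being read as the JSONPath root; check it against your Athena engine version. The second function only mimics locally what the query is expected to return for one row.

```python
import json


def build_oid_query(table: str, column: str = "_id") -> str:
    """Build an Athena query that reads $oid from a STRING column holding JSON.

    `table` and `column` are hypothetical names used for illustration.
    """
    # Bracket notation with a quoted key, so the key's leading $ is treated
    # as part of the name rather than as the JSONPath root (assumption).
    json_path = '$["$oid"]'
    return (
        f"SELECT json_extract_scalar({column}, '{json_path}') AS oid "
        f'FROM "{table}"'
    )


def extract_oid_locally(raw: str) -> str:
    """Local stand-in for the value the query should yield for one row."""
    return json.loads(raw)["$oid"]


query = build_oid_query("my_table")
sample_oid = extract_oid_locally('{"$oid": "5f1a2b3c4d5e6f7a8b9c0d1e"}')
print(query)
print(sample_oid)
```

The generated statement can be pasted into the Athena query editor; nothing here talks to AWS.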

The second option is to use the "mappings" feature of the Hive JSON serde. I'm not 100% sure it will work for this case, but it could be worth a try. Unfortunately, the docs are not very extensive on how to configure it.
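For the second option, a table definition might look roughly like the one generated below. This is an untested sketch: it assumes the OpenX JSON serde class (`org.openx.data.jsonserde.JsonSerDe`) and its `mapping.<name>` serde property, and it assumes the mapping is applied to a property nested inside a struct, which is exactly the part that is uncertain. The table name, the S3 location, and the renamed property `oid` are placeholders.

```python
def build_mapped_table_ddl(table: str, location: str) -> str:
    """Build a CREATE TABLE statement that renames $oid to oid via a serde mapping.

    `table` and `location` are hypothetical values used for illustration.
    """
    return (
        f"CREATE EXTERNAL TABLE {table} (\n"
        # The struct property is declared without the leading $ ...
        "  _id struct<oid:string>\n"
        ")\n"
        "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'\n"
        # ... and the mapping asks the serde to fill it from the JSON key "$oid"
        # (assumption: the mapping also applies to nested properties).
        "WITH SERDEPROPERTIES ('mapping.oid' = '$oid')\n"
        f"LOCATION '{location}'"
    )


ddl = build_mapped_table_ddl("my_table", "s3://my-bucket/exports/my_table/")
print(ddl)
```

If the mapping turns out not to apply inside the struct, the STRING column approach remains the fallback.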
