
How to create multiple tables from multiple folders under one location path with a Glue crawler, so that Athena can query them

Original post: 2020-03-20 04:52:30. Tags: amazon-web-services, amazon-s3, aws-glue, amazon-athena, aws-glue-data-catalog

I have tried this without getting the results I need. I have multiple CSV files in one folder of an S3 bucket. When the Glue crawler creates a separate table for each file in that folder, Athena returns zero results for those tables. When I put each file in its own folder instead, it works fine.

The problem: if more folders are added in the future, I have to go back to the crawler and add a new location path for each newly added folder. Is there a way to do this automatically, or some other way to handle it?

I am using a Glue crawler, an S3 bucket, and Athena to run queries over multiple CSV files.

1 answer

In general, a table needs all of its files to be in one directory, and that directory must contain no other files.
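As a rough illustration of that rule applied to the question, the sketch below plans a one-folder-per-file layout and builds a crawler request with a single S3 target on the parent folder, so that sub-folders added later are found without adding new paths. It assumes boto3 and Glue's `Grouping.TableLevelConfiguration` crawler option, under which the bucket counts as level 1. The helper names, bucket, prefix, and role are hypothetical and not taken from the original answer.

```python
import json
import posixpath


def per_table_key(key: str) -> str:
    """Map 'data/orders.csv' to 'data/orders/orders.csv' (one folder per file)."""
    directory, filename = posixpath.split(key)
    stem, _ext = posixpath.splitext(filename)
    return posixpath.join(directory, stem, filename)


def crawler_request(name, role_arn, database, bucket, parent_prefix):
    """Build keyword arguments for Glue's create_crawler call.

    One S3 target on the parent prefix covers every sub-folder beneath it,
    including folders that are added after the crawler is created.
    """
    parent_prefix = parent_prefix.strip("/")
    if parent_prefix:
        path = f"s3://{bucket}/{parent_prefix}/"
        # Bucket is level 1, each prefix segment adds one level,
        # and the tables sit one level below the parent prefix.
        table_level = len(parent_prefix.split("/")) + 2
    else:
        path = f"s3://{bucket}/"
        table_level = 2
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": path}]},
        "Configuration": json.dumps(
            {"Version": 1.0, "Grouping": {"TableLevelConfiguration": table_level}}
        ),
    }


def create_crawler(**request):
    """Send the request to Glue (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without AWS access

    boto3.client("glue").create_crawler(**request)
```

Calling `create_crawler(**crawler_request(...))` would register the crawler. Whether each sub-folder ends up as its own table still depends on the crawler's schema grouping, so the table level is only a hint worth checking against a test run.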

There is, however, a mechanism that makes it possible to create tables that include only specific files. You can read more about it in the second part of this answer: Partition Athena query by S3 created date (scroll down a bit after the horizontal rule). You can also find an example in the S3 Inventory documentation: https://docs.aws.amazon.com/AmazonS3/latest/dev/storage-inventory.html
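The answer does not name the mechanism. Assuming it means Hive's `SymlinkTextInputFormat`, which the Athena example in the S3 Inventory documentation uses, a minimal sketch could look like the following. The table's `LOCATION` holds manifest files that list the data files one S3 URI per line, rather than holding the data itself. The function names, table name, columns, and paths are hypothetical, and the CSV SerDe is chosen only because the question is about CSV files.

```python
def symlink_manifest(s3_uris):
    """Return manifest text: one S3 URI per line."""
    for uri in s3_uris:
        if not uri.startswith("s3://"):
            raise ValueError(f"not an S3 URI: {uri}")
    return "\n".join(s3_uris) + "\n"


def symlink_table_ddl(table, columns, manifest_location):
    """Build DDL for a table whose LOCATION contains manifests, not data files."""
    cols = ",\n  ".join(f"`{name}` {type_}" for name, type_ in columns)
    return (
        f"CREATE EXTERNAL TABLE `{table}` (\n  {cols}\n)\n"
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'\n"
        "STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'\n"
        "OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'\n"
        f"LOCATION '{manifest_location}'"
    )


# Hypothetical example: a table that reads only orders.csv from a shared folder.
manifest = symlink_manifest(["s3://my-bucket/data/orders.csv"])
ddl = symlink_table_ddl(
    "orders",
    [("order_id", "string"), ("amount", "string")],
    "s3://my-bucket/manifests/orders/",
)
```

The manifest text would be uploaded as an object under the manifest location and the DDL run in Athena by hand. A Glue crawler would not generate a table like this, so this route trades automatic discovery for control over exactly which files each table reads.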

