
How to create multiple tables from multiple folders under one location path with a Glue crawler, so that Athena can query them

I have tried this without getting the results I need. I have multiple CSV files in a single folder of an S3 bucket. When the crawler creates a separate table for each file, Athena returns zero results for those tables. When I put each file in its own folder instead, it works fine.

The problem: if more folders are added in the future, I have to go to the crawler and add a new location path for each newly added folder. Is there a way to do this automatically, or some other way to handle it?

I am using a Glue crawler, an S3 bucket, and Athena to run queries on multiple CSV files.

In general, a table needs all of its files to be in one directory, with no other files in that directory.
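As a rough illustration of that layout rule, the sketch below plans a move of each CSV file into its own folder named after the file. The function name `plan_per_table_keys` and the example keys are hypothetical, and the actual S3 copy step is left out; this only computes the target keys.

```python
from pathlib import PurePosixPath


def plan_per_table_keys(keys):
    """Map each CSV key to a key inside its own folder (hypothetical helper).

    The folder is named after the file stem, so every table ends up with a
    directory that contains only its own file.
    """
    plan = {}
    for key in keys:
        path = PurePosixPath(key)
        if path.suffix.lower() != ".csv":
            continue  # leave non-CSV objects out of the table folders
        plan[key] = str(path.parent / path.stem / path.name)
    return plan


# Example keys are made up for illustration.
plan = plan_per_table_keys(
    ["data/orders.csv", "data/customers.csv", "data/readme.txt"]
)
for source, target in plan.items():
    print(source, "->", target)
```

With a layout like this, a crawler pointed at the parent prefix (here `data/`) will usually create one table per subfolder when the schemas differ, so new folders should be picked up on the next crawl without adding a path for each one. That behaviour depends on the crawler's grouping settings, so it is worth verifying on a small sample first.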

There is, however, a mechanism that makes it possible to create tables that include just specific files. You can read more about it in the second part of this answer: Partition Athena query by S3 created date (scroll down a bit after the horizontal rule). You can also find an example in the S3 Inventory documentation: https://docs.aws.amazon.com/AmazonS3/latest/dev/storage-inventory.html
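The S3 Inventory example relies on the Hive `SymlinkTextInputFormat`, where the table location holds a manifest file that lists the data files one per line. Assuming that is the mechanism meant here, a minimal sketch of generating such a manifest and the matching DDL could look like the following. The bucket, table, and column names are made up, the helper functions are hypothetical, and the row format assumes plain comma-separated files.

```python
def build_manifest(file_uris):
    """Return manifest content: one S3 URI per line (hypothetical helper)."""
    return "\n".join(file_uris) + "\n"


def build_symlink_table_ddl(table, columns, manifest_location):
    """Build a CREATE EXTERNAL TABLE statement whose LOCATION holds manifests.

    `columns` is a list of (name, type) pairs; all names here are assumptions.
    """
    cols = ",\n  ".join(f"`{name}` {col_type}" for name, col_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS INPUTFORMAT "
        "'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'\n"
        "OUTPUTFORMAT "
        "'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'\n"
        f"LOCATION '{manifest_location}'"
    )


# Example values only; replace with your own bucket, files, and schema.
manifest = build_manifest(["s3://example-bucket/data/orders.csv"])
ddl = build_symlink_table_ddl(
    "orders",
    [("order_id", "string"), ("amount", "double")],
    "s3://example-bucket/manifests/orders/",
)
print(manifest)
print(ddl)
```

The manifest text would be uploaded as a file under the table's `LOCATION`, and the DDL run in Athena. Each table then reads only the files named in its own manifest, so several tables can share one data folder.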
