How do I rename Hive's output files on EMR?

Question

The output of Hive on EMR is a file named 000000_0 (perhaps a different number if there is more than one reducer).

How do I get this file to be named differently? I see two options:

1) Get Hive to write it differently

2) Rename the file(s) in S3 after they are written. This could be a problem: from what I've read, S3 doesn't really have a "rename". You have to copy the object and then delete the original. When dealing with a file that is 1 TB in size, for example, could this cause performance problems or increase usage costs?

Answer 1

The AWS Command Line Interface (CLI) has a convenient mv command that you could add to a script:

aws s3 mv s3://my-bucket/000000_0 s3://my-bucket/data1
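With more than one reducer there will be several output files, so the same mv command could be wrapped in a loop. The following is only a sketch: the helper names new_key and rename_outputs, the data-NNNNNN naming scheme, and the s3://my-bucket/output prefix are assumptions made for illustration, not anything Hive or EMR prescribes.

```shell
#!/bin/sh
# Sketch: rename every Hive output object under one S3 prefix.
# new_key and rename_outputs are hypothetical helper names.

# Map a Hive output name such as 000003_0 to data-000003.
# The "data-" naming scheme is an assumption; adjust as needed.
new_key() {
    old_name=$1
    # ${old_name%%_*} drops everything from the first underscore onward.
    printf 'data-%s\n' "${old_name%%_*}"
}

# Read object names (one per line) on stdin and move each one.
# $1 is the S3 prefix without a trailing slash, e.g. s3://my-bucket/output
rename_outputs() {
    prefix=$1
    while IFS= read -r name; do
        # Skip blank lines (for example, rows that had no file name column).
        [ -n "$name" ] || continue
        # </dev/null keeps the command from consuming the loop's stdin.
        aws s3 mv "$prefix/$name" "$prefix/$(new_key "$name")" </dev/null
    done
}
```

One possible way to feed it, assuming the object names contain no spaces, is to take the last column of the listing: aws s3 ls s3://my-bucket/output/ | awk '{print $4}' | rename_outputs s3://my-bucket/output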

Or, you could do it programmatically via the Amazon S3 COPY API call.
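As a rough illustration of the copy-then-delete approach, the lower-level s3api commands of the AWS CLI expose the copy and delete calls directly. This is a minimal sketch, not a definitive implementation: rename_object is a hypothetical helper name, and the bucket and key names are placeholders. A single copy request is limited to objects of up to 5 GB, so something like the 1 TB file from the question would need a multipart copy instead, which the higher-level aws s3 mv command performs automatically.

```shell
#!/bin/sh
# Sketch: "rename" an S3 object by copying it to a new key and then
# deleting the original. rename_object is a hypothetical helper name.
# Arguments: bucket, source key, destination key.
rename_object() {
    bucket=$1
    src_key=$2
    dst_key=$3
    # Copy within the same bucket; stop here if the copy fails so the
    # original object is never deleted without a successful copy.
    aws s3api copy-object \
        --bucket "$bucket" \
        --copy-source "$bucket/$src_key" \
        --key "$dst_key" || return 1
    # Remove the original only after the copy has succeeded.
    aws s3api delete-object --bucket "$bucket" --key "$src_key"
}
```

For example, rename_object my-bucket 000000_0 data1 would have the same effect as the mv command shown earlier, for objects small enough to copy in one request.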

Solution 1
2 2014-11-26 01:35:58
