
How to rename output file(s) of Hive on EMR?

The output of Hive on EMR is a file named 000000_0 (or a different number if there is more than one reducer).

How do I get this file to be named differently? I see two options:

1) Get Hive to write it under a different name.

2) Rename the file(s) in S3 after they are written. This could be a problem: from what I've read, S3 doesn't really have a "rename". You have to copy the object and then delete the original. When dealing with a file that is 1 TB in size, for example, could this cause performance problems or increase usage cost?

The AWS Command Line Interface (CLI) has a convenient mv command that you could add to a script:

aws s3 mv s3://my-bucket/000000_0 s3://my-bucket/data1
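With more than one reducer there will be several output files, so the same mv call can be wrapped in a loop. This is only a rough sketch: the function name rename_hive_outputs, the prefixes, and the data1, data2, ... naming scheme are made up for illustration, and it assumes the output names contain no spaces and look like 000000_0.

```shell
#!/usr/bin/env bash
# Sketch: move every Hive output object under one prefix to a new name.
# The prefixes and the "data" base name below are hypothetical.

rename_hive_outputs() {
  local src="$1"    # source prefix, with trailing slash
  local dst="$2"    # destination prefix, with trailing slash
  local base="$3"   # base of the new names: base1, base2, ...

  # "aws s3 ls" prints date, time, size and name for each object.
  # Keep only names shaped like Hive output files (digits_digits).
  aws s3 ls "$src" \
    | awk '$4 ~ /^[0-9]+_[0-9]+$/ {print $4}' \
    | {
        i=1
        while read -r name; do
          # </dev/null keeps mv from touching the list on stdin
          aws s3 mv "${src}${name}" "${dst}${base}${i}" </dev/null
          i=$((i + 1))
        done
      }
}

# Example call (not executed here):
# rename_hive_outputs s3://my-bucket/output/ s3://my-bucket/renamed/ data
```

Each mv is still a copy followed by a delete on the S3 side, so this changes the names but not the cost profile discussed in the question.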

Or, you could do it programmatically via the Amazon S3 COPY API call.
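To make the copy-then-delete pattern explicit, here is a minimal sketch using the lower-level s3api commands of the CLI. The function name s3_rename and the bucket and key names are placeholders. Note that a single copy-object call is limited to objects of up to 5 GB; for something like the 1 TB file in the question you would need a multipart copy, which the higher-level aws s3 mv command takes care of for you.

```shell
#!/usr/bin/env bash
# Sketch: emulate a rename with a server-side copy, then delete the original.
# Bucket and key names are placeholders.

s3_rename() {
  local bucket="$1"    # bucket name without the s3:// scheme
  local old_key="$2"   # existing key, e.g. 000000_0
  local new_key="$3"   # desired key, e.g. data1

  # --copy-source is "bucket/key"; keys with special characters
  # would need URL-encoding, which this sketch does not do.
  # The delete runs only if the copy succeeded.
  aws s3api copy-object \
      --bucket "$bucket" \
      --copy-source "${bucket}/${old_key}" \
      --key "$new_key" \
    && aws s3api delete-object --bucket "$bucket" --key "$old_key"
}

# Example call (not executed here):
# s3_rename my-bucket 000000_0 data1
```

The copy happens inside S3, so the data is not downloaded to the machine running the script.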
