How to rename output file(s) of Hive on EMR?
The output of Hive on EMR is a file named 000000_0 (perhaps a different number if there is more than 1 reducer).
How do I get this file to be named differently? I see two options:
1) Get Hive to write it differently
2) Rename the file(s) in S3 after they are written. This could be a problem: from what I've read, S3 doesn't really have a "rename". You have to copy the file and then delete the original. When dealing with a file that is 1TB in size, for example, could this cause performance problems or increase usage cost?
The AWS Command Line Interface (CLI) has a convenient mv command that you could add to a script:
aws s3 mv s3://my-bucket/000000_0 s3://my-bucket/data1
Or, you could do it programmatically via the Amazon S3 COPY API call.
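As a rough illustration of the programmatic route, here is a minimal sketch in Python. It assumes the boto3 SDK (the answer does not name an SDK), and the helper name `rename_s3_object` is made up for this example. The bucket and key names are the ones from the CLI command above. The sketch uses boto3's managed `copy` method rather than a single `copy_object` call, on the understanding that a single copy request is limited to 5 GB and the managed method switches to a multipart copy for larger objects. The copy runs inside S3, so the data should not need to pass through the machine running the script.

```python
def rename_s3_object(s3_client, bucket, old_key, new_key):
    """Emulate a rename in S3: copy the object to new_key, then delete old_key.

    s3_client is expected to behave like a boto3 S3 client
    (assumption: boto3 is the SDK in use).
    """
    # Managed copy; for large objects boto3 performs a multipart copy.
    s3_client.copy(
        CopySource={"Bucket": bucket, "Key": old_key},
        Bucket=bucket,
        Key=new_key,
    )
    # Delete the original only after the copy call has returned without error.
    s3_client.delete_object(Bucket=bucket, Key=old_key)


# Example usage (needs boto3 installed and AWS credentials configured):
#
#   import boto3
#   s3 = boto3.client("s3")
#   rename_s3_object(s3, "my-bucket", "000000_0", "data1")
```

Because the delete runs only after the copy returns, a failed copy raises an exception and leaves the original file in place.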