
How to join the Pig output files?

The Pig script outputs a few part files (part-m-00000, part-m-00001, etc.) along with .pig_header and .pig_schema, and I am trying to join them into one output CSV. I tried to use the Hadoop merge:

hadoop fs -getmerge ./output output.csv

but the .pig_schema file gets merged in as well, so the result looks something like this:

header1,header2,header3
{"fields":[{"name": "header1", "type":...}]}
value1,value2,value3

How do I join them correctly without the .pig_schema included?

Thanks!

Use a file glob so that only the part files are merged:

hadoop fs -getmerge ./output/part* output.csv
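Note that a part* glob also leaves out .pig_header, so the merged file has no header row. If the header is wanted, one possible approach is to work on a local copy of the output directory and concatenate the header and the part files yourself. The sketch below assumes the directory has already been fetched (for example with something like hadoop fs -get ./output ./output_local); the function name merge_pig_output and the output_local path are hypothetical, not part of Pig or Hadoop.

```shell
# Hypothetical helper: merge a LOCAL copy of a Pig output directory into one CSV.
# Assumption: the directory was copied out of HDFS beforehand.
merge_pig_output() {
    dir="$1"   # local directory holding part-* files, .pig_header, .pig_schema
    out="$2"   # path of the merged CSV to create

    # Start the output with the header line if one was written; otherwise start empty.
    if [ -f "$dir/.pig_header" ]; then
        cat "$dir/.pig_header" > "$out"
    else
        : > "$out"
    fi

    # Append only the data files, in name order. The part-* glob does not
    # match dotfiles, so .pig_schema and .pig_header are skipped here.
    for part in "$dir"/part-*; do
        if [ -f "$part" ]; then
            cat "$part" >> "$out"
        fi
    done
    return 0
}
```

It could then be called as merge_pig_output ./output_local output.csv. This relies on .pig_header ending with a newline, which the merged output shown in the question suggests.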
