Processing a file using MapReduce

I use a simple Pig script that reads an input .txt file and adds a new field to each line.

The output relation is then stored as Avro.

Is there any benefit to running such a script in mapreduce mode compared to local mode?

Thank you

In local mode you run your job on your local machine. In mapreduce mode you run your job on a cluster: your file is split into pieces, and the pieces are processed on several machines in parallel.

So, in theory, if your file is big enough (or there are lots of files like this to process), you'll be able to finish the job in less time in mapreduce mode.
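To make the "split into pieces and process in parallel" idea concrete, here is a minimal Python sketch. It is only an illustration of the pattern, not how Pig or Hadoop is actually implemented: the function names (`split_lines`, `add_field`, `process_in_parallel`) and the field value are hypothetical, and worker threads on one machine stand in for the cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_lines(lines, num_splits):
    """Divide the input lines into contiguous chunks (the "splits")."""
    size = max(1, -(-len(lines) // num_splits))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def add_field(chunk, value="new_field"):
    """Map step: append one extra tab-separated field to every line of a chunk."""
    return [f"{line}\t{value}" for line in chunk]


def process_in_parallel(lines, num_splits=4):
    """Hand each split to its own worker, then concatenate the results in order."""
    splits = split_lines(lines, num_splits)
    with ThreadPoolExecutor(max_workers=num_splits) as pool:
        # pool.map returns results in the same order as the input splits
        results = list(pool.map(add_field, splits))
    return [line for chunk in results for line in chunk]


if __name__ == "__main__":
    for row in process_in_parallel(["a", "b", "c", "d", "e"], num_splits=2):
        print(row)
```

Each line is handled independently of the others, which is why a job like this one parallelizes well. Whether the cluster actually wins depends on the input being large enough to outweigh the cost of distributing the work.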

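Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you repost this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.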
