
Handling small files with PIG

Originally posted 2013-09-04 15:48:46 · Tags: hadoop, mapreduce, apache-pig

According to my understanding, MapReduce works better with large files (I understand this is due to the input-splitting logic, etc.). One optimization is to pack the small files into sequence files, using the file name as the key and the file contents as the value.
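As a rough illustration of the packing idea above, here is a minimal Python sketch that writes many small files into one container file as (file name, contents) records and reads them back. The function names pack_files and unpack_files and the length-prefixed record layout are hypothetical and are NOT Hadoop's SequenceFile format; a real sequence file would be written with Hadoop's SequenceFile.Writer. The sketch only shows the key/value shape of the approach.

```python
import struct
from pathlib import Path
from typing import Dict, Iterable

# Hypothetical record layout (NOT Hadoop's SequenceFile format):
# [4-byte key length][key bytes][4-byte value length][value bytes], big-endian lengths.
_LEN = struct.Struct(">I")


def pack_files(paths: Iterable[Path], container: Path) -> int:
    """Write each small file into one container as a (file name -> contents) record."""
    count = 0
    with container.open("wb") as out:
        for path in paths:
            key = path.name.encode("utf-8")  # key: the file's base name
            value = path.read_bytes()        # value: the raw file contents
            out.write(_LEN.pack(len(key)))
            out.write(key)
            out.write(_LEN.pack(len(value)))
            out.write(value)
            count += 1
    return count


def unpack_files(container: Path) -> Dict[str, bytes]:
    """Read the container back into a {file name: contents} mapping."""
    records: Dict[str, bytes] = {}
    data = container.read_bytes()
    pos = 0
    while pos < len(data):
        (klen,) = _LEN.unpack_from(data, pos)
        pos += _LEN.size
        key = data[pos:pos + klen].decode("utf-8")
        pos += klen
        (vlen,) = _LEN.unpack_from(data, pos)
        pos += _LEN.size
        records[key] = data[pos:pos + vlen]
        pos += vlen
    return records
```

Keys here are bare file names, so two files with the same name in different directories would collide in the returned mapping; using the full path as the key avoids that.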

Now the issue is that I am using Pig for analytics, and we have thousands of files, all of them only a few KB in size. Since Pig Latin is compiled into and run as MapReduce jobs, I suspect the MR jobs will be inefficient because of the small files.

Is there any way to get some control over how Pig handles small files? Is there an out-of-the-box solution?

1 answer

Pig has a feature that combines small files into larger chunks: http://pig.apache.org/docs/r0.11.1/perf.html#combine-files
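The linked page describes two properties for this: pig.splitCombination (turns split combination on or off, true by default) and pig.maxCombinedSplitSize (the number of bytes a single map should process; small files are combined until that size is reached). In a script this would look like SET pig.maxCombinedSplitSize 134217728; before the LOAD. To show why this helps with thousands of KB-sized files, here is a simplified Python model of greedy split combination. It is an illustration only, not Pig's actual implementation (which is more involved and may also take block locations into account); the function name combine_small_files is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CombinedSplit:
    files: List[str] = field(default_factory=list)
    size: int = 0  # total bytes assigned to this split (one map task)


def combine_small_files(file_sizes: Dict[str, int],
                        max_combined_split_size: int) -> List[CombinedSplit]:
    """Greedily pack input files into splits of at most max_combined_split_size bytes.

    Simplification: a file larger than the limit gets a split of its own
    instead of being cut into block-sized pieces.
    """
    splits: List[CombinedSplit] = []
    current = CombinedSplit()
    for name, size in file_sizes.items():
        # Start a new split when adding this file would exceed the limit.
        if current.files and current.size + size > max_combined_split_size:
            splits.append(current)
            current = CombinedSplit()
        current.files.append(name)
        current.size += size
    if current.files:
        splits.append(current)
    return splits


# Hypothetical input: 3000 files of 8 KB each, combined into splits of at most 1 MB.
sizes = {f"part-{i:05d}": 8 * 1024 for i in range(3000)}
splits = combine_small_files(sizes, 1 << 20)
print(len(sizes), "files ->", len(splits), "map tasks")  # 3000 files -> 24 map tasks
```

Without combination each small file would typically get its own map task (about 3000 here); with a 1 MB limit, 128 files fit per split, so the same input needs only 24 maps. The linked page also notes caveats for custom loaders.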

Related questions

Sorting process of PIG when there are many small input files

Spark handling small files (coalesce vs CombineFileInputFormat)

Improving performance when you have many small input files using Pig Latin

Why are my avro output files so small and so numerous in my pig job?

Read files with pattern in PIG

Loading Multiple Files in PIG

Concatenating files with Timestamp in PIG

Apache Pig: Facing issue while handling datatype in Pig

How to sum 2 log files in pig

How to join the Pig output files?


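Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original post's address. For any questions, contact yoyou2525@163.com.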



Guangdong ICP filing No. 18138465 © 2020-2024 STACKOOM.COM