
Parse log file with Hadoop

I'm a newbie to Hadoop. I did the setup and executed the basic word count Java program. The results look good.

My question: is it possible to parse an extremely big log file to fetch only a few required lines using the map/reduce classes? Or is some other step required?

Any pointers in this direction will be very useful. Thanks, Aarthi

Yes, it is entirely possible, and if the file is sufficiently large, I believe Hadoop could prove to be a good way to tackle it, despite what nhahtdh says.

Your mappers could simply act as filters: check the values passed to them, and only if a value meets the conditions of a required line do you context.write() it out.

You won't even need to write your own reducer; just use the default reduce() in the Reducer class.
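As a rough illustration, here is a minimal sketch of that pattern. It assumes the log is plain text read line by line with TextInputFormat, and it uses a simple substring match on "ERROR" as a placeholder for whatever condition actually identifies your required lines; the class names and the filter condition are illustrative, not anything from your setup.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class LogFilter {

    // Mapper acts purely as a filter: it emits a line only if it matches the condition.
    public static class FilterMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Hypothetical condition: keep lines containing "ERROR".
            if (line.toString().contains("ERROR")) {
                context.write(line, NullWritable.get());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "log filter");
        job.setJarByClass(LogFilter.class);
        job.setMapperClass(FilterMapper.class);
        // Default (identity) Reducer: simply passes the filtered lines through.
        job.setReducerClass(Reducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

If you don't need the output sorted or merged, you could also skip the reduce phase entirely with job.setNumReduceTasks(0) and let the mappers write the filtered lines directly.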

