
Read files from a directory to create a ZIP in Hadoop

I'm looking for Hadoop examples, something more complex than the wordcount example.

What I want to do is read the files in a directory in Hadoop and produce a zip, so my idea is to collect all the files in the map class and create the zip file in the reduce class.

Can anyone give me a link to a tutorial or example that can help me build it?

I don't want anyone to do this for me; I'm asking for a link with better examples than wordcount.

I almost have it working, if you need it: https://github.com/flopezluis/testing-hadoop
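For reference, the zipping step described above can be sketched with the JDK's `java.util.zip` alone. This is only a minimal local-filesystem sketch, not the code from the linked repository: the class name `ZipDirectorySketch` and the method `zipDirectory` are made up for illustration, and it reads from a local directory rather than HDFS. In a Hadoop job the same `ZipOutputStream` loop would presumably wrap the streams returned by `FileSystem.create` and `FileSystem.open` instead of the `java.nio.file` calls used here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class: zips a local directory, not an HDFS one.
public class ZipDirectorySketch {

    // Writes every regular file directly inside `dir` (non-recursive) into
    // one ZIP archive at `zipFile` and returns the number of entries written.
    public static int zipDirectory(Path dir, Path zipFile) throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(dir)) {
            files = listing.filter(Files::isRegularFile)
                           .sorted()
                           .collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Path file : files) {
                // One ZIP entry per input file, named after the file.
                zip.putNextEntry(new ZipEntry(file.getFileName().toString()));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
        return files.size();
    }

    public static void main(String[] args) throws IOException {
        // Build a small throwaway input directory for the demonstration.
        Path dir = Files.createTempDirectory("zip-input");
        Files.write(dir.resolve("a.txt"), "hello".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("b.txt"), "world".getBytes(StandardCharsets.UTF_8));

        // The archive is created outside `dir` so it is not zipped into itself.
        Path zipFile = Files.createTempFile("out", ".zip");
        int count = zipDirectory(dir, zipFile);
        System.out.println("Wrote " + count + " entries to " + zipFile);
    }
}
```

Because a ZIP archive is written sequentially by a single writer, funnelling everything to one place (a single reducer, in the plan above) matches how `ZipOutputStream` works.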

If your objective is to normalize the structured data in records coming in from several inputs and then process it, I think you should look at this article, which helped me in the past. It covers how to normalize data using Hadoop/MapReduce and provides Java-based source code for the following steps:

  • Step 1: Extract the column-value pairs from the original data.
  • Step 2: Extract the column-value pairs not in the master ID file.
  • Step 3: Calculate the maximum ID for each column in the master file.
  • Step 4: Calculate a new ID for the unmatched values.
  • Step 5: Merge the new IDs with the existing master IDs.
  • Step 6: Replace the values in the original data with IDs.
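To make the six steps concrete, here is a rough single-process sketch of the same logic in plain Java. It is not the article's code and it is not a MapReduce job: the class `NormalizeSketch`, the method `normalize`, the in-memory `master` map (column index to value to ID), and the choice to start IDs at 1 are all assumptions made for illustration. In the article each step is, as I understand it, its own MapReduce pass over files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory illustration of the six normalization steps.
public class NormalizeSketch {

    // Replaces each value in `rows` with an integer ID, extending `master`
    // (column index -> value -> ID) with new IDs for values it has not seen.
    public static List<int[]> normalize(List<String[]> rows,
                                        Map<Integer, Map<String, Integer>> master) {
        // Step 1: extract the distinct (column, value) pairs from the data.
        Map<Integer, Set<String>> pairs = new TreeMap<>();
        for (String[] row : rows) {
            for (int col = 0; col < row.length; col++) {
                pairs.computeIfAbsent(col, k -> new TreeSet<>()).add(row[col]);
            }
        }

        for (Map.Entry<Integer, Set<String>> entry : pairs.entrySet()) {
            Map<String, Integer> ids =
                    master.computeIfAbsent(entry.getKey(), k -> new HashMap<>());

            // Step 2: keep only the values that have no master ID yet.
            List<String> unmatched = new ArrayList<>();
            for (String value : entry.getValue()) {
                if (!ids.containsKey(value)) {
                    unmatched.add(value);
                }
            }

            // Step 3: find the maximum existing ID for this column.
            int maxId = 0; // assumption: IDs start at 1
            for (int id : ids.values()) {
                maxId = Math.max(maxId, id);
            }

            // Steps 4 and 5: assign new IDs and merge them into the master.
            for (String value : unmatched) {
                ids.put(value, ++maxId);
            }
        }

        // Step 6: replace the values in the original data with their IDs.
        List<int[]> encoded = new ArrayList<>();
        for (String[] row : rows) {
            int[] ids = new int[row.length];
            for (int col = 0; col < row.length; col++) {
                ids[col] = master.get(col).get(row[col]);
            }
            encoded.add(ids);
        }
        return encoded;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"red", "small"},
                new String[] {"blue", "small"},
                new String[] {"red", "large"});

        // The master already knows one value for column 0.
        Map<Integer, Map<String, Integer>> master = new HashMap<>();
        master.put(0, new HashMap<>());
        master.get(0).put("red", 1);

        for (int[] row : normalize(rows, master)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Unmatched values are visited in sorted order here only so that the assigned IDs are deterministic; a distributed version would need some other way to hand out unique IDs across tasks.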

There is another example that shows a method for reading and writing general record structures using the new Writable and InputFormat classes in Java. Have a look here.
