Hadoop map-reduce output contains weird characters

Question

I am running a map reduce job. When I run it on my machine, which is a single-node cluster, the output is as shown:

hduser@nikhil-VirtualBox:/usr/local/hadoop/hadoop-1.0.4$ bin/hadoop dfs -text /user/hduser/output16/part-r-00000
0   Required Genotype column (s), Must not contain NULLS for required fields, failed, 5, 1: GENE_NAME; 2: GENE_NAME; 4: GENE_NAME; 5: GENE_NAME; 9: GENE_NAME

However, when I run the same job on Amazon EMR on a larger dataset, I get the following, full of weird characters. What might be the reason?

SEQorg.apache.hadoop.io.Textorg.apache.hadoop.io.Text\00\00\00\00\00\00\968\D6\FA\E1>X(.q\8B!\ABQ\00\00-\00\00\00
1537044153\8ERequired Genotype column (s), Must not contain NULLS for required fields, failed, 1, 1: VARIANT_START_POSITION; 2: VARIANT_START_POSITION;

Answer 1

The header (SEQ, followed by org.apache.hadoop.io.Text twice) tells you that this is a SequenceFile with org.apache.hadoop.io.Text as both the key and the value class.
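
To confirm this on any output file, the header can be decoded by hand. The sketch below is only an illustration, not part of Hadoop: the class name SequenceFileHeader and the method parseHeader are made up for this example. It assumes the usual layout of the bytes "SEQ", one version byte, then the key and value class names, each stored as a one-byte length followed by UTF-8 text (so names longer than 127 bytes are not handled). It also assumes the part file was first copied to the local disk.

```java
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public class SequenceFileHeader {

    /** Returns {keyClassName, valueClassName} parsed from the leading bytes of a file. */
    public static String[] parseHeader(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        if (data.length < 4 || buf.get() != 'S' || buf.get() != 'E' || buf.get() != 'Q') {
            throw new IllegalArgumentException("not a SequenceFile: missing SEQ magic");
        }
        buf.get(); // format version byte, not needed here
        return new String[] { readName(buf), readName(buf) };
    }

    // Reads one class name: a single length byte, then that many UTF-8 bytes.
    // Assumption: the length fits in one byte (0..127), true for typical class names.
    private static String readName(ByteBuffer buf) {
        if (!buf.hasRemaining()) {
            throw new IllegalArgumentException("header truncated");
        }
        int len = buf.get();
        if (len < 0 || len > buf.remaining()) {
            throw new IllegalArgumentException("unsupported or truncated class name");
        }
        byte[] bytes = new byte[len];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to a local copy of the part file.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            byte[] head = new byte[256];
            int n = in.read(head); // the header sits well within the first bytes
            String[] names = parseHeader(Arrays.copyOf(head, Math.max(n, 0)));
            System.out.println("key class:   " + names[0]);
            System.out.println("value class: " + names[1]);
        }
    }
}
```

A plain text output file, like the one from the single-node run, fails the SEQ check and is rejected with an exception.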

So the file is binary, not plain text, and you can read it with a SequenceFile.Reader.
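
A minimal sketch of such a reader follows, assuming the Hadoop 1.x client libraries are on the classpath and that both key and value are Text, as the header indicates. The class name DumpSequenceFile is made up for this example, and the path is taken from the command line.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class DumpSequenceFile {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // args[0]: path to one part file of the job output.
        Path path = new Path(args[0]);
        FileSystem fs = path.getFileSystem(conf);

        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        try {
            // Key and value types must match the classes named in the file header.
            Text key = new Text();
            Text value = new Text();
            // next() fills key and value and returns false at end of file.
            while (reader.next(key, value)) {
                System.out.println(key + "\t" + value);
            }
        } finally {
            reader.close();
        }
    }
}
```

This prints one tab-separated key/value pair per line, which should resemble the plain text output from the single-node run.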

Accepted answer (score 2), posted 2012-11-14 06:28:47