Reading a Hive managed table from Java MapReduce code

Question

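I want to read data from a managed Hive table in my MapReduce job. The managed table was created from another table, which was itself created from an external Hive table. I want to run the MapReduce job on the final managed table. I read that managed tables use a field delimiter that defaults to the ASCII character with code 1. So I did this: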

public static final String SEPARATOR_FIELD = new String(new char[] {1});

Later, inside a loop, I do:

end = rowTextObject.find(SEPARATOR_FIELD, start);

However, when I run the MapReduce jar, I get an IllegalArgumentException at the line above and at the following line:

public void map(LongWritable key, Text rowTextObject, Context context) throws IOException, InterruptedException

PS: I looked at a GitHub project for reading managed Hive tables in a MapReduce job, but I could not make sense of it: https://github.com/facebook/hive-io-experimental
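
The loop described in the question can be sketched without Hadoop on the classpath. This is a minimal illustration, not the asker's actual code: it uses String.indexOf in place of Text.find (which works on byte offsets into the UTF-8 buffer rather than char offsets), and the names FieldScanner and splitFields are hypothetical. It keeps start within the bounds of the row; an out-of-range start offset is one plausible source of an IllegalArgumentException from Text.find, though the question does not show enough of the loop to confirm that.

```java
import java.util.ArrayList;
import java.util.List;

final class FieldScanner {
    // Hive's default field delimiter for text tables: Ctrl-A (ASCII code 1).
    static final String SEPARATOR_FIELD = "\u0001";

    // Walks the row from left to right, cutting a field at each delimiter.
    static List<String> splitFields(String row) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        // start never moves past row.length(), so the search offset stays valid.
        while (start <= row.length()) {
            int end = row.indexOf(SEPARATOR_FIELD, start);
            if (end < 0) {
                // No further delimiter: the rest of the row is the last field.
                fields.add(row.substring(start));
                break;
            }
            fields.add(row.substring(start, end));
            start = end + SEPARATOR_FIELD.length();
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitFields("111\u0001222"));
    }
}
```

With Text.find the same shape applies, except that the offsets are byte positions and the upper bound would be rowTextObject.getLength().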

Answer 1

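Suppose my input file (say, xyz.txt) looks like this:
111 \001 222
121 \001 222
131 \001 222
141 \001 222
151 \001 222
161 \001 222
171 \001 222
Here \001 stands for the default Hive delimiter.
Now, to parse a file that has already been loaded into a Hive table from MapReduce, I would do the following in the map method: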

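import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // "\\001" is the regex octal escape for Ctrl-A (ASCII 1), the default Hive field delimiter.
        String[] vals = value.toString().split("\\001");
        // Emit the first column as the key, with a constant "1" as the value.
        context.write(new Text(vals[0]), new Text("1"));
    }
}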
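
The splitting step in the mapper can be tried outside Hadoop. Below is a small sketch under stated assumptions: the class name HiveRowParser and the methods columns and firstField are hypothetical, and the rows are plain strings rather than Text objects. It shows that the regex "\\001" matches the literal character '\u0001', and it passes a limit of -1 to split so that trailing empty columns are not dropped.

```java
import java.util.Arrays;
import java.util.List;

final class HiveRowParser {
    // Regex octal escape for Ctrl-A (ASCII 1), as used in the mapper above.
    static final String DELIMITER_REGEX = "\\001";

    // Splits one row into columns; limit -1 keeps trailing empty columns.
    static String[] columns(String row) {
        return row.split(DELIMITER_REGEX, -1);
    }

    // Returns the first column, which the mapper emits as its output key.
    static String firstField(String row) {
        return columns(row)[0];
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("111\u0001222", "121\u0001222", "131\u0001222");
        for (String row : rows) {
            // Mirrors context.write(new Text(vals[0]), new Text("1")).
            System.out.println(firstField(row) + "\t1");
        }
    }
}
```

If the table was created with a custom FIELDS TERMINATED BY clause, the delimiter constant would need to match that character instead.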

Your driver would be the usual kind, like this:

job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setMapperClass(MyMapper.class);
FileInputFormat.addInputPath(job, new Path("xyz.txt"));

So, given the map method above, the final output would look like this:
111 1
121 1
131 1
141 1
151 1
161 1
171 1
Is this the kind of parsing you were looking for, as done in the map method above?

Answer 1
0 2013-07-26 08:32:50
