Hadoop MapReduce: RecordReader
I am trying to solve the following RecordReader problem. Sample input file:
1,1
2,2
3,3
4,4
5,5
6,6
7,7
.......
.......
I want my RecordReader to return
key | Value
0 |1,1:2,2:3,3:4,4:5,5
4 |2,2:3,3:......6,6
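6 |3,3:4,4......6,6:7,7
(The first value holds the first five lines, the second value holds five lines starting from the second line, the third value holds five lines starting from the third line, and so on.)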
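For reference, the desired values can be sketched in plain Java without any Hadoop classes. This is only an illustration of the expected output, assuming the whole input fits in a list; `SlidingWindowDemo` and `windows` are hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SlidingWindowDemo {
    // Joins every run of `size` consecutive lines with `delimiter`,
    // moving the start of the window forward by one line each time.
    static List<String> windows(List<String> lines, int size, String delimiter) {
        List<String> result = new ArrayList<>();
        for (int start = 0; start + size <= lines.size(); start++) {
            result.add(String.join(delimiter, lines.subList(start, start + size)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1,1", "2,2", "3,3", "4,4", "5,5", "6,6", "7,7");
        // Prints one window per line, e.g. 1,1:2,2:3,3:4,4:5,5 first.
        for (String window : windows(lines, 5, ":")) {
            System.out.println(window);
        }
    }
}
```

With seven input lines this yields three overlapping windows, which is the shape of output the RecordReader should produce.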
My current attempt:
public class MyRecordReader extends RecordReader<LongWritable, Text> {
    @Override
    public boolean nextKeyValue() throws IOException, InterruptedException {
        while (pos < end) {
            key.set(pos);
            // five line logic
            Text nextLine = new Text();
            int newSize = in.readLine(value, maxLineLength,
                    Math.max((int) Math.min(Integer.MAX_VALUE, end - pos),
                            maxLineLength));
            fileSeek += newSize;
            for (int n = 0; n < 4; n++) {
                fileSeek += in.readLine(nextLine, maxLineLength,
                        Math.max((int) Math.min(Integer.MAX_VALUE, end - pos),
                                maxLineLength));
                value.append(":".getBytes(), 0, 1);
                value.append(nextLine.getBytes(), 0, nextLine.getLength());
            }
            if (newSize == 0) {
                return false;
            }
            pos += newSize;
            if (newSize < maxLineLength) {
                return true;
            }
            // line too long. try again
            LOG.info("Skipped line of size " + newSize + " at pos " + (pos - newSize));
        }
        return false;
    }
}
But this returns the values
key | Value
0 |1,1:2,2:3,3:4,4:5,5
4 |6,6:7,7.......10,10
6 |11,11:12,12:......14,14
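Can anyone help me fix this code, or provide new code for the RecordReader? The problem's requirements (which may help you understand the use case). Thanks.
I think I understand the question. This is what I would do: wrap another RecordReader and buffer its keys/values in a local queue.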
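A likely reading of that output: each call to nextKeyValue() reads five new lines from the stream and keeps nothing for the following call, so consecutive values cannot overlap. The plain-Java sketch below imitates that behaviour, with an Iterator standing in for the line reader; `ConsumingReaderDemo` and `readInChunks` are hypothetical names, not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ConsumingReaderDemo {
    // Every pass pulls up to `size` fresh lines from the stream and
    // retains none of them, so the next value starts where this one ended.
    static List<String> readInChunks(Iterator<String> stream, int size, String delimiter) {
        List<String> result = new ArrayList<>();
        while (stream.hasNext()) {
            List<String> chunk = new ArrayList<>();
            while (chunk.size() < size && stream.hasNext()) {
                chunk.add(stream.next());
            }
            result.add(String.join(delimiter, chunk));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1,1", "2,2", "3,3", "4,4", "5,5",
                "6,6", "7,7", "8,8", "9,9", "10,10");
        // The second value begins at 6,6 rather than 2,2.
        for (String value : readInChunks(lines.iterator(), 5, ":")) {
            System.out.println(value);
        }
    }
}
```

To get overlapping values, the reader has to remember the last four lines between calls instead of discarding them.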
import java.io.IOException;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Queue;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

public class MyRecordReader extends RecordReader<LongWritable, Text> {
    private static final int BUFFER_SIZE = 5;
    private static final String DELIMITER = ":";

    // Sliding window over the most recent lines and their keys (byte offsets).
    private Queue<String> valueBuffer = new LinkedList<String>();
    private Queue<Long> keyBuffer = new LinkedList<Long>();
    private LongWritable key = new LongWritable();
    private Text value = new Text();
    // The wrapped reader that supplies one line per record.
    private RecordReader<LongWritable, Text> rr;

    public MyRecordReader(RecordReader<LongWritable, Text> rr) {
        this.rr = rr;
    }

    @Override
    public void close() throws IOException {
        rr.close();
    }

    @Override
    public LongWritable getCurrentKey() throws IOException, InterruptedException {
        return key;
    }

    @Override
    public Text getCurrentValue() throws IOException, InterruptedException {
        return value;
    }

    @Override
    public float getProgress() throws IOException, InterruptedException {
        return rr.getProgress();
    }

    @Override
    public void initialize(InputSplit arg0, TaskAttemptContext arg1)
            throws IOException, InterruptedException {
        rr.initialize(arg0, arg1);
    }

    @Override
    public boolean nextKeyValue() throws IOException, InterruptedException {
        if (valueBuffer.isEmpty()) {
            // First call: fill the window with BUFFER_SIZE lines.
            while (valueBuffer.size() < BUFFER_SIZE) {
                if (rr.nextKeyValue()) {
                    keyBuffer.add(rr.getCurrentKey().get());
                    valueBuffer.add(rr.getCurrentValue().toString());
                } else {
                    return false;
                }
            }
        } else {
            // Later calls: append one new line and drop the oldest one.
            if (rr.nextKeyValue()) {
                keyBuffer.add(rr.getCurrentKey().get());
                valueBuffer.add(rr.getCurrentValue().toString());
                keyBuffer.remove();
                valueBuffer.remove();
            } else {
                return false;
            }
        }
        // The key is the key of the oldest line still in the window.
        key.set(keyBuffer.peek());
        value.set(getValue());
        return true;
    }

    // Joins the buffered lines with DELIMITER.
    private String getValue() {
        StringBuilder sb = new StringBuilder();
        Iterator<String> iter = valueBuffer.iterator();
        while (iter.hasNext()) {
            sb.append(iter.next());
            if (iter.hasNext()) sb.append(DELIMITER);
        }
        return sb.toString();
    }
}
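To try the buffering idea in isolation, without Hadoop on the classpath, here is a minimal sketch that applies the same queue technique to a plain Iterator of lines. It is a simplified stand-in, not the class above: `WindowingIterator`, `next` and `getCurrent` are hypothetical names, and keys are left out.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;

class WindowingIterator {
    private final Iterator<String> source;
    private final int size;
    // Holds the lines of the current window, oldest first.
    private final Deque<String> buffer = new ArrayDeque<>();
    private String current;

    WindowingIterator(Iterator<String> source, int size) {
        this.source = source;
        this.size = size;
    }

    // Advances the window by one line; returns false once the source
    // cannot supply enough lines to fill a complete window.
    boolean next() {
        if (!buffer.isEmpty()) {
            buffer.removeFirst();
        }
        while (buffer.size() < size) {
            if (!source.hasNext()) {
                return false;
            }
            buffer.addLast(source.next());
        }
        current = String.join(":", buffer);
        return true;
    }

    String getCurrent() {
        return current;
    }

    public static void main(String[] args) {
        Iterator<String> lines =
                Arrays.asList("1,1", "2,2", "3,3", "4,4", "5,5", "6,6", "7,7").iterator();
        WindowingIterator windows = new WindowingIterator(lines, 5);
        while (windows.next()) {
            System.out.println(windows.getCurrent());
        }
    }
}
```

Only one new line is read per call after the first, so each input line is read once while still appearing in up to five values.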
Then, for example, you can have a custom InputFormat that extends TextInputFormat and overrides the createRecordReader method to call super.createRecordReader and return the result wrapped in a MyRecordReader, like this:
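import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class MyTextInputFormat extends TextInputFormat {
    @Override
    public RecordReader<LongWritable, Text> createRecordReader(
            InputSplit arg0, TaskAttemptContext arg1) {
        return new MyRecordReader(super.createRecordReader(arg0, arg1));
    }
}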