I would like to use DeepLearning4j to build and train a U-Net network. To do this I need a dataset iterator that feeds the network with an image in inp ...
I want to read a simple CSV file with just a list of numbers using DataVec, for use within Deeplearning4j. I've tried numerous examples but keep getti ...
I have a text file of 300 MB with a block size of 128 MB, so in total 3 blocks (128 + 128 + 44 MB) would be created. Correct me if I'm wrong - for MapReduce, the default input spli ...
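The arithmetic behind that question can be sketched in plain Java. This is a hypothetical stand-in for Hadoop's `FileInputFormat.getSplits()`, not the Hadoop API itself: by default the split size equals the block size, and a slop factor of 1.1 lets a slightly-oversized remainder stay in one split rather than producing a tiny trailing split.

```java
// Sketch of how Hadoop's FileInputFormat derives the number of input splits
// from file size and split size. Class and method names are illustrative.
public class SplitMath {
    static final double SPLIT_SLOP = 1.1; // same slop factor FileInputFormat uses

    // Count splits: peel off full splits while the remainder is more than
    // 10% larger than one split, then the remainder becomes the last split.
    static int countSplits(long fileSize, long splitSize) {
        int splits = 0;
        long remaining = fileSize;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining > 0) splits++; // last, possibly short, split
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 300 MB file with 128 MB blocks -> 128 + 128 + 44 -> 3 splits
        System.out.println(countSplits(300 * mb, 128 * mb)); // prints 3
        // 129 MB file: within the 1.1 slop, so a single split, not two
        System.out.println(countSplits(129 * mb, 128 * mb)); // prints 1
    }
}
```

Note the 129 MB case: block count (2 blocks) and split count (1 split) can differ, which is the usual source of confusion in block-vs-split questions.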
In Hadoop, I have a sequence file of 3 GB size. I want to process it in parallel, so I am going to create 8 map tasks and hence 8 FileSplits. F ...
I want to use my own FileInputFormat with a custom RecordReader to read CSV data into <Long><String> pairs. Therefore I created the class ...
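The contract such a reader fulfils can be sketched without the Hadoop dependency. This hypothetical `CsvLineReader` mimics what a `LineRecordReader`-style reader does: each `nextKeyValue()` call advances one line, exposing the byte offset as the `Long` key and the line as the `String` value (it assumes `\n` line endings for the offset arithmetic).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Plain-Java sketch of a (Long offset, String line) record reader.
// Illustrative only - not the Hadoop RecordReader API.
public class CsvLineReader {
    private final BufferedReader in;
    private long offset = 0;     // byte offset of the current line (the key)
    private String line = null;  // current line (the value)

    public CsvLineReader(String data) {
        this.in = new BufferedReader(new StringReader(data));
    }

    // Mirrors RecordReader.nextKeyValue(): returns true while records remain.
    public boolean nextKeyValue() throws IOException {
        if (line != null) offset += line.length() + 1; // +1 for the '\n'
        line = in.readLine();
        return line != null;
    }

    public long getCurrentKey()     { return offset; }
    public String getCurrentValue() { return line; }

    public static void main(String[] args) throws IOException {
        CsvLineReader r = new CsvLineReader("a,b,c\nd,e,f\n");
        while (r.nextKeyValue()) {
            System.out.println(r.getCurrentKey() + " -> " + r.getCurrentValue());
        }
        // 0 -> a,b,c
        // 6 -> d,e,f
    }
}
```

In the real Hadoop version the same loop lives in the framework: the mapper's `run()` calls `nextKeyValue()` and hands each pair to `map()`.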
As per our requirement, the output of one job will be the input of another job. Using the MultipleOutputs concept, we are creating a new folder in outp ...
I need to parse an EBCDIC input file format. Using Java, I am able to read it like below: But in Hadoop MapReduce, I need to parse it via a RecordReader ...
We know that prior to the Mapper phase the files are split, and the RecordReader starts working to emit an input to the Mapper. My question is whether the r ...
I have an HDFS cluster which stores large CSV files in a compressed/encrypted form as selected by the end user. For compression and encryption, I have created a ...
This is my code for using various args: import java.io.File; import java.io.IOException; import org.apache.hadoop.conf.Configuration; import org.apac ...
I'm finding it hard to understand the flow of what happens in the nextKeyValue() method explained at the link below: http://analyticspro.org/20 ...
When processing a text file, how does Hadoop identify records? Is it based on newline characters or full stops? If I have a text file list of 5000 wor ...
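To the best of my knowledge, `TextInputFormat` delimits records on newline characters only; a full stop is just another byte inside the value. A minimal plain-Java illustration of that boundary rule (the helper is hypothetical, not a Hadoop call):

```java
// Records in TextInputFormat are newline-delimited; punctuation is ignored.
// countRecords() is an illustrative helper, not part of Hadoop.
public class RecordBoundaries {
    static int countRecords(String data) {
        // Each newline-separated line would become one (offset, line) record.
        return data.isEmpty() ? 0 : data.split("\n").length;
    }

    public static void main(String[] args) {
        // Two sentences, no newline: still a single record/value to the mapper.
        System.out.println(countRecords("First sentence. Second sentence.")); // 1
        // Two lines -> two records, regardless of punctuation.
        System.out.println(countRecords("no punctuation here\nsecond line")); // 2
    }
}
```

So a text file whose 5000 words all sit on one line would reach the mapper as a single record.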
From the Apache doc on the Hadoop MapReduce InputFormat Interface: "[L]ogical splits based on input-size is insufficient for many applications sin ...
With the above snippet, when the mapper's run() method is called, every time it gets the next key/value pair via the nextKeyValue() function from the RecordReader an ...
I am trying to write a custom reader which serves the purpose of reading a record (residing in two lines) with a defined number of fields. For example
I am using Jackson to process JSON that comes in chunks in Hadoop. That means they are big files that are cut up into blocks (in my problem it's 128M b ...
I am implementing a JSON RecordReader in Hadoop with Jackson. For now I am testing locally with JUnit + MRUnit. The JSON files contain one object each, ...
I am a Hadoop beginner. I came across this custom RecordReader program which reads 3 lines at a time and outputs the number of times a 3-line input wa ...
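The core of such a program can be sketched without Hadoop: group every N consecutive lines into one record, then count how often each record occurs. This hypothetical `countNLineRecords` combines what the custom RecordReader (grouping) and the mapper/reducer (counting) would each do.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of an N-line record reader plus the counting step.
// Illustrative stand-in for a custom Hadoop RecordReader + word-count job.
public class NLineRecords {
    static Map<String, Integer> countNLineRecords(String data, int n) throws IOException {
        BufferedReader in = new BufferedReader(new StringReader(data));
        Map<String, Integer> counts = new HashMap<>();
        StringBuilder record = new StringBuilder();
        int lines = 0;
        String line;
        while ((line = in.readLine()) != null) {
            if (lines > 0) record.append('\n');
            record.append(line);
            if (++lines == n) {                        // record complete:
                counts.merge(record.toString(), 1, Integer::sum); // count it
                record.setLength(0);                   // and start the next one
                lines = 0;
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // "a\nb\nc" appears twice, "x\ny\nz" once.
        String data = "a\nb\nc\na\nb\nc\nx\ny\nz\n";
        System.out.println(countNLineRecords(data, 3));
    }
}
```

In the real job, the reader's `nextKeyValue()` would return each 3-line record as one value, and a standard count reducer would do the `merge` step.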
I have written a custom record reader and am looking for sample test code to test my custom reader using MRUnit or any other testing framework. It's worki ...
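The usual testing pattern, whether or not MRUnit is used, is: drive the reader over a small in-memory input, drain every (key, value) pair into a list, and compare that list against the expected records. A plain-Java sketch of that pattern (the `KVReader` interface and `lineReader` stand-in are hypothetical, representing the reader under test):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// MRUnit-style test pattern without MRUnit: feed input, drain all pairs,
// assert on the collected sequence. Names here are illustrative.
public class ReaderTestHarness {
    interface KVReader {
        boolean nextKeyValue() throws IOException;
        String currentKey();
        String currentValue();
    }

    // Minimal reader under test: emits (lineNumber, line) pairs.
    static KVReader lineReader(String data) {
        BufferedReader in = new BufferedReader(new StringReader(data));
        return new KVReader() {
            int lineNo = 0;
            String line;
            public boolean nextKeyValue() throws IOException {
                line = in.readLine();
                if (line == null) return false;
                lineNo++;
                return true;
            }
            public String currentKey()   { return String.valueOf(lineNo); }
            public String currentValue() { return line; }
        };
    }

    // The reusable test step: drain the reader into "key=value" strings.
    static List<String> drain(KVReader r) throws IOException {
        List<String> out = new ArrayList<>();
        while (r.nextKeyValue()) out.add(r.currentKey() + "=" + r.currentValue());
        return out;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(drain(lineReader("foo\nbar\n"))); // [1=foo, 2=bar]
    }
}
```

With MRUnit the same idea applies, except the framework drives the mapper and you assert on the emitted pairs; for a bare RecordReader, a drain-and-compare loop like this is often all the harness you need.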
I have implemented a custom CombineFileInputFormat in order to create splits for Map tasks composed of groups of files. I created a solution which pa ...