
How do I convert EBCDIC to text using Hadoop MapReduce?

I need to parse an input file in EBCDIC format. In plain Java, I am able to read it like this:

InputStreamReader rdr = new InputStreamReader(
        new FileInputStream("/Users/rr/Documents/workspace/EBCDIC_TO_ASCII/ebcdic.txt"),
        java.nio.charset.Charset.forName("ibm500"));

But in Hadoop MapReduce, I need to parse it via a RecordReader, which has not worked for me so far.

Can anyone provide a solution to this problem?

The best option is to convert the data to ASCII first and then load it into HDFS.
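A minimal sketch of that pre-load conversion in plain Java (the file names here are placeholders), reusing the ibm500 charset from the question:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EbcdicToAscii {
    public static void main(String[] args) throws IOException {
        // Decode EBCDIC (code page IBM500) and re-encode as US-ASCII, line by line.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                     new FileInputStream("ebcdic.txt"), Charset.forName("ibm500")));
             BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                     new FileOutputStream("ascii.txt"), StandardCharsets.US_ASCII))) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.newLine();
            }
        }
    }
}

The converted file can then be loaded with hdfs dfs -put ascii.txt /data/ (the target path is a placeholder).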

Why is the file in EBCDIC? Does it need to be?

If it is just text data, why not convert it to ASCII when you send / pull the file from the mainframe / AS400?

If the file contains binary or COBOL numeric fields, then you have several options:

  1. Convert the file to normal text on the mainframe (the mainframe sort utility is good at this), then send the file and convert it to ASCII.
  2. If it is a COBOL file, there are some open-source projects you could look at: https://github.com/tmalaska/CopybookInputFormat or https://github.com/ianbuss/CopybookHadoop.
  3. There are commercial packages for loading mainframe COBOL data into Hadoop.
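If the data has to be decoded on the cluster itself rather than pre-converted, here is a minimal sketch (an illustration under assumptions, not the asker's code): it assumes the file holds fixed-length records, uses Hadoop's built-in FixedLengthInputFormat so the mapper receives the raw bytes of each record, and decodes them with the same ibm500 charset from the question. The 80-byte record length is a placeholder you would take from the actual file layout.

import java.io.IOException;
import java.nio.charset.Charset;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FixedLengthInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class EbcdicToTextJob {
    // Placeholder record length; the real value comes from your file layout / copybook.
    private static final int RECORD_LENGTH = 80;

    public static class DecodeMapper
            extends Mapper<LongWritable, BytesWritable, NullWritable, Text> {
        private static final Charset EBCDIC = Charset.forName("ibm500");
        private final Text decoded = new Text();

        @Override
        protected void map(LongWritable offset, BytesWritable record, Context ctx)
                throws IOException, InterruptedException {
            // Decode the raw EBCDIC bytes of one fixed-length record into a Java String.
            decoded.set(new String(record.copyBytes(), EBCDIC));
            ctx.write(NullWritable.get(), decoded);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FixedLengthInputFormat.setRecordLength(conf, RECORD_LENGTH);

        Job job = Job.getInstance(conf, "ebcdic-to-text");
        job.setJarByClass(EbcdicToTextJob.class);
        job.setInputFormatClass(FixedLengthInputFormat.class);
        job.setMapperClass(DecodeMapper.class);
        job.setNumReduceTasks(0);                  // map-only conversion job
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Note this only handles plain character data; binary or packed-decimal (COMP-3) fields still need a copybook-aware reader such as the projects listed above.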

You could try parsing it with Spark, perhaps using Cobrix (an open-source COBOL data source for Spark).
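A minimal sketch of that approach using Spark's Java API (the copybook and data paths are placeholders, and it assumes the Cobrix spark-cobol dependency is on the classpath; the "cobol" format name and "copybook" option are taken from the Cobrix documentation):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CobrixReadExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("ebcdic-cobrix")
                .getOrCreate();

        // Cobrix registers a "cobol" data source; the copybook describes the record layout.
        Dataset<Row> df = spark.read()
                .format("cobol")
                .option("copybook", "path/to/copybook.cpy")  // placeholder path
                .load("path/to/ebcdic_data");                // placeholder path

        df.show();
        spark.stop();
    }
}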
