
How do I convert EBCDIC to text using Hadoop MapReduce?

I need to parse an input file in EBCDIC format. In plain Java, I am able to read it like this:

InputStreamReader rdr = new InputStreamReader(
        new FileInputStream("/Users/rr/Documents/workspace/EBCDIC_TO_ASCII/ebcdic.txt"),
        java.nio.charset.Charset.forName("ibm500"));

But in Hadoop MapReduce, I need to parse it via a RecordReader, which has not worked for me so far.

Can anyone provide a solution to this problem?

The best option is to convert the data to ASCII first and then load it into HDFS.
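A minimal sketch of that pre-load conversion in plain Java (the file names here are placeholders), reusing the ibm500 charset from the question:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EbcdicToAscii {
    public static void main(String[] args) throws IOException {
        // Decode EBCDIC (code page IBM500) and re-encode as US-ASCII, line by line.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                     new FileInputStream("ebcdic.txt"), Charset.forName("ibm500")));
             BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                     new FileOutputStream("ascii.txt"), StandardCharsets.US_ASCII))) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.newLine();
            }
        }
    }
}

The converted file can then be loaded with hdfs dfs -put ascii.txt /data/ (the target path is a placeholder).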

Why is the file in EBCDIC? Does it need to be?

If it is just text data, why not convert it to ASCII when you send / pull the file from the mainframe / AS400?

If the file contains binary or COBOL numeric fields, then you have several options:

  1. Convert the file to normal text on the mainframe (the mainframe sort utility is good at this), then send the file and convert it to ASCII.
  2. If it is a COBOL file, there are some open-source projects you could look at: https://github.com/tmalaska/CopybookInputFormat or https://github.com/ianbuss/CopybookHadoop.
  3. There are commercial packages for loading mainframe COBOL data into Hadoop.
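If the data has to be decoded on the cluster itself rather than pre-converted, here is a minimal sketch (an illustration under assumptions, not the asker's code): it assumes the file holds fixed-length records, uses Hadoop's built-in FixedLengthInputFormat so the mapper receives the raw bytes of each record, and decodes them with the same ibm500 charset from the question. The 80-byte record length is a placeholder you would take from the actual file layout.

import java.io.IOException;
import java.nio.charset.Charset;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FixedLengthInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class EbcdicToTextJob {
    // Placeholder record length; the real value comes from your file layout / copybook.
    private static final int RECORD_LENGTH = 80;

    public static class DecodeMapper
            extends Mapper<LongWritable, BytesWritable, NullWritable, Text> {
        private static final Charset EBCDIC = Charset.forName("ibm500");
        private final Text decoded = new Text();

        @Override
        protected void map(LongWritable offset, BytesWritable record, Context ctx)
                throws IOException, InterruptedException {
            // Decode the raw EBCDIC bytes of one fixed-length record into a Java String.
            decoded.set(new String(record.copyBytes(), EBCDIC));
            ctx.write(NullWritable.get(), decoded);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FixedLengthInputFormat.setRecordLength(conf, RECORD_LENGTH);

        Job job = Job.getInstance(conf, "ebcdic-to-text");
        job.setJarByClass(EbcdicToTextJob.class);
        job.setInputFormatClass(FixedLengthInputFormat.class);
        job.setMapperClass(DecodeMapper.class);
        job.setNumReduceTasks(0);                  // map-only conversion job
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Note this only handles plain character data; binary or packed-decimal (COMP-3) fields still need a copybook-aware reader such as the projects listed above.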

You could try parsing it with Spark, perhaps using Cobrix (an open-source COBOL data source for Spark).
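A minimal sketch of that approach using Spark's Java API (the copybook and data paths are placeholders, and it assumes the Cobrix spark-cobol dependency is on the classpath; the "cobol" format name and "copybook" option are taken from the Cobrix documentation):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CobrixReadExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("ebcdic-cobrix")
                .getOrCreate();

        // Cobrix registers a "cobol" data source; the copybook describes the record layout.
        Dataset<Row> df = spark.read()
                .format("cobol")
                .option("copybook", "path/to/copybook.cpy")  // placeholder path
                .load("path/to/ebcdic_data");                // placeholder path

        df.show();
        spark.stop();
    }
}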
