I need to parse an EBCDIC input file. In plain Java I can read it like this:

InputStreamReader rdr = new InputStreamReader(
        new FileInputStream("/Users/rr/Documents/workspace/EBCDIC_TO_ASCII/ebcdic.txt"),
        java.nio.charset.Charset.forName("ibm500"));
But in Hadoop MapReduce I need to parse the file via a RecordReader, and so far I have not been able to get that working. Can anyone suggest a solution to this problem?
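Whatever RecordReader you end up with, the core step is the same: slice the input into records as raw bytes and decode each record with the ibm500 charset (decoding through a line-oriented reader corrupts EBCDIC bytes). A minimal sketch of that per-record decoding logic, assuming fixed-length records (the class name, `RECORD_LENGTH` handling, and method are illustrative, not a Hadoop API):

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class EbcdicRecordDecoder {
    // EBCDIC code page used by the original InputStreamReader snippet.
    static final Charset EBCDIC = Charset.forName("ibm500");

    // Splits the raw bytes into fixed-length records and decodes each
    // one from EBCDIC; this is the work a custom RecordReader (or the
    // mapper, if you read records as BytesWritable) would perform.
    public static List<String> decodeRecords(byte[] data, int recordLength) {
        List<String> records = new ArrayList<>();
        for (int off = 0; off + recordLength <= data.length; off += recordLength) {
            records.add(new String(data, off, recordLength, EBCDIC));
        }
        return records;
    }
}
```

In a real job you could pair this with Hadoop's fixed-length input handling so each mapper receives one record's bytes and only the `new String(..., EBCDIC)` decoding remains.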
The best approach is to convert the data to ASCII first, and then load it into HDFS.
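That conversion can be done with the same charset machinery as the question's snippet: read through an ibm500 decoder and write through an ASCII encoder. A minimal sketch (the class and method names are illustrative):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EbcdicToAscii {
    // Decodes an EBCDIC (ibm500) text stream and re-encodes it as ASCII.
    // Only valid for pure text files - binary/packed fields would be corrupted.
    public static void convert(InputStream in, OutputStream out) throws IOException {
        Reader rdr = new InputStreamReader(in, Charset.forName("ibm500"));
        Writer wtr = new OutputStreamWriter(out, StandardCharsets.US_ASCII);
        char[] buf = new char[4096];
        int n;
        while ((n = rdr.read(buf)) != -1) {
            wtr.write(buf, 0, n);
        }
        wtr.flush();
    }
}
```

Run this once on the way into HDFS and the MapReduce job can then use the standard text input formats.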
Why is the file in EBCDIC? Does it need to be? If it is just text data, why not convert it to ASCII when you send / pull the file from the Mainframe / AS400?
If the file contains binary or COBOL numeric fields, then you have several options.
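The reason those fields are a problem is that a charset conversion alone destroys them; for example, a COBOL COMP-3 (packed decimal) field stores two digits per byte with a trailing sign nibble and must be decoded numerically. A minimal sketch of that decoding, assuming an integer field with no implied decimal places (the class name is illustrative):

```java
public class PackedDecimal {
    // Decodes a COBOL COMP-3 (packed decimal) field into a long.
    // Each byte holds two digit nibbles; the final low nibble is the
    // sign (0xC or 0xF = positive, 0xD = negative).
    public static long decode(byte[] field) {
        long value = 0;
        for (int i = 0; i < field.length; i++) {
            int hi = (field[i] >> 4) & 0x0F;
            int lo = field[i] & 0x0F;
            value = value * 10 + hi;          // high nibble is always a digit
            if (i < field.length - 1) {
                value = value * 10 + lo;      // low nibble is a digit...
            } else if (lo == 0x0D) {
                value = -value;               // ...except the last, which is the sign
            }
        }
        return value;
    }
}
```

Fields like this are why tools driven by the COBOL copybook (which says where each field starts and what type it is) are usually preferable to hand-rolled parsing.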
You could try parsing it with Spark, perhaps using Cobrix (an open-source COBOL data source for Spark).