
Process UTF-16LE encoded file in Hadoop/Cascading

I need to process a UTF-16LE encoded file in Cascading on top of Hadoop. I've tried the following approaches, but none of them works.

  • Setting the value -Xmx1024m -Dfile.encoding=UTF-16LE for the property mapreduce.map.java.opts in mapred-site.xml (see the config sketch after this list) fails with a NullPointerException at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:187). The same setting works for UTF-8. Is Hadoop unable to process UTF-16 data?
  • Calling System.setProperty("file.encoding", "UTF-16LE"); in code is also unable to parse the data.
  • Overriding the charset of Cascading's TextDelimited class is also unable to process the data.
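
For reference, the mapred-site.xml entry looked roughly like this (a minimal sketch; only the -Dfile.encoding part differs from a stock setup):

    <property>
      <name>mapreduce.map.java.opts</name>
      <value>-Xmx1024m -Dfile.encoding=UTF-16LE</value>
    </property>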

However, reading the file with a BufferedReader in UTF-16LE parses the data correctly, as shown in the sketch below.
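
This is a minimal sketch of the plain-Java read that works for me (the file name is illustrative):

    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    public class Utf16Read {
        public static void main(String[] args) throws IOException {
            // Decode the file as UTF-16LE explicitly instead of relying on
            // the JVM-wide file.encoding property.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(new FileInputStream("input.txt"),
                            StandardCharsets.UTF_16LE))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Note: if the file starts with a BOM (FF FE), the first
                    // line will begin with the character U+FEFF.
                    System.out.println(line);
                }
            }
        }
    }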

Please help

Thanks in advance

Found somewhere: Hadoop does not support UTF-16 files.
