
Process UTF-16LE encoded file in hadoop/cascading

I need to process a UTF-16LE encoded file in Cascading on top of Hadoop. I've tried the following approaches, but none of them work.

  • Assigning the value -Xmx1024m -Dfile.encoding=UTF-16LE to the property mapreduce.map.java.opts in mapred-site.xml fails with a NullPointerException at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:187), although the same approach works for UTF-8. Is Hadoop unable to process UTF-16 data?
  • Calling System.setProperty("file.encoding", "UTF-16LE"); in code also fails to parse the data.
  • Overriding the charset of Cascading's TextDelimited class also fails to process the data.
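For reference, the first attempt above corresponds to a mapred-site.xml entry like the following (the property name and -Xmx value are the ones quoted in the question):

```xml
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1024m -Dfile.encoding=UTF-16LE</value>
</property>
```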

However, reading the file with a BufferedReader configured for UTF-16LE parses the data correctly.
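A minimal sketch of the BufferedReader approach that does work outside Hadoop (the file path and sample content are assumptions for illustration):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf16Read {
    public static void main(String[] args) throws IOException {
        // Write a small UTF-16LE sample file (hypothetical TSV content).
        Path p = Files.createTempFile("sample", ".tsv");
        Files.write(p, "1\tZo\u00eb\n".getBytes(StandardCharsets.UTF_16LE));

        // Passing the charset explicitly decodes the bytes correctly,
        // independent of the JVM's default file.encoding.
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(Files.newInputStream(p), StandardCharsets.UTF_16LE))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println(line);
            }
        }
        Files.delete(p);
    }
}
```

The key difference from the failing attempts is that the charset is supplied directly to the decoder rather than via the file.encoding system property, which is read once at JVM startup and is unreliable to change afterwards.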

Please help.

Thanks in advance.

Found elsewhere: Hadoop does not support UTF-16 files.
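Since Hadoop's Text class and its stock text input formats assume UTF-8, one common workaround (not from the original post, just a sketch) is to transcode the file to UTF-8 before handing it to the job:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class TranscodeUtf16 {
    // Re-encode a UTF-16LE stream as UTF-8 so UTF-8-only tooling
    // (such as Hadoop's Text-based input formats) can read it.
    public static void transcode(InputStream in, OutputStream out) throws IOException {
        try (Reader r = new InputStreamReader(in, StandardCharsets.UTF_16LE);
             Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            char[] buf = new char[8192];
            int n;
            while ((n = r.read(buf)) != -1) {
                w.write(buf, 0, n);
            }
        }
    }
}
```

Running this once as a preprocessing step (for example before uploading to HDFS) sidesteps the encoding problem entirely, at the cost of an extra pass over the data.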

