Process UTF-16LE encoded file in hadoop/cascading
I need to process a UTF-16LE encoded file in cascading on top of hadoop. I've tried the following approaches, but none of them work.
Assigning -Xmx1024m -Dfile.encoding=UTF-16LE to the property mapreduce.map.java.opts in mapred-site.xml failed with a NullPointerException at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:187). But this method works for UTF-8.
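For reference, the mapred-site.xml entry described above would look like the following (this is only a sketch of the configuration the question describes, which triggered the NullPointerException for UTF-16LE but worked for UTF-8):

```xml
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1024m -Dfile.encoding=UTF-16LE</value>
</property>
```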
Is hadoop unable to process UTF-16 data? Doing System.setProperty("file.encoding", "UTF-16LE");
in code also fails to parse the data. However, using a BufferedReader to read the file as UTF-16LE parses the data correctly.
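For reference, a minimal sketch of the BufferedReader approach that does parse the data correctly, written as a plain Java round-trip outside Hadoop (the class name and temp-file setup are illustrative, not from the original post):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class Utf16LeReader {
    // Read all lines of a UTF-16LE file by decoding explicitly,
    // instead of relying on the JVM's default charset (file.encoding).
    static List<String> readLines(Path file) throws Exception {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file),
                        StandardCharsets.UTF_16LE))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // Round-trip: write UTF-16LE bytes, then read them back.
        Path tmp = Files.createTempFile("utf16le", ".txt");
        Files.write(tmp, "hello\nworld".getBytes(StandardCharsets.UTF_16LE));
        System.out.println(readLines(tmp)); // [hello, world]
        Files.delete(tmp);
    }
}
```

The key point is that the decoder is chosen explicitly per stream, so it works regardless of what file.encoding the JVM (or the Hadoop task JVM) was started with.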
Please help. Thanks in advance.
Found somewhere: Hadoop does not support UTF-16 files.
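If that is the case, one common workaround (not from the original post, and sketched here under the assumption that Hadoop's line-based text input expects UTF-8) is to transcode the UTF-16LE input to UTF-8 in a preprocessing step before the job runs. A minimal sketch, with illustrative class and path names:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TranscodeToUtf8 {
    // Re-encode a UTF-16LE text file as UTF-8 so downstream
    // UTF-8-based text processing can read it.
    static void transcode(Path in, Path out) throws Exception {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(in),
                        StandardCharsets.UTF_16LE));
             BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(Files.newOutputStream(out),
                        StandardCharsets.UTF_8))) {
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                writer.write(buf, 0, n);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: TranscodeToUtf8 <utf16le-input> <utf8-output>
        if (args.length == 2) {
            transcode(Paths.get(args[0]), Paths.get(args[1]));
        }
    }
}
```

After transcoding, the file can be fed to the Hadoop/cascading job as ordinary UTF-8 text, sidestepping the file.encoding setting entirely.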