
Apache Camel: file processing with accented characters


We are trying to parse a text file from AWS S3 (SDK v2) that contains some accented characters, such as î. We are using the Camel Bindy @FixedLengthRecord format to map the file rows to our DTO, but these accented characters are being mapped as question marks (?).
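One plausible way the question marks arise, assuming the file is really Windows-1252 (cp1252) and is decoded as UTF-8 somewhere in the pipeline: the single byte 0xEE that encodes î in cp1252 is not valid UTF-8 on its own, so Java's decoder substitutes U+FFFD (the replacement character), and writing that character out in a single-byte charset yields '?'. This is a standalone sketch, not Camel code; the class MojibakeDemo and its methods are hypothetical names for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // "i-circumflex" followed by 'x', as raw Windows-1252 bytes (0xEE 0x78).
    static final byte[] CP1252_BYTES = {(byte) 0xEE, 'x'};

    // Decoding with the wrong charset: 0xEE starts a 3-byte UTF-8 sequence,
    // but 'x' is not a continuation byte, so the decoder inserts U+FFFD.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Encoding U+FFFD into a charset that cannot represent it makes
    // String.getBytes substitute the charset's replacement byte, '?'.
    static byte[] encode(String s, Charset cs) {
        return s.getBytes(cs);
    }

    public static void main(String[] args) {
        String decoded = decodeAsUtf8(CP1252_BYTES);
        System.out.println((int) decoded.charAt(0)); // 65533 (U+FFFD)
        byte[] out = encode(decoded, Charset.forName("windows-1252"));
        System.out.println((char) out[0]); // ?
    }
}
```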

We are not yet sure about the source file's encoding, but Notepad++ reports it as ANSI and displays the characters correctly in the input file.
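Since the encoding is uncertain, a quick heuristic check is to decode the raw bytes with a strict UTF-8 decoder: Java's default decoding silently replaces malformed input, but a CharsetDecoder configured with CodingErrorAction.REPORT throws instead. If the check fails, the file is not UTF-8 and is likely a single-byte code page such as cp1252 (which is what Notepad++ usually means by "ANSI" on Western Windows systems). A pass does not prove UTF-8, because pure ASCII is also valid UTF-8. The class Utf8Check is a hypothetical name for this sketch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Returns true only if the bytes form well-formed UTF-8.
    // REPORT makes the decoder throw on the first malformed sequence
    // instead of silently inserting U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] ansi = {'a', (byte) 0xEE, 'b'};                     // cp1252 bytes
        byte[] utf8 = "a\u00EEb".getBytes(StandardCharsets.UTF_8); // 61 C3 AE 62
        System.out.println(isValidUtf8(ansi)); // false
        System.out.println(isValidUtf8(utf8)); // true
    }
}
```

In practice the byte array would come from the downloaded file (for example via java.nio.file.Files.readAllBytes).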

So far we have tried several approaches, such as overriding the default charset with different values (US-ASCII, cp1252):

System.setProperty("org.apache.camel.default.charset", "cp1252");
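Of the two charsets tried, US-ASCII cannot work for this file on its own terms: it only defines code points 0x00 to 0x7F, so the byte 0xEE is malformed input for it, and î cannot be encoded in it at all. The short standalone sketch below (hypothetical class name AsciiLimits) shows both directions; in Java, either one produces a replacement character rather than î.

```java
import java.nio.charset.StandardCharsets;

public class AsciiLimits {
    // US-ASCII covers 0x00-0x7F only, so 0xEE is malformed and becomes U+FFFD.
    static String decodeAscii(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // A non-ASCII character is unmappable in US-ASCII and is replaced by '?'.
    static byte[] encodeAscii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println((int) decodeAscii(new byte[]{(byte) 0xEE}).charAt(0)); // 65533
        System.out.println(encodeAscii("\u00EE")[0]); // 63, i.e. '?'
    }
}
```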

We also tried adding .convertBodyTo(String.class, "UTF-8") to our route definition, but none of these seems to work.
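A likely reason a later conversion step cannot repair the text, assuming the bytes were already decoded as UTF-8 before the route sees them: the decode is lossy. Once 0xEE has become U+FFFD, no subsequent conversion knows what the original byte was. The sketch below (hypothetical class name LossyRoundTrip) decodes cp1252 bytes as UTF-8 and encodes them back; the original byte does not come back.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LossyRoundTrip {
    // Decode with the wrong charset, then encode back with the same charset.
    static byte[] roundTripUtf8(byte[] original) {
        String decoded = new String(original, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] original = {'a', (byte) 0xEE, 'b'};
        byte[] back = roundTripUtf8(original);
        // 0xEE became U+FFFD, which UTF-8 encodes as EF BF BD: the byte is gone.
        System.out.println(Arrays.toString(back)); // [97, -17, -65, -67, 98]
    }
}
```

The fix therefore has to happen at the point where the raw bytes are first turned into characters.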

We have read the Camel documentation (https://camel.apache.org/components/latest/file-component.html) and similar questions on Stack Overflow, but haven't found a matching solution yet. Any other pointers would be highly appreciated.

We finally found a clue in the way Camel's AWS2S3Endpoint was reading the S3 objects. It was decoding them as UTF-8:

Reader reader = new BufferedReader(new InputStreamReader(s3Object, Charset.forName(StandardCharsets.UTF_8.name())));
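The endpoint code above fixes the charset at UTF-8 (Charset.forName(StandardCharsets.UTF_8.name()) is simply StandardCharsets.UTF_8). A minimal sketch of the same reader chain with the charset passed in as a parameter is shown below; readAll and CharsetAwareRead are hypothetical names, this is not the actual Camel patch, and a ByteArrayInputStream stands in for the S3 object stream.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.stream.Collectors;

public class CharsetAwareRead {
    // Same BufferedReader/InputStreamReader chain as the endpoint code,
    // but the caller chooses the charset. Line endings are normalized to "\n".
    static String readAll(InputStream in, Charset charset) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            return reader.lines().collect(Collectors.joining("\n"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] ansi = {'a', (byte) 0xEE, 'b'}; // stand-in for the S3 object bytes
        String text = readAll(new ByteArrayInputStream(ansi), Charset.forName("windows-1252"));
        System.out.println(text.equals("a\u00EEb")); // true
    }
}
```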

As mentioned on the Camel mailing list, this is being fixed in the latest 3.6.0 snapshot version. We tested it successfully with the snapshot version together with convertBodyTo(String.class, "ISO-8859-1") in the Camel route.
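ISO-8859-1 works for î because it maps byte 0xEE to U+00EE, exactly as cp1252 does. The two charsets differ only in the 0x80 to 0x9F range, where windows-1252 has printable characters (0x80 is the euro sign, 0x93 and 0x94 are curly quotes) and ISO-8859-1 has C1 control codes. If the file really is Notepad++ "ANSI" (cp1252), "windows-1252" may be the safer name to pass to convertBodyTo; that is an assumption about the file, not something the original answer tested. The standalone sketch below (hypothetical class name Latin1VsCp1252) shows both cases.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Latin1VsCp1252 {
    static final Charset CP1252 = Charset.forName("windows-1252");

    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        byte[] circumflex = {(byte) 0xEE};
        byte[] euro = {(byte) 0x80};
        // Both charsets map 0xEE to U+00EE (i-circumflex).
        System.out.println(decode(circumflex, StandardCharsets.ISO_8859_1)
                .equals(decode(circumflex, CP1252))); // true
        // In 0x80-0x9F they differ: windows-1252 has the euro sign at 0x80,
        // ISO-8859-1 has the C1 control code U+0080.
        System.out.println((int) decode(euro, CP1252).charAt(0));                      // 8364
        System.out.println((int) decode(euro, StandardCharsets.ISO_8859_1).charAt(0)); // 128
    }
}
```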
