
ISO encoding with Japanese Frame file

I have Japanese content that is being converted to MS Help with a certain tool. The problem is that the third-party tool isn't using UTF-8 encoding and is creating an .xml file with garbage characters:

    <param name="Name" value="&#195;&#137;A&#195;&#137;v&#195;&#137;&#195;&#164;&#195;&#137;P&#195;&#133;&#195;&#137;V&#195;&#137;&#195;&#161;&#195;&#137;&#195;&#172;&#195;&#135;&#8224;&#195;&#135;'&#195;&#135;&#195;&#139;&#195;&#135;&#195;&#152;&#195;&#133;&#501;&#195;&#135;&#195;&#039;&#195;&#135;&#195;&#039;]">
    <param name="Name" value="Test File">
    <param name="Local" value="applications.htm#Xau1044547">

I tried playing around with the encoding and it now produces:

    <param name="Name" value="ÉAÉvÉäÉPÅ">
    <param name="Name" value="Test">
    <param name="Local" value="applications.htm#Xau1044547">

With UTF-8 encoding (from another tool), the correct output should be:

    <param name="Name" value="アプリケーション">
    <param name="Name" value="Small Business アプリケーションの起動 ">
    <param name="Local" value="applications1.html#wp1044548">

Is there any Java API I can use to decode and re-encode the files so that the output is correct? I am not sure which encoding the tool is using, but I am guessing it's "ISO-8859-1".

Thanks.

Your problem is that you need to use two encodings correctly:

  • Find out what encoding your "Japanese content" uses
  • Make sure the tool correctly uses that encoding to read that content
  • Make sure the tool uses UTF-8 to encode the output file and correctly declares that in its header.
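If the tool itself cannot be reconfigured, the same steps can be approximated in Java after the fact. The following is only a minimal sketch: the class `Transcoder` and its methods are hypothetical names, and the source charset has to be supplied by you once you have found out what it is (the code does not detect it).

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class Transcoder {
    // Decode the bytes with the charset they were really written in,
    // then encode the resulting text as UTF-8.
    static byte[] toUtf8(byte[] input, Charset source) {
        return new String(input, source).getBytes(StandardCharsets.UTF_8);
    }

    // Read a whole file, transcode it, and write the result.
    // Note: this does not rewrite any encoding declaration inside the file.
    static void transcodeFile(Path in, Charset source, Path out) throws IOException {
        Files.write(out, toUtf8(Files.readAllBytes(in), source));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: Transcoder <input> <source-charset> <output>");
            return;
        }
        transcodeFile(Paths.get(args[0]), Charset.forName(args[1]), Paths.get(args[2]));
    }
}
```

This only helps when the input bytes are still intact. If the text was already decoded with the wrong charset somewhere upstream, transcoding alone will not repair it, and the header of the output still has to declare UTF-8.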

It would appear from the uppermost sample that your text is already corrupt at that point. The value of the first "Name" attribute is being represented with HTML character escape codes (decimal NCRs).
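A quick way to inspect such a value is to undo the escaping. The sketch below rests on an assumption of mine that the question does not confirm: each decimal NCR below 256 stands for one raw byte of UTF-8 text. `NcrByteProbe` and `decodeNcrBytesAsUtf8` are hypothetical names used for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NcrByteProbe {
    private static final Pattern NCR = Pattern.compile("&#(\\d+);");

    // Assumption: NCR values below 256 are raw bytes; larger values are
    // ordinary code points. The collected bytes are then decoded as UTF-8.
    static String decodeNcrBytesAsUtf8(String escaped) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Matcher m = NCR.matcher(escaped);
        int last = 0;
        while (m.find()) {
            // Copy the literal text that sits between two escapes.
            append(bytes, escaped.substring(last, m.start()));
            int value = Integer.parseInt(m.group(1));
            if (value < 256) {
                bytes.write(value);
            } else {
                append(bytes, new String(Character.toChars(value)));
            }
            last = m.end();
        }
        append(bytes, escaped.substring(last));
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    private static void append(ByteArrayOutputStream out, String text) {
        byte[] b = text.getBytes(StandardCharsets.UTF_8);
        out.write(b, 0, b.length);
    }

    public static void main(String[] args) {
        // Opening of the first sample from the question.
        System.out.println(decodeNcrBytesAsUtf8("&#195;&#137;A&#195;&#137;v"));
    }
}
```

Under that assumption, the opening `&#195;&#137;A&#195;&#137;v` of the first sample decodes to `ÉAÉv`, which hints that the value went through more than one wrong conversion rather than being scrambled at random.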

That said, neither the 2nd sample (value="ÉAÉvÉäÉPÅ") nor the 3rd sample (value="アプリケーション") matches the 1st as literally written.

If HTML character escapes are indeed what the output should be, then the output encoding would be ASCII or some other variant, and the value would then be:

    value="&#12450;&#12503;&#12522;&#12465;&#12540;&#12471;&#12519;&#12531;"
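For reference, decimal NCRs of this kind can be generated from a correctly decoded string. This is a small sketch with hypothetical names (`NcrEncoder`, `toDecimalNcr`); it leaves ASCII untouched and does not escape XML metacharacters such as `&`, `<`, or `"`.

```java
class NcrEncoder {
    // Replace every non-ASCII code point with a decimal numeric
    // character reference; ASCII characters are copied unchanged.
    static String toDecimalNcr(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 128) {
                sb.appendCodePoint(cp);
            } else {
                sb.append("&#").append(cp).append(';');
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Katakana for "application", written with Unicode escapes so the
        // source file stays ASCII regardless of the compiler's encoding.
        String word = "\u30A2\u30D7\u30EA\u30B1\u30FC\u30B7\u30E7\u30F3";
        System.out.println("value=\"" + toDecimalNcr(word) + "\"");
    }
}
```

Iterating over code points rather than `char` values keeps characters outside the Basic Multilingual Plane as a single reference instead of two surrogate halves.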

I think you would need to reconfirm how this 3rd party tool is outputting the XML.
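One way to test a hypothesis about what the tool did is to reverse the suspected misreading in Java and see whether readable text comes back. The sketch below is an experiment, not a fix: `MojibakeProbe` and `recover` are hypothetical names, and the charset pair tried in `main` (Shift_JIS content read as MacRoman) is my inference from the second sample, based on the Shift_JIS bytes of アプリケ being 83 41 83 76 83 8A 83 50 and MacRoman mapping 0x83 to É and 0x8A to ä. The question itself does not confirm it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeProbe {
    // Re-encode the garbled text with the charset that was wrongly used to
    // read it, then decode those bytes with the charset they were written in.
    static String recover(String garbled, Charset wronglyUsed, Charset actual) {
        return new String(garbled.getBytes(wronglyUsed), actual);
    }

    public static void main(String[] args) {
        // UTF-8 bytes misread as ISO-8859-1: both charsets are always available.
        System.out.println(recover("\u00C3\u00A9", StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8));

        // Hypothesis for the question's second sample. Both charsets are
        // optional in a Java runtime, so check before using them.
        if (Charset.isSupported("x-MacRoman") && Charset.isSupported("Shift_JIS")) {
            // Start of the second sample, without its incomplete last character.
            String sample = "\u00C9A\u00C9v\u00C9\u00E4\u00C9P";
            System.out.println(recover(sample, Charset.forName("x-MacRoman"), Charset.forName("Shift_JIS")));
        }
    }
}
```

If a pair of charsets turns the garbled value back into the expected Japanese, that tells you which encoding the content really uses and which one the tool wrongly assumed. Characters that the wrong charset could not represent are lost for good, so the durable fix is still to make the tool read the source with the right encoding.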
