
Accented characters read from file have different values under Eclipse than under the console

I'm trying to match a dropdown option:

Cabina Económica

against a String imported from a properties file.

I was having problems using

"//a[text()='" + cabin + "']"

and so changed it to:

final String translateFrom = "ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜÉÈÊÀÁÂÒÓÔÙÚÛÇÅÏÕÑŒäöüéèêàáâòóôùúûçåïõñœ";
final String translateTo   = "abcdefghijklmnopqrstuvwxyzaoueeeaaaooouuucaionoaoueeeaaaooouuucaiono";
"//a[translate(text(),'"+translateFrom+"','"+translateTo+"')=translate('"+cabin+"', '"+translateFrom+"', '"+translateTo+"')]";

which works perfectly when I test it in Eclipse, but fails when I run it under the Windows 7 console:

main() Terminating due to error/exception: Unable to locate element: ....)=translate('Cabina Econ├│mica'....

If I print out the dropdown option from the page, under the Windows console it shows as:

Cabina Econ≤mica

≤ seems to be ASCII F3, which matches what I see when I examine both Strings under Eclipse.

But the value being read from the properties file, which is F3 under Eclipse, seems to be C3B3 (displayed as ├│) under the Windows console.

F3 is the Unicode value for ó; C3B3 is its UTF-8 value.
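To make the difference between the code point and its encoded bytes concrete, here is a minimal sketch using only the standard library. The class name `CodePointVsBytes` and the helper `hex` are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class CodePointVsBytes {
    /** Formats bytes as upper-case hex pairs separated by spaces, e.g. "C3 B3". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Written as an escape so the encoding of this source file does not matter.
        String s = "\u00F3";

        // The code point is the same no matter how the String is later encoded.
        System.out.println(Integer.toHexString(s.codePointAt(0)));        // f3

        // The bytes depend on the charset chosen when encoding.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));      // C3 B3
        System.out.println(hex(s.getBytes(StandardCharsets.ISO_8859_1))); // F3
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE)));   // 00 F3
    }
}
```

So a Java String holding ó always contains the single char F3; seeing C3 B3 means UTF-8 bytes were involved somewhere between the file and the display.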

Why does reading the properties file under Eclipse (via Spring) give a different result to reading it under the Windows console, and what do I need to do to make these equal?

Update

The webpage I'm reading is defined with

<meta ... charset=utf-8>

so I assume that something (Selenium?) is translating it to utf-16 or utf-32 (where ó = x'f3') before I see it.

Whereas Spring's property file seems to be read as UTF-8 under the console but as UTF-16/32 under Eclipse.

Update 2

Further research suggests this might have something to do with Spring's property file loading. I've opened a new question at:

https://stackoverflow.com/questions/35612302/spring-loads-property-files-differently-under-windows-console-than-under-eclipse

and think it best to delete this one (unless anyone objects?)

Check the console encoding in Eclipse's preferences. It's probably not the same encoding as the one used by the Windows console.

Uncertain but possible answer with info:

Actually, nothing above 7F is ASCII. A Windows console window (often inaccurately called a 'DOS' prompt or window) uses the Windows 'OEM' (legacy) code page, usually 437, in which F3 is the character ≤. The two characters ├│ are C3 B3, which you correctly identify as the UTF-8 encoding of Unicode F3 (ó). It is possible to fix the Windows console display by explicitly encoding to IBM437, but you need to do this only for the console display and not elsewhere. In particular, do not do it for Windows files, because files use either the so-called 'ANSI' (really CP1252) single-byte code or one of several Unicode encodings (UTF-8 or UTF-16 in either endianness).
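The console symptoms can be reproduced in plain Java by decoding the bytes the way a code page 437 console would. This is only a sketch: the class name `ConsoleMojibake` and the helper `displayAs` are invented for illustration, and it assumes the JDK in use ships the IBM437 charset (it lives in the optional extended charsets, so a trimmed-down runtime may lack it).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ConsoleMojibake {
    /** Decodes bytes with the named charset, mimicking a console that uses that code page. */
    static String displayAs(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String word = "Econ\u00F3mica";

        // Assumption: IBM437 is available in this JDK.
        if (!Charset.isSupported("IBM437")) {
            System.out.println("IBM437 is not available in this runtime");
            return;
        }

        // UTF-8 bytes C3 B3 viewed through code page 437 become two box-drawing characters.
        String utf8Seen = displayAs(word.getBytes(StandardCharsets.UTF_8), "IBM437");
        System.out.println(utf8Seen.equals("Econ\u251C\u2502mica"));   // true

        // The single byte F3 viewed through code page 437 becomes the less-than-or-equal sign.
        String latin1Seen = displayAs(word.getBytes(StandardCharsets.ISO_8859_1), "IBM437");
        System.out.println(latin1Seen.equals("Econ\u2264mica"));       // true
    }
}
```

The comparisons use escapes rather than printing the strings, because what a printed string looks like depends once more on the encoding of the terminal running the program.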

Java's default encoding for I/O (particularly but not only files) on Windows is CP1252, while on Unix it is often, though not always, UTF-8. Is your Eclipse on Unix? My Eclipse (Indigo) on Windows defaults to CP1252 for plain Java, but I don't know if Spring does anything to override that. If it uses the default to read your file, you can set that default with the system property file.encoding=utf-8.
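One way to take the platform default out of the picture in plain Java is to decode the properties data with an explicit charset. The sketch below is not a statement about what Spring does internally. The class name `PropertiesEncodingDemo`, the helper `load`, and the key `cabin` are all hypothetical. It relies on `Properties.load(Reader)`, which uses whatever charset the Reader was created with, whereas `Properties.load(InputStream)` is documented to assume ISO-8859-1.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PropertiesEncodingDemo {
    /** Loads properties from raw bytes, decoding them with an explicitly chosen charset. */
    static Properties load(byte[] raw, Charset charset) throws IOException {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(raw), charset)) {
            props.load(reader);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // The bytes a UTF-8 encoded properties file would contain (hypothetical key "cabin").
        byte[] raw = "cabin=Cabina Econ\u00F3mica\n".getBytes(StandardCharsets.UTF_8);

        // Decoded as UTF-8, the value contains the single char F3.
        String asUtf8 = load(raw, StandardCharsets.UTF_8).getProperty("cabin");
        System.out.println(asUtf8.equals("Cabina Econ\u00F3mica"));         // true

        // Decoded as ISO-8859-1, the two bytes C3 B3 become two separate chars.
        String asLatin1 = load(raw, StandardCharsets.ISO_8859_1).getProperty("cabin");
        System.out.println(asLatin1.equals("Cabina Econ\u00C3\u00B3mica")); // true

        // Printing the JVM default in both environments shows whether they differ.
        System.out.println(Charset.defaultCharset());
    }
}
```

Running the last line both from Eclipse and from the console is a quick way to confirm whether the two JVMs start with different defaults.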

