
mb_detect_encoding doesn't work properly with Windows-1250 (CP1250)

I have a problem detecting CP1250 with mb_detect_encoding(). In my case I want to detect three encodings:

mb_detect_encoding($string, 'UTF-8,ISO-8859-2,Windows-1250')

But Windows-1250 isn't among the supported encodings. Is there any solution?

mb_detect_encoding always "detects" single-byte encodings, because every byte sequence is valid in them. You can read about this in the documentation for mb_detect_order:

mbstring currently implements the following encoding detection filters. If there is an invalid byte sequence for the following encodings, encoding detection will fail.

UTF-8, UTF-7, ASCII, EUC-JP, SJIS, eucJP-win, SJIS-win, JIS, ISO-2022-JP

For ISO-8859-X, mbstring always detects as ISO-8859-X.

For UTF-16, UTF-32, UCS2 and UCS4, encoding detection will fail always.

Conclusions:

  1. It's meaningless to ask for detection of ISO-8859-2; it will always tell you "yes, that's it" (unless of course it detects UTF-8 first).
  2. Windows-1250 is not supported, but even if it were it would work exactly like ISO-8859-2.
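Since only the UTF-8 test carries real information, that check can be made on its own. The snippet below is a minimal sketch, assuming the mbstring extension is loaded; isValidUtf8 is a hypothetical helper name, not part of PHP.

```php
<?php
// Hypothetical helper: true only when $bytes is a well-formed UTF-8 sequence.
function isValidUtf8(string $bytes): bool
{
    return mb_check_encoding($bytes, 'UTF-8');
}

$utf8   = "\xC5\xA1"; // "š" encoded as UTF-8
$legacy = "\xB9";     // "š" in ISO-8859-2, but "ą" in Windows-1250

var_dump(isValidUtf8($utf8));   // bool(true)
var_dump(isValidUtf8($legacy)); // bool(false): a lone continuation byte
```

A false result only says "not UTF-8". It says nothing about which single-byte encoding the bytes came from.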

In general, it is impossible to detect single-byte encodings with accuracy. If you find yourself needing to do that in PHP you will need to do it manually; don't expect very good results.
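One way such a manual check might look is sketched below. It relies on a single fact: bytes 0x80-0x9F are C1 control codes in ISO-8859-2 but printable characters in Windows-1250 (for example 0x8A "Š", 0x9A "š", 0x8E "Ž", 0x9E "ž"). The function name guessCentralEuropeanEncoding and the fallback parameter are assumptions made for illustration, and the result is a guess, not a detection.

```php
<?php
/**
 * Heuristic guess among ASCII, UTF-8, Windows-1250 and ISO-8859-2.
 * Hypothetical helper; $fallback is returned when the bytes are ambiguous.
 */
function guessCentralEuropeanEncoding(string $bytes, string $fallback = 'ISO-8859-2'): string
{
    // Pure 7-bit input is identical in all of these encodings.
    if (preg_match('/^[\x00-\x7F]*\z/', $bytes) === 1) {
        return 'ASCII';
    }

    // UTF-8 has strict structure, so a successful check is strong evidence.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return 'UTF-8';
    }

    // 0x80-0x9F: control codes in ISO-8859-2, letters and punctuation in
    // Windows-1250. Their presence suggests Windows-1250.
    if (preg_match('/[\x80-\x9F]/', $bytes) === 1) {
        return 'Windows-1250';
    }

    // Nothing distinguishes the two here; this is the caller's assumption.
    return $fallback;
}

echo guessCentralEuropeanEncoding("\x9Akoda"), "\n"; // Windows-1250
echo guessCentralEuropeanEncoding("\xB9koda"), "\n"; // ISO-8859-2 (fallback)
```

The second call shows the limit of the approach: 0xB9 is "š" in ISO-8859-2 and "ą" in Windows-1250, so the bytes alone cannot settle it and the fallback decides.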

It is not feasible to reliably distinguish ISO-8859-2 from Windows-1250, or any single-byte encoding from another single-byte encoding for that matter. mb_detect_encoding simply gives you the first encoding in the list for which the given string is valid, and both are equally valid. "Detecting" encodings is by definition not possible with any amount of accuracy.
