
How to check if a string has any non-ISO-8859-1 characters with JavaScript?

I want to write a string validator (or regex) for ISO-8859-1 characters in JavaScript.

If a string contains any non-ISO-8859-1 character, the validator must return false; otherwise it must return true. E.g.:

str = "abcÂÃ";
validator(str); // should return true;

str = "a 你 好";
validator(str); // should return false;

str ="你 好";
validator(str); // should return false;

I have tried the following regex, but it doesn't work correctly:

var regex = /^[\u0000-\u00ff]+/g;
var res = regex.test(value);

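Since you want to return false if any non-ISO-8859-1 character is present, you can use a double negation: search for a character that is not in the range, then negate the result:

 function validator(str) {
   // [^\u0000-\u00ff] matches any character outside the Latin-1 range;
   // the leading ! turns "found one" into false and "found none" into true.
   return !/[^\u0000-\u00ff]/.test(str);
 }

 var str = "abcÂÃ";
 console.log(validator(str)); // should return true
 str = "a 你 好";
 console.log(validator(str)); // should return false
 str = "你 好";
 console.log(validator(str)); // should return false
 str = "abc";
 console.log(validator(str)); // should return true
 str = "╗";
 console.log(validator(str)); // should return false

It uses !/[^\u0000-\u00ff]/.test(str): the negated character class [^\u0000-\u00ff] matches any single character outside the Latin-1 range (U+0000 to U+00FF), so test() returns true as soon as the string contains one. The leading ! inverts that result, so the validator returns true when no such character exists and false otherwise.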
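For comparison, the regex from the question fails because it is only anchored at the start: /^[\u0000-\u00ff]+/ succeeds as soon as the string begins with one or more Latin-1 characters, so "a dots" style inputs such as "a 你 好" pass. A minimal sketch of the anchored variant follows; the helper name isLatin1Anchored is hypothetical, and the `*` quantifier is my choice so that the empty string is accepted, matching the negated-class validator above:

```javascript
// Hypothetical helper: the question's regex with an end anchor ($) added,
// so every character from start to end must be in U+0000..U+00FF.
// No `g` flag: with `g`, a reused RegExp object keeps `lastIndex`
// between test() calls, which can make repeated checks return wrong results.
function isLatin1Anchored(str) {
  return /^[\u0000-\u00ff]*$/.test(str);
}

// The original, start-anchored-only regex accepts a mixed string,
// because only the leading "a" has to match:
console.log(/^[\u0000-\u00ff]+/.test("a 你 好")); // true

console.log(isLatin1Anchored("abcÂÃ"));  // true
console.log(isLatin1Anchored("a 你 好")); // false
console.log(isLatin1Anchored("你 好"));   // false
```

Both forms give the same answer; the negated-class version can stop at the first offending character, while the anchored version reads closer to the original attempt.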

In case you want an alternative approach:

ISO-8859-1, also called "Latin 1", maps directly onto the first 256 Unicode code points (U+0000 to U+00FF): https://en.wikipedia.org/wiki/ISO/IEC_8859-1

So we can use a native function that accepts only Latin 1 input, such as btoa(). From MDN:

Base64, by design, expects binary data as its input. In terms of JavaScript strings, this means strings in which each character occupies only one byte. So if you pass a string into btoa() containing characters that occupy more than one byte, you will get an error, because this is not considered binary data. (https://developer.mozilla.org/en-US/docs/Web/API/btoa)

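const validator = (str) => {
  try {
    // btoa() throws an InvalidCharacterError DOMException
    // if the string contains any character above U+00FF.
    btoa(str);
    return true;
  } catch {
    // Optional catch binding (ES2019); `catch ()` with empty parentheses is a syntax error.
    return false;
  }
};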

For a non-Latin-1 string, btoa throws an error like the following (this is Chrome's message):

Uncaught DOMException: Failed to execute 'btoa' on 'Window': The string to be encoded contains characters outside of the Latin1 range. at <anonymous>:1:1
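If you would rather not use an exception for control flow, the same check can be written by hand. This is a minimal sketch, assuming (as the HTML spec describes btoa()) that btoa() rejects any code point above U+00FF. For a JavaScript string that is equivalent to checking every UTF-16 code unit, because characters outside the BMP are stored as surrogate pairs whose units are both above 0xFF. The name isLatin1Loop is hypothetical:

```javascript
// Hypothetical helper: a loop-based equivalent of the btoa() check.
// charCodeAt() returns a UTF-16 code unit (0..0xFFFF); any unit above 0xFF
// means the string contains a character outside Latin 1.
function isLatin1Loop(str) {
  for (let i = 0; i < str.length; i++) {
    if (str.charCodeAt(i) > 0xff) {
      return false; // stop at the first non-Latin-1 character
    }
  }
  return true;
}

console.log(isLatin1Loop("abcÂÃ"));  // true
console.log(isLatin1Loop("a 你 好")); // false
console.log(isLatin1Loop("😀"));     // false (surrogate pair, both units above 0xFF)
```

Unlike the btoa() version, this does not depend on btoa() being available, which matters in environments such as older Node.js releases that lack a global btoa().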

See also: JavaScript has a Unicode problem
