
JavaScript: find whether a string contains only English letters

I'm trying to match some text only if it consists of English letters and digits, using JavaScript/jQuery.

I'm wondering what the most efficient way to do this is. Since there could be thousands of words, it should be as fast as possible, and I don't want to use regex.

 var names = [];
 names[0] = 'test';
 names[1] = 'हिन';
 names[2] = 'لعربية';

 for (var i = 0; i < names.length; i++) {
    if (names[i] == ENGLISHMATCHCODEHERE) {
        // do something here
    }
 }

Thank you for your time.

A regular expression for this might be:

var english = /^[A-Za-z0-9]*$/;

Now, I don't know whether you'll want to include spaces and the like; the regular expression could be expanded. You'd use it like this:

if (english.test(names[i])) // ...
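As a rough sketch of how that test might be applied to a whole list at once, assuming an array like the `names` from the question (the helper name `englishOnly` is made up for illustration):

```javascript
// Same pattern as above. Note that * also lets the empty string through;
// use + instead if empty strings should be rejected.
var english = /^[A-Za-z0-9]*$/;

// Hypothetical helper: keep only the entries made up entirely of
// ASCII letters and digits.
function englishOnly(names) {
    return names.filter(function (name) {
        return english.test(name);
    });
}

console.log(englishOnly(['test', 'हिन', 'لعربية', 'abc123']));
// logs [ 'test', 'abc123' ]
```

Because the pattern is anchored with `^` and `$`, a single non-matching character anywhere in the word rejects it.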

Also see this: Regular expression to match non-English characters?

Edit: my brain filtered out the "I don't want to use a regex" because it failed the "isSilly()" test. You could always check the character code of each letter in the word, but that's going to be slower (maybe much slower) than letting the regex matcher work. The built-in regular expression engine is really fast.

When you're worried about performance, always do some simple tests first before making assumptions about the technology (unless you've got intimate knowledge of the technology already).

If you're dead set against using regexes, you could do something like this:
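A simple test of that kind could look like the sketch below. It is only a crude timing loop, not a rigorous benchmark, and the names `byRegex`, `byCharCode` and `timeIt` are invented for this example; results will vary by engine and input.

```javascript
var pattern = /^[A-Za-z0-9]*$/;

// Candidate 1: let the regex engine do the work.
function byRegex(str) {
    return pattern.test(str);
}

// Candidate 2: check each character code by hand.
function byCharCode(str) {
    for (var i = 0; i < str.length; i++) {
        var c = str.charCodeAt(i);
        var isDigit = c >= 48 && c <= 57;   // 0-9
        var isUpper = c >= 65 && c <= 90;   // A-Z
        var isLower = c >= 97 && c <= 122;  // a-z
        if (!isDigit && !isUpper && !isLower) {
            return false;
        }
    }
    return true;
}

// Run fn over every word `rounds` times and return the elapsed milliseconds.
function timeIt(fn, words, rounds) {
    var start = Date.now();
    for (var r = 0; r < rounds; r++) {
        for (var i = 0; i < words.length; i++) {
            fn(words[i]);
        }
    }
    return Date.now() - start;
}

var sample = ['test', 'हिन', 'لعربية', 'abc123', 'hello world'];
console.log('regex ms:', timeIt(byRegex, sample, 100000));
console.log('char code ms:', timeIt(byCharCode, sample, 100000));
```

Running it a few times with words that resemble the real data gives a better picture than any single run.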

// Whatever valid characters you want here
var ENGLISH = {};
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789".split("").forEach(function(ch) {
    ENGLISH[ch] = true;
});

function stringIsEnglish(str) {
    var index;

    for (index = str.length - 1; index >= 0; --index) {
        if (!ENGLISH[str.substring(index, index + 1)]) {
            return false;
        }
    }
    return true;
}

Live Example:

 console.log("valid", stringIsEnglish("valid"));
 console.log("invalid", stringIsEnglish("invalid!"));

...but a regex ( /^[a-z0-9]*$/i.test(str) ) would almost certainly be faster. It is in this synthetic benchmark, but those are often unreliable.

Iterate over each character in the string and check whether its character code falls outside 65 to 90 (uppercase A-Z) and 97 to 122 (lowercase a-z), which together make up the Latin alphabet. A single 65 to 122 range is not enough, because it also lets through the six punctuation characters that sit between the two blocks ([ \ ] ^ _ `).

To allow digits or punctuation characters as well, add their character codes to the check.

 function isLatinString(s) {
   var i, charCode;
   for (i = s.length; i--;) {
     charCode = s.charCodeAt(i);
     // reject anything outside A-Z (65-90) and a-z (97-122)
     if (charCode < 65 || charCode > 122 || (charCode > 90 && charCode < 97)) return false;
   }
   return true;
 }

 // tests
 [ "abxSDSzfgr", "aAzZ123dsfsdfעחלעלחי", "abc!", "$abc", "123abc", " abc" ]
   .forEach(s => console.log( isLatinString(s), s ));
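One way the character-code check could be extended to accept extra characters is sketched below. The function name `isLatinStringWith` and its `extraCodes` parameter are assumptions made for this example, not part of the answer above.

```javascript
// Sketch: letters A-Z / a-z pass by range; anything else must appear
// in the caller-supplied list of extra character codes.
function isLatinStringWith(s, extraCodes) {
    for (var i = 0; i < s.length; i++) {
        var code = s.charCodeAt(i);
        var isLetter = (code >= 65 && code <= 90) || (code >= 97 && code <= 122);
        if (!isLetter && extraCodes.indexOf(code) === -1) {
            return false;
        }
    }
    return true;
}

// Allow space (32), exclamation mark (33) and apostrophe (39).
var extras = [32, 33, 39];
console.log(isLatinStringWith("it's fine!", extras)); // true
console.log(isLatinStringWith("abc$", extras));       // false
```

For a long list of extra codes, a lookup object or a `Set` would avoid the linear `indexOf` scan on every character.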

Another way, using an explicit whitelist string to allow specific characters:

 function isLatinString(s) {
   var i,
       whitelist = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
   for (i = 0; i < s.length; i++) {
     // if the whitelist string doesn't include the character, reject the string
     if (!whitelist.includes(s[i])) return false;
   }
   return true;
 }

 // tests
 [ "abxSDSzfgr", "aAzZ123dsfsdfעחלעלחי", "abc!", "$abc", "123abc", " abc" ]
   .forEach(s => console.log( isLatinString(s), s ));

I'm afraid using a regex is the fastest way to do this. To my knowledge, this should be the fastest algorithm:

var names = [];
names[0] = 'test';
names[1] = 'हिन';
names[2] = 'لعربية';

// algorithm follows
var r = /^[a-zA-Z0-9]+$/,
    i = names.length;

// i-- (rather than --i) so that index 0 is visited too
while (i--) {
    if (r.test(names[i])) {
        // do something here
    }
}

You should consider words that may contain special characters. For example {it's}: isn't that English?
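A minimal sketch of a pattern that tolerates such words, assuming only the ASCII apostrophe and hyphen need to be allowed inside a word (the name `englishWord` is hypothetical):

```javascript
// Sketch: runs of ASCII letters and digits, optionally joined by single
// apostrophes or hyphens (e.g. "it's", "well-known"). A leading or
// trailing apostrophe or hyphen is rejected.
var englishWord = /^[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*$/;

["it's", 'well-known', 'test', "'quoted'", 'हिन'].forEach(function (word) {
    console.log(word, englishWord.test(word));
});
```

Typographic apostrophes (’) are not ASCII and would still be rejected unless added to the character class.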

Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost them, please cite this site's URL or the original address.

 