JavaScript: How to extract English words or Chinese characters from a string?
I would like to split this string into English words and individual Chinese characters, and output the result as an array:
string = "你is我"
to
array = ["你", "is", "我"]
How can I do that in JavaScript?
You can use a regular expression for this. The pattern below matches runs of ASCII letters and digits; if you also want to match other characters, add them inside the brackets:
const regex = /[a-zA-Z0-9]{1,}/gm;
const str = `你is我你is我你is我你is我你is我`;
let m;

while ((m = regex.exec(str)) !== null) {
  // This is necessary to avoid infinite loops with zero-width matches
  if (m.index === regex.lastIndex) {
    regex.lastIndex++;
  }

  // The result can be accessed through the `m`-variable.
  m.forEach((match, groupIndex) => {
    console.log(`Found match, group ${groupIndex}: ${match}`);
  });
}
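The pattern above only finds the English words. As a minimal sketch of the "add them in the brackets" idea, the variant below adds a second alternative that matches one Chinese character at a time, so a single `match()` call produces the array from the question. The helper name `splitWordsAndHan` is hypothetical, and the range U+4E00 to U+9FFF is an assumption that covers only the main CJK Unified Ideographs block, not the extension blocks.

```javascript
// Hypothetical helper, not part of the original answer.
function splitWordsAndHan(text) {
  // Alternative 1: a run of ASCII letters/digits (one "English word").
  // Alternative 2: exactly one character in U+4E00..U+9FFF (assumed range).
  const pattern = /[a-zA-Z0-9]+|[\u4e00-\u9fff]/g;
  // match() returns null when nothing matches, so fall back to an empty array.
  return text.match(pattern) || [];
}

console.log(splitWordsAndHan("你is我")); // ["你", "is", "我"]
```

Whitespace and punctuation match neither alternative, so they are simply dropped from the result.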
Well, this took some effort, but here it is. The idea is to check the Unicode value of each character. Here I check against the Basic Latin range, which contains the English alphabet. You could check against the Chinese Unicode range in the same way.
var s = "你is我";

// Return the UTF-16 code of the first character as a 4-digit uppercase hex string.
function entityForSymbolInContainer(s) {
  var code = s.charCodeAt(0);
  var codeHex = code.toString(16).toUpperCase();
  while (codeHex.length < 4) {
    codeHex = "0" + codeHex;
  }
  return codeHex;
}

// True if the first character lies in the Basic Latin range (U+0020 to U+007F).
function is_latin_english(s) {
  var codeHex = entityForSymbolInContainer(s);
  return codeHex >= '0020' && codeHex <= '007F';
}

var s_split = s.split('');
var s_result = [];
s_result.push(s_split[0]);
for (var i = 1; i < s_split.length; i++) {
  // Merge only consecutive Latin characters into one word, so that
  // adjacent Chinese characters stay as separate array elements.
  if (is_latin_english(s_result[s_result.length - 1]) && is_latin_english(s_split[i])) {
    s_result[s_result.length - 1] += s_split[i];
  } else {
    s_result.push(s_split[i]);
  }
}
console.log(s_result);
I used the method mentioned here to obtain the Unicode value of each character.
Range used for filtering Latin English characters: https://jrgraphix.net/r/Unicode/0020-007F
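The answer above mentions that the Chinese range can be checked in the same way. A minimal sketch of that check follows, comparing code points as numbers rather than as padded hex strings. Both helper names are hypothetical, and the range U+4E00 to U+9FFF is an assumption covering only the main CJK Unified Ideographs block.

```javascript
// Hypothetical helper: true if the first code point of ch is in the
// CJK Unified Ideographs block (U+4E00..U+9FFF, assumed range).
function isCjkIdeograph(ch) {
  const cp = ch.codePointAt(0);
  return cp >= 0x4e00 && cp <= 0x9fff;
}

// Hypothetical helper: the same Basic Latin check as above (U+0020..U+007F),
// done numerically so no hex formatting or zero padding is needed.
function isBasicLatin(ch) {
  const cp = ch.codePointAt(0);
  return cp >= 0x0020 && cp <= 0x007f;
}

console.log(isCjkIdeograph("你")); // true
console.log(isBasicLatin("i"));    // true
```

Numeric comparison also keeps working for code points above U+FFFF, where a 4-digit hex string comparison would no longer order values correctly.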
You can do this with the help of spread syntax (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Spread_syntax) and a simple for loop.
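const str = "你is我";
//const str = "你is我test hello";
var splittedStr = [...str]; // one array element per character
var arrayLength = splittedStr.length;
var words = [];
var englishWord = ""; // English word collected so far
var i;
for (i = 0; i < arrayLength; i += 1) {
  if (/^[a-zA-Z]+$/.test(splittedStr[i])) {
    // English letter: append it to the current word
    englishWord += splittedStr[i];
  } else if (/(\s)+$/.test(splittedStr[i])) {
    // Whitespace: finish the current word and drop the whitespace
    if (englishWord !== "") {
      words.push(englishWord);
      englishWord = "";
    }
  } else {
    // Any other character: finish the current word, then keep the character itself
    if (englishWord !== "") {
      words.push(englishWord);
      englishWord = "";
    }
    words.push(splittedStr[i]);
  }
}
// Flush a word left over at the end of the string
if (englishWord !== "") {
  words.push(englishWord);
}
console.log(words);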
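One reason to prefer spread syntax over `split('')` here, shown as a small sketch: spreading a string iterates by Unicode code point, while `split('')` cuts at UTF-16 code units. The two agree for common characters such as 你 and 我, but rarer ideographs outside the Basic Multilingual Plane are stored as two code units. The sample character U+20BB7 is chosen purely for illustration.

```javascript
// U+20BB7 is a CJK ideograph stored as two UTF-16 code units (a surrogate pair).
const rare = "\u{20BB7}";

const byCodeUnit = rare.split(""); // cuts the surrogate pair in half
const byCodePoint = [...rare];     // keeps the character whole

console.log(byCodeUnit.length);  // 2
console.log(byCodePoint.length); // 1
```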