JavaScript: How to extract English words or Chinese characters from a string?

I would like to extract each English word or Chinese character from this string and output an array:

String = "你is我"

to

array = ["你", "is", "我"]

How can I do that in JavaScript?

You can use a regex for this. If you also want to match other characters, you can add them inside the brackets:

    const regex = /[a-zA-Z0-9]{1,}/gm;
    const str = `你is我你is我你is我你is我你is我`;
    let m;

    while ((m = regex.exec(str)) !== null) {
        // This is necessary to avoid infinite loops with zero-width matches
        if (m.index === regex.lastIndex) {
            regex.lastIndex++;
        }

        // The result can be accessed through the `m`-variable.
        m.forEach((match, groupIndex) => {
            console.log(`Found match, group ${groupIndex}: ${match}`);
        });
    }
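The pattern above only matches runs of Latin letters and digits, so the Chinese characters are skipped. One possible extension, sketched here under the assumption that the runtime supports Unicode property escapes (ES2018 or later), is to add an alternative that matches a single Han character. The helper name `extractTokens` is hypothetical and not part of the original answer.

```javascript
// Hypothetical helper, not from the original answer.
// Matches either a run of ASCII letters/digits or one Han character.
function extractTokens(str) {
  // The `u` flag enables \p{...} Unicode property escapes (ES2018+).
  const tokenPattern = /[a-zA-Z0-9]+|\p{Script=Han}/gu;
  // match() returns null when nothing matches, so fall back to [].
  return str.match(tokenPattern) || [];
}

console.log(extractTokens("你is我")); // [ '你', 'is', '我' ]
console.log(extractTokens("你is我test hello")); // [ '你', 'is', '我', 'test', 'hello' ]
```

Whitespace and punctuation match neither alternative, so they are simply dropped from the result.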

Well, this took some effort, but here it is. The idea is to check the Unicode code of each character. Here I check against the Basic Latin range (U+0020 to U+007F), which covers the English alphabet. You could check against the Chinese Unicode range as well.

    var s = "你is我";

    // Return the UTF-16 code unit of the first character as a 4-digit uppercase hex string.
    function entityForSymbolInContainer(s) {
        var code = s.charCodeAt(0);
        var codeHex = code.toString(16).toUpperCase();
        while (codeHex.length < 4) {
            codeHex = "0" + codeHex;
        }
        return codeHex;
    }

    // True if the first character falls in the Basic Latin range U+0020..U+007F.
    function is_latin_english(s) {
        var hex = entityForSymbolInContainer(s);
        return hex >= '0020' && hex <= '007F';
    }

    var s_split = s.split('');
    var s_result = [];
    s_result.push(s_split[0]);
    for (var i = 1; i < s_split.length; i++) {
        // Extend the current group while the character class is unchanged; otherwise start a new group.
        if (is_latin_english(s_result[s_result.length - 1]) == is_latin_english(s_split[i])) {
            s_result[s_result.length - 1] += s_split[i];
        } else {
            s_result.push(s_split[i]);
        }
    }
    console.log(s_result);

I used the method mentioned here to obtain the Unicode code of each character.

Range used for filtering Latin English characters: https://jrgraphix.net/r/Unicode/0020-007F
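The answer mentions that the Chinese range could be checked too. A minimal sketch of that idea follows, assuming "Chinese" means the CJK Unified Ideographs block (U+4E00 to U+9FFF); the extension blocks are ignored here. The function names are hypothetical. Unlike the code above, this version emits each ideograph as its own entry instead of grouping adjacent ones, and drops anything that is neither an ASCII letter/digit nor an ideograph.

```javascript
// Hypothetical helpers, not from the original answer.
// Assumption: "Chinese" = CJK Unified Ideographs, U+4E00..U+9FFF.
function isCjkIdeograph(ch) {
  const code = ch.codePointAt(0);
  return code >= 0x4e00 && code <= 0x9fff;
}

function isAsciiLetterOrDigit(ch) {
  return /^[a-zA-Z0-9]$/.test(ch);
}

function splitByCodePoint(str) {
  const result = [];
  let word = "";
  // for...of iterates by code point rather than by UTF-16 code unit.
  for (const ch of str) {
    if (isAsciiLetterOrDigit(ch)) {
      word += ch; // keep building the current English word
      continue;
    }
    // Any other character ends the current word.
    if (word !== "") {
      result.push(word);
      word = "";
    }
    if (isCjkIdeograph(ch)) {
      result.push(ch); // each ideograph becomes its own entry
    }
  }
  if (word !== "") {
    result.push(word);
  }
  return result;
}

console.log(splitByCodePoint("你is我")); // [ '你', 'is', '我' ]
```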

You can do this with the help of spread syntax (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Spread_syntax) and a simple for loop.

    const str = "你is我";
    //const str = "你is我test hello";
    var splittedStr = [...str];
    var arrayLength = splittedStr.length;
    var words = [];
    var englishWord = "";
    var i;
    for (i = 0; i < arrayLength; i += 1) {
        if (/^[a-zA-Z]+$/.test(splittedStr[i])) {
            englishWord += splittedStr[i];
        } else if (/(\s)+$/.test(splittedStr[i])) {
            if (englishWord !== "") {
                words.push(englishWord);
                englishWord = "";
            }
        } else {
            if (englishWord !== "") {
                words.push(englishWord);
                englishWord = "";
            }
            words.push(splittedStr[i]);
        }
    }
    if (englishWord !== "") {
        words.push(englishWord);
    }
    console.log(words);
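One reason spread syntax may be preferable to `split('')` here: spreading a string iterates by code point, while `split('')` cuts between UTF-16 code units and so breaks characters outside the Basic Multilingual Plane (some rarer Han characters among them). The short sketch below illustrates the difference; `toCodePoints` is a hypothetical name used only for this example.

```javascript
// Hypothetical helper illustrating code-point-aware splitting.
function toCodePoints(str) {
  return [...str];
}

// "\u{20BB7}" is a Han character stored as a surrogate pair in UTF-16.
const sample = "\u{20BB7}a";

console.log(sample.split("").length);     // 3: two surrogate halves plus "a"
console.log(toCodePoints(sample).length); // 2: the Han character plus "a"
```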

