
javascript remove words less than 3 characters

I am trying to remove all the words less than 3 characters, like in, on, the...

My code does not work for me: Uncaught TypeError: Object... has no method 'replace'. Can anyone help?

var str = 'Proin néc turpis eget dolor dictǔm lacínia. Nullam nǔnc magna, tincidunt eǔ porta in, faucibus sèd magna. Suspendisse laoreet ornare ullamcorper. Nulla in tortòr nibh. Pellentesque sèd est vitae odio vestibulum aliquet in nec leo.';
var newstr = str.split(" ").replace(/(\b(\w{1,3})\b(\s|$))/g,'');
alert(newstr);

You need to change the order of split and replace:

var newstr = str.replace(/(\b(\w{1,3})\b(\s|$))/g,'').split(" ");

Otherwise, you end up calling replace on an array, which does not have this method.
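A quick way to see the difference is to check what split hands back. This is only a minimal sketch; the sample string and variable names are made up for illustration:

```javascript
// Hypothetical sample string, shortened for illustration.
var sampleText = 'porta in faucibus';

// split returns an Array, and arrays have no replace method.
var parts = sampleText.split(' ');
console.log(Array.isArray(parts));      // true
console.log(typeof parts.replace);      // "undefined"

// The string itself does have replace, so it has to be called first.
console.log(typeof sampleText.replace); // "function"
```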

See it in action.

Note: Your current regex does not correctly handle the case where a "short" word is immediately followed by a punctuation character. You can change it slightly to do that:

/(\b(\w{1,3})\b(\W|$))/g
                ^^

Apart from that, you also have to take care of the fact that the resulting array may contain empty strings (because deleting consecutive short words separated by spaces will end up leaving consecutive spaces in the string before it's split). So you might also want to change how you split. All of this gives us:

var newstr = str.replace(/(\b(\w{1,3})\b(\W|$))/g,'').split(/\s+/);

See it in action.
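To make the empty-string issue concrete, here is a small sketch on an ASCII-only sample (the sample strings and variable names are assumptions, not part of the original question). It also adds a filter(Boolean) step, which is not in the answer above, to cover the case where the text ends with a short word:

```javascript
// Hypothetical ASCII-only sample, so that \w behaves as expected.
var asciiSample = 'tincidunt eu porta in, faucibus sed magna.';
var stripped = asciiSample.replace(/(\b(\w{1,3})\b(\W|$))/g, '');
console.log(stripped); // "tincidunt porta  faucibus magna." (note the double space)

// Splitting on a single space keeps an empty string between the two spaces.
var bySpace = stripped.split(' ');
console.log(bySpace); // ["tincidunt", "porta", "", "faucibus", "magna."]

// Splitting on runs of whitespace avoids that.
var byWhitespace = stripped.split(/\s+/);
console.log(byWhitespace); // ["tincidunt", "porta", "faucibus", "magna."]

// A short word at the very end still leaves a trailing empty string,
// which filter(Boolean) drops.
var trailing = 'dolor sit'.replace(/(\b(\w{1,3})\b(\W|$))/g, '').split(/\s+/);
console.log(trailing);                 // ["dolor", ""]
console.log(trailing.filter(Boolean)); // ["dolor"]
```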

Update: As Ray Toal correctly points out in a comment, in JavaScript regexes \w does not match non-ASCII characters (e.g. characters with accents). This means that the above regexes will not work correctly (they will work correctly on certain other flavors of regex). Unfortunately, there is no convenient way around that and you will have to replace \w with a character group such as [a-zA-Zéǔí], and do the converse for \W.
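The effect is easy to reproduce. The following sketch uses \u escapes for the accented letters so the snippet does not depend on file encoding; the variable names are made up for illustration:

```javascript
// "néc", with é written as an escape (U+00E9).
var accented = 'n\u00E9c';

// \w only covers ASCII letters, digits and underscore, so é splits the word.
console.log(/^\w+$/.test(accented)); // false
console.log(accented.match(/\w+/g)); // ["n", "c"]

// As a result the original regex mangles the word instead of removing it:
// only the trailing "c " is treated as a short word.
var mangled = (accented + ' turpis').replace(/(\b(\w{1,3})\b(\s|$))/g, '');
console.log(mangled); // "néturpis"

// An explicit character group (é, ǔ, í added by hand) matches the whole word.
console.log(/^[a-zA-Z\u00E9\u01D4\u00ED]+$/.test(accented)); // true
```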

Update:

Ugh, doing this in JavaScript regex is not easy. I came up with this regex:

([^ǔa-z\u00C0-\u017E]([ǔa-z\u00C0-\u017E]{1,3})(?=[^ǔa-z\u00C0-\u017E]|$))

...which I still don't like because I had to manually include the ǔ in there.

See it in action.
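In engines that support Unicode property escapes (the u flag with \p{L}, standardized in ES2018 and so not available when this answer was written), the hand-built character ranges may not be needed. The following is only a sketch under that assumption; removeShortWords and maxLength are hypothetical names, not something from the original answers:

```javascript
// Hypothetical helper; assumes an engine with ES2018 Unicode property escapes.
function removeShortWords(text, maxLength) {
  if (maxLength === undefined) { maxLength = 3; }
  return text
    // Compose accents so that é is one code point rather than e + combining mark.
    .normalize('NFC')
    // A "word" is a run of letters, digits or underscores in any script,
    // followed by one non-word character (or the end of the text).
    .replace(/([\p{L}\p{N}_]+)([^\p{L}\p{N}_]|$)/gu, function (match, word) {
      // Keep long words untouched; drop short ones with their trailing character.
      return Array.from(word).length > maxLength ? match : '';
    })
    .split(/\s+/)
    .filter(Boolean); // drop empty strings left by removed words
}

console.log(removeShortWords('Proin néc turpis eget dolor dictǔm lacínia.').join(' '));
// "Proin turpis eget dolor dictǔm lacínia."
```

Like the regexes above, this drops the punctuation character that directly follows a removed word, so "in," disappears as a whole.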

Try this:

str = str.split( ' ' ).filter(function ( str ) {
    var word = str.match(/(\w+)/);
    return word && word[0].length > 3;
}).join( ' ' );

Live demo: http://jsfiddle.net/sTfEs/1/

str.split(" ") returns an array, which does not have a replace method.

Secondly, you probably shouldn't use regexes for this. JavaScript does not have good support for non-ASCII letters in regexes. See "Regular expression to match non-English characters?". If you need to use a regex, there are hints in there.

And BTW, in JavaScript regexes, \w{1,3} DOES NOT match "néc". As you probably know, \w is [A-Za-z0-9_]. See http://jsfiddle.net/3YWSC/ for an example.

Are you only trying to match words made of non-space characters? Or are you looking for words of three or fewer letters only? On the one hand you split on spaces, but on the other you used \w. I would go with something like Dennis's answer.

var words = str.split(" "); //Turns the string into an array of words
var longWords = []; //Initialize array
for(var i = 0; i<words.length; i++){
    if(words[i].length > 3) {
        longWords.push(words[i]);
    }
}
var newString = longWords.join(" "); //Create a new string of the words separated by spaces.
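One thing to keep in mind with a plain length check is that punctuation attached to a word counts toward its length. The sketch below illustrates this and shows one possible way to count only letters and digits; it assumes an engine with Unicode property escapes (ES2018), and the sample string and variable names are made up:

```javascript
// Hypothetical sample taken from the end of the question's text.
var sampleWords = 'aliquet in nec leo.'.split(' ');

// Raw length: "leo." is 4 characters long, so it survives.
var byRawLength = sampleWords.filter(function (w) {
  return w.length > 3;
});
console.log(byRawLength); // ["aliquet", "leo."]

// Counting only letters and digits: "leo." counts as 3 and is dropped.
var byLetterCount = sampleWords.filter(function (w) {
  return w.replace(/[^\p{L}\p{N}]/gu, '').length > 3;
});
console.log(byLetterCount); // ["aliquet"]
```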

Using lodash with less than 20 chars:

let a = ['la','rivière','et','le','lapin','sont','dans','le','près'];

a = _.remove(_.uniq(a),n=>_.size(n)>3); // ['rivière','lapin','sont','dans','près']
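Note that _.remove returns the elements it removed, which is why the predicate above selects the words to keep. For comparison, roughly the same result can be had without lodash. This is a sketch using only built-ins (Set for de-duplication, Array.prototype.filter for the length check); the variable names are made up:

```javascript
let frenchWords = ['la', 'rivière', 'et', 'le', 'lapin', 'sont', 'dans', 'le', 'près'];

// A Set keeps the first occurrence of each word in insertion order,
// then filter keeps the words longer than 3 characters.
let longUnique = [...new Set(frenchWords)].filter((w) => w.length > 3);

console.log(longUnique); // ['rivière', 'lapin', 'sont', 'dans', 'près']
```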

Using the filter method:

let sentence = "Proin néc turpis eget dolor dictǔm lacínia. Nullam nǔnc magna, tincidunt eǔ porta in, faucibus sèd magna. Suspendisse laoreet ornare ullamcorper. Nulla in tortòr nibh. Pellentesque sèd est vitae odio vestibulum aliquet in nec leo.";
let sent = sentence.split(" ").filter((ele) => ele.length > 3).join(" ");
console.log(sent);

Try:

var newstr = str.replace(/(\b(\w{1,3})\b(\s|$))/g,'').split(" ");
