
Regex not finding two letter words that include Swedish letters

I am very new to regex, and I have managed to create a way to check whether a specific word exists inside a string without just being part of another word.

Example: I am looking for the word "banana". banana == true, bananarama == false

This all works fine. However, a problem occurs when I look for two-letter words that contain Swedish letters (Å, Ä, Ö).

Example: I am looking for the word "på" in a string like "på påsk", and the result is negative. However, if I look for the word "påsk", the result is positive. This is the regex I am using:

    const doesWordExist = (s, word) => new RegExp('\\b' + word + '\\b', 'i').test(s);

    const stringOfWords = "Färg på plagg";
    console.log(doesWordExist(stringOfWords, "på"));
    // Expected result: true
    // Actual result: false

However, if I change the word "på" to a three-letter word, it comes back true:

    const doesWordExist = (s, word) => new RegExp('\\b' + word + '\\b', 'i').test(s);

    const stringOfWords = "Färg pås plagg";
    console.log(doesWordExist(stringOfWords, "pås"));
    // Expected result: true
    // Actual result: true

I have been looking around for answers and have found a few questions about similar issues with Swedish letters, but none of them really look for only the word in its entirety. Could anyone explain what I am doing wrong?

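The word boundary \b depends strictly on the characters matched by \w, which in JavaScript is shorthand for the character class [A-Za-z0-9_]. Since "å" is not in that class, there is no boundary between "å" and the space that follows it, so the trailing \b in \bpå\b never matches. "pås" works only because it ends in "s", which is a word character.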
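The behaviour described above can be checked directly. The following is a minimal sketch; the helper name isWordChar is made up for illustration and is not part of the question's code.

```javascript
// \w covers only ASCII letters, digits and underscore,
// so "å" counts as a non-word character.
const isWordChar = (ch) => /\w/.test(ch);

console.log(isWordChar('s')); // true
console.log(isWordChar('å')); // false

// In "på plagg", "å" and the following space are both non-word characters,
// so there is no \b between them and the trailing \b cannot match.
const plainBoundary = /\bpå\b/i.test('Färg på plagg');

// In "pås plagg", "s" is a word character followed by a space, so \b matches.
const withTrailingS = /\bpås\b/i.test('Färg pås plagg');

console.log(plainBoundary); // false
console.log(withTrailingS); // true
```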

To get similar behaviour you have to re-implement the boundary check yourself, for example with negative lookarounds:

    const swedishCharClass = '[a-zäöå]';

    // Emulate \b: the word must not be preceded or followed by a letter.
    const doesWordExist = (s, word) => new RegExp(
      '(?<!' + swedishCharClass + ')' + word + '(?!' + swedishCharClass + ')',
      'i'
    ).test(s);

    console.log(doesWordExist("Färg på plagg", "på"));   // true
    console.log(doesWordExist("Färg pås plagg", "pås")); // true
    console.log(doesWordExist("Färg pås plagg", "på"));  // false

For more complex alphabets, I'd suggest taking a look at "Concrete Javascript Regex for Accented Characters (Diacritics)".
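One possible way to cover other alphabets without listing letters by hand is to use Unicode property escapes (\p{L} for letters, \p{N} for numbers), which need the u flag. This is only a sketch, assuming a JavaScript engine with ES2018 lookbehind and property-escape support. The names escapeRegExp and doesWordExistUnicode are hypothetical helpers, not part of the original answer.

```javascript
// Escape regex metacharacters so the search word is treated literally.
const escapeRegExp = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Treat any Unicode letter, number or underscore as a "word" character
// and require that none of them touches the word on either side.
const wordChar = '[\\p{L}\\p{N}_]';

const doesWordExistUnicode = (s, word) => new RegExp(
  '(?<!' + wordChar + ')' + escapeRegExp(word) + '(?!' + wordChar + ')',
  'iu'
).test(s);

console.log(doesWordExistUnicode('Färg på plagg', 'på'));   // true
console.log(doesWordExistUnicode('Färg pås plagg', 'pås')); // true
console.log(doesWordExistUnicode('Färg pås plagg', 'på'));  // false
```

Unlike the hand-written Swedish character class, this variant also treats digits and underscores as part of a word, which mirrors what \w does for ASCII text.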


 