
Regex to specifically match @first.last format and nothing before or after

I have a string in the format some words @first.last more words @first.last. I want to write a regex that picks out every substring in the format @first.last so that I can replace it with something else. The regex should match only the @first.last substring itself. It should not include any character preceding the @ symbol, nor the first space after last or anything following it. Example:

const regex = /[^\[\s](@[a-zA-Z0-9\.\-]+)/im;
let str = 'Hey @first.last tell [@first.last] to check this out';
str = str.replace(regex, 'Keanu');
// desired result: 'Hey Keanu tell [@first.last] to check this out'

Regexs I have tried:

  • (@[a-zA-Z0-9\.\-]+) -> Gets me part of the way there, but doesn't take the characters before the @ symbol into account

  • [^\[](@[a-zA-Z0-9\.\-]+) -> This regex fails if @first.last is at the very start of the string, i.e., @first.last look at this would not be changed by a str.replace call

  • [^\[\s](@[a-zA-Z0-9\.\-]+) -> An attempt to filter out leading spaces

  • [^.+?](@[a-zA-Z0-9\.\-]+)

The main thing tripping me up is what to put before (@[a-zA-Z0-9\.\-]+) to make sure that I only match the @ symbol and the characters that immediately follow it in the first.last format.

I appreciate any help and assistance.

Using a character class like [a-zA-Z0-9.-]+ is a rather broad match, as it does not guarantee, for example, that the dot is not at the end. It can match any combination of the listed characters, so --.-- is also valid. Note that you don't have to escape the dot, nor the dash if it is the last character in the class.
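
A quick sketch to check that point. It compares the broad class with a stricter shape, @\w+\.\w+ (word characters, exactly one dot, word characters). The anchors ^ and $ are added here only so that each sample is tested as a whole, and the variable names and sample strings are made up for illustration.

```javascript
// Broad class: any mix of letters, digits, dots and dashes.
// The dot needs no escape inside a class, and the dash is literal because it comes last.
const broad = /^@[a-zA-Z0-9.-]+$/;
// Stricter shape: 1+ word chars, exactly one dot, 1+ word chars.
const strict = /^@\w+\.\w+$/;

const samples = ["@first.last", "@--.--", "@first.", "@first..last", "@firstlast"];

for (const s of samples) {
  // Only "@first.last" passes the strict test; the broad class accepts all five.
  console.log(s.padEnd(14), "broad:", broad.test(s), "strict:", strict.test(s));
}
```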

  1. The first pattern (@[a-zA-Z0-9\.\-]+) matches both occurrences because no boundaries are set on the left or the right.

  2. The second pattern [^\[](@[a-zA-Z0-9\.\-]+) matches including the leading space, because that space is matched by the negated character class [^\[], which matches any character except [. It also needs one character before the @, so a mention at the very start of the string cannot match.

  3. The third pattern [^\[\s](@[a-zA-Z0-9\.\-]+) does not match at all, because the negated character class [^\[\s] now rules out both the leading space and the leading [.

  4. The fourth pattern [^.+?](@[a-zA-Z0-9\.\-]+) matches the leading space before the first occurrence and the leading [ before the second, because [^.+?] matches any character except ., + or ? (inside a character class these are literal characters).
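
The four points above can be reproduced with a small sketch that runs each attempt from the question, written with single backslashes and a g flag added so that all matches are listed, against the sample string. The names sample and attempts are illustrative.

```javascript
// Sample string from the question.
const sample = "Hey @first.last tell [@first.last] to check this out";

// The four attempts from the question, with the g flag so match() returns every match.
const attempts = [
  /(@[a-zA-Z0-9\.\-]+)/g,
  /[^\[](@[a-zA-Z0-9\.\-]+)/g,
  /[^\[\s](@[a-zA-Z0-9\.\-]+)/g,
  /[^.+?](@[a-zA-Z0-9\.\-]+)/g,
];

for (const re of attempts) {
  // With the g flag, match() returns an array of full matches, or null if nothing matched.
  console.log(re.source, "=>", JSON.stringify(sample.match(re)));
}
```

The printed full matches line up with the list: both mentions without any prefix, only the space-prefixed one, null, and both with their preceding character.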

You could use a capturing group that matches either the start of the string or a whitespace character, followed by the @ part made of word characters and a dot:

(^|\s)@\w+\.\w+(?!\S)

Explanation

  • (^|\s) Capturing group 1: start of string or a whitespace character
  • @\w+\.\w+ Match @, then 1+ word characters, a dot and 1+ word characters (instead of \w you could also use [a-zA-Z0-9]; note that \w also matches _)
  • (?!\S) Assert that what directly follows is not a non-whitespace character, i.e. it is either whitespace or the end of the string
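
A sketch of how the pieces behave on a few made-up inputs, including the case from the question where the mention is the very first thing in the string. The name mention is illustrative.

```javascript
// The pattern from this answer, with the g flag so every mention is replaced.
const mention = /(^|\s)@\w+\.\w+(?!\S)/g;

const inputs = [
  "@first.last look at this",              // mention at the very start: group 1 matches ^
  "Hey @first.last tell [@first.last] ok", // the bracketed one is preceded by [ and skipped
  "ping @first.last",                      // mention at the very end: (?!\S) still succeeds
  "ping @first.last, please",              // followed by a comma: (?!\S) fails, no match
];

for (const s of inputs) {
  // $1 restores what group 1 matched (empty at the start, otherwise the whitespace).
  console.log(s.replace(mention, "$1Keanu"));
}
// Keanu look at this
// Hey Keanu tell [@first.last] ok
// ping Keanu
// ping @first.last, please
```

As the last input shows, a mention followed directly by punctuation is left alone, because the lookahead only accepts whitespace or the end of the string after it.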

In the replacement, use the first capturing group followed by your replacement text: $1Keanu

const regex = /(^|\s)@\w+\.\w+(?!\S)/g;
let str = 'Hey @first.last tell [@first.last] to check this out';
str = str.replace(regex, "$1Keanu");
console.log(str); // Hey Keanu tell [@first.last] to check this out
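
A short sketch of why the $1 matters: group 1 consumes the whitespace before the @, so replacing with the bare name drops that space. As an alternative that is not part of this answer, a lookbehind (?<=^|\s) checks the preceding character without consuming it, so no $1 is needed; this assumes an engine with ES2018 lookbehind support, such as current Node.js or modern browsers. The variable names are illustrative.

```javascript
const text = "Hey @first.last tell [@first.last] to check this out";
const withGroup = /(^|\s)@\w+\.\w+(?!\S)/g;

// Without $1 the matched space is replaced too, gluing the name to "Hey".
console.log(text.replace(withGroup, "Keanu"));   // HeyKeanu tell [@first.last] to check this out
// With $1 the space is put back.
console.log(text.replace(withGroup, "$1Keanu")); // Hey Keanu tell [@first.last] to check this out

// Alternative (assumes ES2018+ lookbehind support): the lookbehind asserts
// "start of string or whitespace before @" without including it in the match.
const withLookbehind = /(?<=^|\s)@\w+\.\w+(?!\S)/g;
console.log(text.replace(withLookbehind, "Keanu")); // Hey Keanu tell [@first.last] to check this out
```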

This expression might also work here:

(\s+|^)(@[a-z0-9-]+\.[a-z0-9-]+)(\s+|$)

The \s+ is there just in case there are additional spaces, and the character class can simply be modified if needed.

const regex = /(\s+|^)(@[a-z0-9-]+\.[a-z0-9-]+)(\s+|$)/gis;
const str = `some words @first.last more words @first.last some words @first.last more words @first.last012 some other text some words @first.last more words @first.last012\$%^& some words@first.last more words@first.last`;
const subst = `$1Keanu$3`;
const result = str.replace(regex, subst);
console.log(result);
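
One consequence of capturing the trailing whitespace in (\s+|$) can be checked with a quick sketch on a made-up string: when two mentions are separated by a single space, the first match consumes that space, so the second mention has no leading whitespace left and is not replaced. Asserting the right-hand boundary with a lookahead, as in the (^|\s)@\w+\.\w+(?!\S) pattern, leaves the space in place. The variable names are illustrative.

```javascript
// This answer's pattern: group 3 consumes the trailing whitespace.
const consuming = /(\s+|^)(@[a-z0-9-]+\.[a-z0-9-]+)(\s+|$)/gi;
// Lookahead variant: trailing whitespace is only asserted, not consumed.
const asserting = /(^|\s)@\w+\.\w+(?!\S)/g;

const adjacent = "cc @first.last @second.name thanks";

console.log(adjacent.replace(consuming, "$1Keanu$3")); // cc Keanu @second.name thanks
console.log(adjacent.replace(asserting, "$1Keanu"));   // cc Keanu Keanu thanks
```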

RegEx Circuit: jex.im visualizes regular expressions. [Figure: jex.im diagram of the expression]

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact yoyou2525@163.com.