Regex to get everything starting with @ and removing everything after any non-included characters

I have the following:

        Regex RgxUrl = new Regex("[^a-zA-Z0-9-_]");
        foreach (var item in source.Split(' ').Where(s => s.StartsWith("@")))
        {
            var mention = item.Replace("@", "");
            mention = RgxUrl.Replace(mention, "");
            usernames.Add(mention);
        }

CURRENT INPUT > OUTPUT

  • @fish and fries are @good > fish , good
  • @fish and fries and @Mary's beer are @good > fish , Marys , good

DESIRED INPUT > OUTPUT

  • @fish and fries are @good > fish , good
  • @fish and fries and @Mary's beer are @good > fish , Mary , good

The key here is to remove anything that's after an offending character. How can this be achieved?

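Your code splits the string on spaces, keeps the chunks that start with @ , removes every @ from each chunk, then uses a regex to delete every character that is not an ASCII letter, a digit, - or _ , and adds the result to the list. Because the offending characters are deleted rather than treated as a cut-off point, @Mary's becomes Marys instead of Mary.

You can get the desired result with a single regex that matches only the allowed characters directly after @ :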

var res = Regex.Matches(source, @"(?<!\S)@([a-zA-Z0-9-_]+)")
    .Cast<Match>()
    .Select(m=>m.Groups[1].Value)
    .ToList();
Console.WriteLine(string.Join("; ", res)); // demo
usernames.AddRange(res); // in your code

See the C# demo
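
For a quick check, here is a minimal self-contained sketch of the same approach, run against the two inputs from the question. The class name MentionDemo and the helper ExtractMentions are illustrative names, not part of the original code. The hyphen is moved to the end of the character class only to make it unambiguously literal; in .NET it matches the same characters as the pattern above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class MentionDemo
{
    // Same idea as the pattern above: '@' not preceded by a non-whitespace
    // character, followed by one or more allowed characters (captured).
    private static readonly Regex MentionRegex =
        new Regex(@"(?<!\S)@([a-zA-Z0-9_-]+)");

    // Hypothetical helper: returns the captured names in order of appearance.
    public static List<string> ExtractMentions(string source)
    {
        return MentionRegex.Matches(source)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }

    public static void Main()
    {
        var inputs = new[]
        {
            "@fish and fries are @good",
            "@fish and fries and @Mary's beer are @good"
        };

        foreach (var input in inputs)
        {
            Console.WriteLine(string.Join("; ", ExtractMentions(input)));
        }
        // Prints:
        // fish; good
        // fish; Mary; good
    }
}
```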

Pattern details:

  • (?<!\S) - there must not be a non-whitespace symbol immediately to the left of the current location (i.e. there must be whitespace or the start of the string). This lookbehind is here because the original code split the string on whitespace.
  • @ - an @ symbol (it is not part of the subsequent group because this symbol was removed in the original code)
  • ([a-zA-Z0-9-_]+) - Capturing Group 1 (accessed with m.Groups[1].Value ) matching one or more ASCII letters, digits, - and _ symbols. Matching stops at the first character outside this set, which is why @Mary's yields Mary.
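
If you would rather keep the original Split / Where loop, the same character class can be anchored to the start of each chunk so that it captures the leading run of allowed characters instead of deleting the disallowed ones. This is a sketch of that variation, not the approach shown above; MentionLoopDemo, LeadingName and Extract are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class MentionLoopDemo
{
    // Anchored at the start of a chunk: '@' followed by the longest run
    // of allowed characters. Anything after that run is ignored.
    private static readonly Regex LeadingName =
        new Regex(@"^@([a-zA-Z0-9_-]+)");

    public static List<string> Extract(string source)
    {
        var usernames = new List<string>();
        foreach (var item in source.Split(' ').Where(s => s.StartsWith("@")))
        {
            var m = LeadingName.Match(item);
            if (m.Success)
            {
                // Keep only the part before the first disallowed character.
                usernames.Add(m.Groups[1].Value);
            }
        }
        return usernames;
    }

    public static void Main()
    {
        var result = Extract("@fish and fries and @Mary's beer are @good");
        Console.WriteLine(string.Join("; ", result)); // fish; Mary; good
    }
}
```

Note one behavioural difference from the original loop: a chunk that is just @ , or where @ is followed directly by a disallowed character, is skipped here, whereas the original code would add an empty string for it.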
