
Regex (C#) - how to match variable names that start with a colon

I need to distinguish variable names from non-variable names in some expressions I am trying to parse. Variable names start with a colon and may contain letters, digits and underscores, but cannot begin with a digit. So valid variable names are:

:x :_x :x2 :alpha_x   // etc

Then I have to pick out other words in the expression that don't begin with colons. So in the following expression:

:result = median(:x,:y,:z)

The variables would be :result, :x, :y and :z, while the only non-variable word would be median.

My regex to pick out the variable names (which works) is:

:[a-zA-Z_]{1}[a-zA-Z0-9_]*
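Here is a minimal C# sketch that applies this pattern to the example expression. It uses top-level statements, and the helper name FindVariables is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every variable token (a colon followed by an
// identifier) found in the expression, using the question's pattern unchanged.
List<string> FindVariables(string expression) =>
    Regex.Matches(expression, @":[a-zA-Z_]{1}[a-zA-Z0-9_]*")
        .Cast<Match>()
        .Select(m => m.Value)
        .ToList();

var variables = FindVariables(":result = median(:x,:y,:z)");
Console.WriteLine(string.Join(", ", variables)); // :result, :x, :y, :z
```

The {1} quantifier is redundant, because [a-zA-Z_] already matches exactly one character. It is harmless, though.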

But I cannot figure out how to get the non-variable words. My regex for that is:

(?<!:)([a-zA-Z_]{1}[a-zA-Z0-9_]*)

The issue is that the lookbehind only excludes the first character after the colon, so the rest of the variable name is still matched (for example, lpha in :alpha).


The following pattern seems to work:

(?<=[^A-Za-z0-9_:])[a-zA-Z_]{1}[a-zA-Z0-9_]*

The lookbehind (?<=[^A-Za-z0-9_:]) asserts that the preceding character is neither a character allowed in a variable name nor a colon. That position therefore marks the start of a non-variable word.
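The lookbehind can be tried with a short C# sketch. The names Positive, Negative and Words are illustrative. The sketch also exposes one limitation: (?<=[^A-Za-z0-9_:]) must see an actual preceding character, so a word at the very start of the input is not matched. The Negative pattern is an assumed variant and is not part of the answer. It uses a negative lookbehind, (?<![A-Za-z0-9_:]), which also succeeds when there is no preceding character.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Pattern from the answer: requires a preceding character that is neither an
// identifier character nor a colon.
const string Positive = @"(?<=[^A-Za-z0-9_:])[a-zA-Z_]{1}[a-zA-Z0-9_]*";

// Assumed variant (not from the answer): the negative lookbehind also succeeds
// at the start of the input, where there is no preceding character at all.
const string Negative = @"(?<![A-Za-z0-9_:])[a-zA-Z_]{1}[a-zA-Z0-9_]*";

List<string> Words(string pattern, string input) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToList();

Console.WriteLine(string.Join(", ", Words(Positive, ":result = median(:x,:y,:z)"))); // median
Console.WriteLine(string.Join(", ", Words(Positive, "median(:x,:y)")));               // prints an empty line
Console.WriteLine(string.Join(", ", Words(Negative, "median(:x,:y)")));               // median
```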


The (?<!:)([a-zA-Z_]{1}[a-zA-Z0-9_]*) regex still matches parts of variable names. The (?<!:) lookbehind only ensures that there is no : immediately to the left of the current location, and the pattern then matches an identifier without checking for a word boundary. So in :alpha, lpha is matched, because l is preceded by a character other than :.
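Here is a small C# reproduction of that behaviour. It assumes the question's pattern is used as-is, and NaiveWords is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// The question's non-variable pattern: it only inspects the single character
// before the match, so a match can start in the middle of a variable name.
const string Naive = @"(?<!:)([a-zA-Z_]{1}[a-zA-Z0-9_]*)";

List<string> NaiveWords(string input) =>
    Regex.Matches(input, Naive).Cast<Match>().Select(m => m.Value).ToList();

Console.WriteLine(string.Join(", ", NaiveWords(":alpha")));                     // lpha
Console.WriteLine(string.Join(", ", NaiveWords(":result = median(:x,:y,:z)"))); // esult, median
```

Single-letter variables such as :x produce no stray match, because the only letter is directly preceded by the colon.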

Hence the problem is easy to solve by adding a word boundary before [a-zA-Z_]:

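using System.Linq;
using System.Text.RegularExpressions;
var s = ":result = median(:x,:y,:z)"; // the input expression
var words = Regex.Matches(s, @"(?<!:)\b[a-zA-Z_]\w*", RegexOptions.ECMAScript)
        .Cast<Match>()
        .Select(x => x.Value)
        .ToList();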

Note that you do not need to wrap the whole pattern in a capturing group, because Match.Value already contains the entire match.
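The following self-contained sketch wraps the fixed pattern in a hypothetical NonVariableWords helper. It runs on the question's expression and on an input that starts with a word:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// The answer's pattern: (?<!:) rejects a preceding colon, and \b prevents a
// match from starting inside an identifier, so no partial words are returned.
List<string> NonVariableWords(string s) =>
    Regex.Matches(s, @"(?<!:)\b[a-zA-Z_]\w*", RegexOptions.ECMAScript)
        .Cast<Match>()
        .Select(x => x.Value)
        .ToList();

Console.WriteLine(string.Join(", ", NonVariableWords(":result = median(:x,:y,:z)"))); // median
Console.WriteLine(string.Join(", ", NonVariableWords("median(:alpha_x) + beta")));     // median, beta
```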

Pattern details

  • (?<!:) - make sure there is no : immediately to the left of the current location
  • \b - a word boundary: make sure there is no letter, digit or _ immediately to the left of the current location
  • [a-zA-Z_] - match an ASCII letter or _
  • \w* - zero or more ASCII letters, digits or _ (the ECMAScript option is needed so that \w matches only ASCII letters and digits, and \b treats only ASCII characters as word characters)
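To see what the ECMAScript option changes, the sketch below runs the same pattern with and without it. The input is hypothetical and contains a non-ASCII letter (é, written as \u00e9). The helper name Words is illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

const string Pattern = @"(?<!:)\b[a-zA-Z_]\w*";

List<string> Words(string s, RegexOptions options) =>
    Regex.Matches(s, Pattern, options).Cast<Match>().Select(m => m.Value).ToList();

// Hypothetical input: "médian" contains the non-ASCII letter é.
string input = "m\u00e9dian(:x)";

// Default .NET semantics: \w and \b are Unicode-aware, so é is a word character.
Console.WriteLine(string.Join(", ", Words(input, RegexOptions.None)));       // médian
// ECMAScript semantics: \w and \b are ASCII-only, so é splits the word.
Console.WriteLine(string.Join(", ", Words(input, RegexOptions.ECMAScript))); // m, dian
```

Which behaviour is preferable depends on whether non-ASCII identifiers can occur in the expressions. With ECMAScript, such words are split into ASCII fragments rather than rejected outright.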
