Using regex to split equations with variables in C#

I've been struggling with this for quite a while (not being a regex ninja), searching Stack Overflow and working through trial and error. I think I'm close, but there are still a few hiccups I need help sorting out.

The requirement is that a given equation, which may include variables, exponents, and so on, is split by the regex pattern into its tokens (variables, constants, values, operators, etc.). What I have so far:

    Regex re = new Regex(@"(\,|\(|\)|(-?\d*\.?\d+e[+-]?\d+)|\+|\-|\*|\^)");
    var tokens = re.Split(equation);

So an equation such as

    2.75423E-19* (var1-5)^(1.17)* (var2)^(1.86)* (var3)^(3.56)

should parse to

    [2.75423E-19, *, (, var1, -, 5, ), ^, (, 1.17, ), *, ..., 3.56, )]

However, the exponent portion is getting split as well, which I think is due to this part of the regex: \+|\-.
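A quick way to test that suspicion is to try the float alternative from the pattern above on its own. The sketch below is not part of the original post; it assumes the input is exactly the literal shown earlier and is written as C# 9+ top-level statements. It suggests that the lowercase `e` in the pattern never matches the uppercase `E` in `2.75423E-19` under the default case-sensitive matching, so the number alternative fails and the later `\-` alternative is free to split the exponent.

```csharp
using System;
using System.Text.RegularExpressions;

// The float alternative from the question's pattern, isolated for testing.
string floatPattern = @"-?\d*\.?\d+e[+-]?\d+";
string number = "2.75423E-19";

// Default (case-sensitive) matching: the lowercase 'e' cannot match 'E'.
bool matchesByDefault = Regex.IsMatch(number, floatPattern);

// Case-insensitive matching: the whole literal is consumed as one match.
Match m = Regex.Match(number, floatPattern, RegexOptions.IgnoreCase);

Console.WriteLine(matchesByDefault);   // False
Console.WriteLine(m.Value);            // 2.75423E-19
```

Either `RegexOptions.IgnoreCase` or `[eE]` in the pattern would address this particular mismatch.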

Other renditions I've tried are:

    Regex re1 = new Regex(@"([\,\+\-\*\(\)\^\/\ ])");

and

    Regex re = new Regex(@"(-?\d*\.?\d+e[+-]?\d+)|([\,\+\-\*\(\)\^\/\ ])");

which both have their flaws. Any help would be appreciated.

For the equations like the one posted in the original question, you can use

[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?|[-^+*/()]|\w+

See regex demo

The regex matches:

  • [0-9]*\.?[0-9]+([eE][-+]?[0-9]+)? - a float number, with an optional exponent
  • | - or...
  • [-^+*/()] - any of the arithmetic operators and parentheses present in the equation posted
  • | - or...
  • \w+ - 1 or more word characters (letters, digits or underscore).

For more complex tokenization, consider using NCalc, as suggested in Lucas Trzesniewski's comment.

C# sample code:

var line = "2.75423E-19* (var1-5)^(1.17)* (var2)^(1.86)* (var3)^(3.56)";
var matches = Regex.Matches(line, @"[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?|[-^+*/()]|\w+");
foreach (Match m in matches)
    Console.WriteLine(m.Value);

And here is updated code to show that Regex.Split is not necessary here:

var result = Regex.Matches(line, @"\d+(?:[,.]\d+)*(?:e[-+]?\d+)?|[-^+*/()]|\w+", RegexOptions.IgnoreCase)
             .Cast<Match>()
             .Select(p => p.Value)
             .ToList();
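The order of the alternatives matters, because the regex engine takes the first alternative that succeeds at each position. The sketch below illustrates this on a shortened version of the equation; the `Tokenize` helper and the reordered pattern are hypothetical, added only for comparison, and the code is written as C# 9+ top-level statements.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string line = "2.75423E-19* (var1-5)^(1.17)";

// Hypothetical helper: collect every match of the given pattern in `line`.
string[] Tokenize(string pattern) =>
    Regex.Matches(line, pattern, RegexOptions.IgnoreCase)
         .Cast<Match>()
         .Select(m => m.Value)
         .ToArray();

// Number alternative first, as in the pattern above.
string[] numberFirst = Tokenize(@"\d+(?:[,.]\d+)*(?:e[-+]?\d+)?|[-^+*/()]|\w+");

// Same alternatives with \w+ moved to the front (for comparison only).
string[] wordFirst = Tokenize(@"\w+|[-^+*/()]|\d+(?:[,.]\d+)*(?:e[-+]?\d+)?");

Console.WriteLine(string.Join(" ", numberFirst));
// 2.75423E-19 * ( var1 - 5 ) ^ ( 1.17 )
Console.WriteLine(string.Join(" ", wordFirst));
// 2 75423E - 19 * ( var1 - 5 ) ^ ( 1 17 )
```

With `\w+` first, the digits of each number are consumed as word characters before the number alternative is ever tried, and the decimal points, which match no alternative, are silently dropped.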

Also, to match formatted numbers, you can use \d+(?:[,.]\d+)* rather than [0-9]*\.?[0-9]+ or \d+(,\d+)*.
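One trade-off of accepting commas inside numbers is worth keeping in mind: if the expressions can also contain comma-separated function arguments, the comma will be absorbed into the number. The sketch below illustrates both cases; the sample inputs and the `Tokenize` helper are made up for illustration, and the code is written as C# 9+ top-level statements.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string pattern = @"\d+(?:[,.]\d+)*(?:e[-+]?\d+)?|[-^+*/()]|\w+";

// Hypothetical helper: return all matches of `pattern` in the input.
string[] Tokenize(string s) =>
    Regex.Matches(s, pattern, RegexOptions.IgnoreCase)
         .Cast<Match>()
         .Select(m => m.Value)
         .ToArray();

// A number with a thousands separator stays in one piece.
string[] grouped = Tokenize("1,234.56e3+2");
Console.WriteLine(string.Join(" | ", grouped));   // 1,234.56e3 | + | 2

// But a comma between two arguments is read as part of a number too.
string[] call = Tokenize("max(1,2)");
Console.WriteLine(string.Join(" | ", call));      // max | ( | 1,2 | )
```

If commas are argument separators in your equations, dropping `,` from `[,.]` and adding it to the operator class is one possible adjustment.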

So I think I've got a solution: @stribizhev's answer led me to this regex:

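    // The decimal point is escaped, '*' is listed with the other operators, and the
    // inner group is non-capturing so Split does not emit its text a second time.
    Regex re = new Regex(@"(\d+(?:,\d+)*(?:\.\d+)?(?:[eE][-+]?[0-9]+)?|[-^+*/()]|\w+)");
    tokenList = re.Split(InfixExpression).Select(t => t.Trim()).Where(t => t != "").ToList();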

Splitting with this pattern gives me the desired array.
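For anyone wondering why the `Trim` and `Where` steps are needed: when the pattern is wrapped in a capturing group, `Regex.Split` returns the captured text alongside whatever lies between matches, including whitespace and empty strings at the edges. A minimal sketch with a deliberately tiny, hypothetical pattern and input (C# 9+ top-level statements):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical pattern for illustration: an identifier or a plus sign, captured.
var re = new Regex(@"(\w+|\+)");

// Split keeps the captured tokens and the text between them.
string[] raw = re.Split("a + b");
// raw: "", "a", " ", "+", " ", "b", ""

// Trimming and dropping empty entries leaves only the tokens.
var cleaned = raw.Select(t => t.Trim()).Where(t => t != "").ToList();

Console.WriteLine(raw.Length);                  // 7
Console.WriteLine(string.Join(" ", cleaned));   // a + b
```

`Regex.Matches` avoids the cleanup step, since it only returns the matched tokens.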
