
Find and replace identifier with regular expression

I'm parsing a file containing statements line by line. I want to:

  1. Identify all lines containing assignments.
  2. Replace identifiers of certain types (Input and Output).

A line is an assignment if it has one of the following two forms:

DataType Identifier = ...
Identifier = ...

The data type is optional; if present, it must be one of: "R", "L", "H", "X", "I". Spaces are allowed in any position around the DataType and the Identifier. Examples of lines containing assignments:

L Input = ...
DigitalOutput = ...
  R Output= ...
H AnalogInput=...
  X Output   = ...

Expected result after parsing the statements above would be:

L Deprecated = ...
DigitalOutput = ...
  R Deprecated= ...
H AnalogInput=...
  X Deprecated   = ...

The file also contains statements other than assignments, so it's important to identify the lines with assignments and replace identifiers only in those lines. I've tried to use a regular expression with a positive lookbehind and a positive lookahead:

public string ReplaceIdentifiers(string line)
{
  List<string> validDataTypes = new List<string>{"R", "L", "H", "X", "I"};
  List<string> identifiersToReplace = new List<string>{"Input", "Output"};
  string MyRegEx = ...
  Regex regEx = new Regex(MyRegEx);
  // Regex.Replace returns a new string; it does not modify line in place.
  return regEx.Replace(line, "Deprecated");
}

Where MyRegEx is of the form (pseudocode):

$@"(?<=...){Any of the two identifiers to replace}(?=...)"

The lookbehind:

Start of string OR 
Zero or more spaces, Any of the valid data types, Zero or more spaces OR
Zero or more spaces

The lookahead:

Zero or more spaces, =

I haven't managed to get the regular expression right. How do I write the regular expression?

Since the .NET regex engine supports variable-length lookbehind, you can use the following pattern:

(?<=^\s*(?:[RLHXI]\s+)?)(?:Input|Output)(?=\s*=)

Replace each match with Deprecated.

C# example:

string input = "L Input = ...\n" +
               "DigitalOutput = ...\n" + 
               "  R Output= ...\n" + 
               "H AnalogInput=...\n" + 
               "  X Output   = ...\n" + 
               "IOutput = ...\n" + 
               "Output = ...";

Regex regEx = new Regex(@"(?<=^\s*(?:[RLHXI]\s+)?)(?:Input|Output)(?=\s*=)", 
                        RegexOptions.Multiline);
string output = regEx.Replace(input, "Deprecated");
Console.WriteLine(output);

Output:

L Deprecated = ...
DigitalOutput = ...
  R Deprecated= ...
H AnalogInput=...
  X Deprecated   = ...
IOutput = ...
Deprecated = ...
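
Since the question processes the file one line at a time, the same pattern can also be wrapped in a method that takes a single line. The following is only a sketch: the class name IdentifierReplacer and the field name AssignmentTarget are made up for illustration, and the data types and identifiers are hard-coded as in the pattern above.

```csharp
using System.Text.RegularExpressions;

public static class IdentifierReplacer
{
    // Same pattern as above. RegexOptions.Multiline is not needed here
    // because each call receives exactly one line, so ^ is the line start.
    private static readonly Regex AssignmentTarget = new Regex(
        @"(?<=^\s*(?:[RLHXI]\s+)?)(?:Input|Output)(?=\s*=)");

    // Regex.Replace returns a new string, so the result must be returned
    // (or assigned) by the caller; the input string is left unchanged.
    public static string ReplaceIdentifiers(string line)
    {
        return AssignmentTarget.Replace(line, "Deprecated");
    }
}
```

Called with "  R Output= ...", this should return "  R Deprecated= ...", while "DigitalOutput = ..." should come back unchanged.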

For the particular case shown, your regex can be:

^(\s*[RLHXI]\s+)(?:Output|Input)(\s*=)

Replace with $1Deprecated$2, using the multiline option (RegexOptions.Multiline).
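
Note that the capture-group pattern above requires a data type before the identifier, so a line such as "Output = ..." with no type is left untouched. The sketch below shows that pattern next to a variant where the type group is optional, which is what the question describes. The variant is my own assumption about how one might adapt it, not part of the answer, and the class and member names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class CaptureGroupReplacer
{
    // Pattern from the answer: the data type prefix is mandatory.
    private static readonly Regex TypeRequired = new Regex(
        @"^(\s*[RLHXI]\s+)(?:Output|Input)(\s*=)", RegexOptions.Multiline);

    // Assumed variant: the data type prefix is wrapped in an optional group.
    private static readonly Regex TypeOptional = new Regex(
        @"^(\s*(?:[RLHXI]\s+)?)(?:Output|Input)(\s*=)", RegexOptions.Multiline);

    // $1 and $2 put back the text captured before and after the identifier.
    public static string ReplaceTyped(string text)
    {
        return TypeRequired.Replace(text, "$1Deprecated$2");
    }

    public static string ReplaceTypedOrBare(string text)
    {
        return TypeOptional.Replace(text, "$1Deprecated$2");
    }
}
```

With these two methods, ReplaceTyped("Output = ...") should return the line unchanged, whereas ReplaceTypedOrBare("Output = ...") should return "Deprecated = ...".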

If the type names and the identifiers to replace are not known at compile time, you can use string.Format with this format:

^(\s*(?:{0})\s+)(?:{1})(\s*=)

The arguments you pass to it are the lists of strings, each joined with | using string.Join:

string regex = string.Format(
    @"^(\s*(?:{0})\s+)(?:{1})(\s*=)",
    string.Join("|", validDataTypes), // you should probably escape these beforehand
    string.Join("|", identifiersToReplace)
    );
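
To illustrate the escaping mentioned in the comment, here is a minimal sketch that runs each entry through Regex.Escape before joining. The PatternBuilder class and its Build method are hypothetical names introduced for this example; the pattern itself is the one shown above.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PatternBuilder
{
    public static Regex Build(IEnumerable<string> dataTypes, IEnumerable<string> identifiers)
    {
        // Regex.Escape makes characters such as '.' or '+' match literally
        // instead of being interpreted as regex metacharacters.
        string types = string.Join("|", dataTypes.Select(s => Regex.Escape(s)));
        string names = string.Join("|", identifiers.Select(s => Regex.Escape(s)));

        string pattern = string.Format(@"^(\s*(?:{0})\s+)(?:{1})(\s*=)", types, names);
        return new Regex(pattern, RegexOptions.Multiline);
    }
}
```

The result could then be used as PatternBuilder.Build(validDataTypes, identifiersToReplace).Replace(input, "$1Deprecated$2").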
