I'm parsing a file containing statements line by line. I want to replace certain identifiers, but only on lines that are assignments.
A line is an assignment if it has one of the following two forms:
DataType Identifier = ...
Identifier = ...
The data type must be one of: "R", "L", "H", "X", "I". The data type is optional. Spaces are allowed in any position around the DataType and the Identifier. Example of lines containing statements:
L Input = ...
DigitalOutput = ...
R Output= ...
H AnalogInput=...
X Output = ...
Expected result after parsing the statements above would be:
L Deprecated = ...
DigitalOutput = ...
R Deprecated= ...
H AnalogInput=...
X Deprecated = ...
The file also contains statements other than assignments, so it's important to identify the lines that are assignments and replace identifiers only on those lines. I've tried to use a regular expression with positive lookbehind and positive lookahead:
public string ReplaceIdentifiers(string line)
{
    List<string> validDataTypes = new List<string>{"R", "L", "H", "X", "I"};
    List<string> identifiersToReplace = new List<string>{"Input", "Output"};
    string MyRegEx = ...
    Regex regEx = new Regex(MyRegEx);
    // Replace returns a new string; the original line is left unchanged
    return regEx.Replace(line, "Deprecated");
}
where MyRegEx has the form (pseudo code):
$@"(?<=...){Any of the two identifiers to replace}(?=...)"
The lookbehind:
Start of string OR
Zero or more spaces, Any of the valid data types, Zero or more spaces OR
Zero or more spaces
The lookahead:
Zero or more spaces, =
I haven't managed to get the regular expression right. How do I write the regular expression?
Since .NET regex supports variable-length lookbehind, you may use the following pattern:
(?<=^\s*(?:[RLHXI]\s+)?)(?:Input|Output)(?=\s*=)
Then replace each match with Deprecated.
C# example:
string input = "L Input = ...\n" +
"DigitalOutput = ...\n" +
" R Output= ...\n" +
"H AnalogInput=...\n" +
" X Output = ...\n" +
"IOutput = ...\n" +
"Output = ...";
Regex regEx = new Regex(@"(?<=^\s*(?:[RLHXI]\s+)?)(?:Input|Output)(?=\s*=)",
RegexOptions.Multiline);
string output = regEx.Replace(input, "Deprecated");
Console.WriteLine(output);
Output:
L Deprecated = ...
DigitalOutput = ...
R Deprecated= ...
H AnalogInput=...
X Deprecated = ...
IOutput = ...
Deprecated = ...
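If the data types and identifiers come from lists, as in the question's ReplaceIdentifiers method, the same lookbehind pattern can be assembled at run time. The following is only a sketch under a few assumptions: it uses C# 9 top-level statements, the names types, identifiers, and assignmentRegex are illustrative, and each line is processed on its own, so the Multiline option is not needed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var validDataTypes = new List<string> { "R", "L", "H", "X", "I" };
var identifiersToReplace = new List<string> { "Input", "Output" };

// Escape each entry so regex metacharacters in a name are matched literally.
string types = string.Join("|", validDataTypes.Select(Regex.Escape));
string identifiers = string.Join("|", identifiersToReplace.Select(Regex.Escape));

// Same shape as the pattern above, with the alternatives filled in from the lists.
var assignmentRegex = new Regex(
    $@"(?<=^\s*(?:(?:{types})\s+)?)(?:{identifiers})(?=\s*=)");

// Regex.Replace returns a new string, so the result has to be returned to the caller.
string ReplaceIdentifiers(string line) => assignmentRegex.Replace(line, "Deprecated");

Console.WriteLine(ReplaceIdentifiers("L Input = ..."));       // L Deprecated = ...
Console.WriteLine(ReplaceIdentifiers("DigitalOutput = ...")); // DigitalOutput = ...
```

Building the Regex once and reusing it for every line avoids recompiling the pattern on each call.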
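For the particular case shown, your regex can be:
^(\s*[RLHXI]\s+)(?:Output|Input)(\s*=)
Replace with $1Deprecated$2, using the Multiline option. Note that this pattern requires a data type; to also match assignments without one, make that part optional: ^(\s*(?:[RLHXI]\s+)?)(?:Output|Input)(\s*=)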
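The capture-group approach can be tried on a few of the sample lines. This is a minimal sketch assuming C# 9 top-level statements; the variable names are illustrative, and typeOptional is a variation that wraps the data type in an optional group so that assignments without a type are also handled.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "L Input = ...\nOutput = ...\nDigitalOutput = ...";

// Group 1 keeps the indentation and data type, group 2 keeps the spacing before '='.
var typeRequired = new Regex(
    @"^(\s*[RLHXI]\s+)(?:Output|Input)(\s*=)", RegexOptions.Multiline);

// Variation: the data type is optional, so "Output = ..." is matched as well.
var typeOptional = new Regex(
    @"^(\s*(?:[RLHXI]\s+)?)(?:Output|Input)(\s*=)", RegexOptions.Multiline);

// $1 and $2 put the captured prefix and suffix back around the new identifier.
string withRequiredType = typeRequired.Replace(input, "$1Deprecated$2");
string withOptionalType = typeOptional.Replace(input, "$1Deprecated$2");

Console.WriteLine(withRequiredType); // only "L Input" is renamed
Console.WriteLine(withOptionalType); // "L Input" and "Output" are renamed
```

In both cases DigitalOutput is left alone, because the identifier must follow the start of the line (plus an optional type) directly.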
If the type names and the identifiers to replace are not available at compile time, you can use string.Format with this format:
^(\s*(?:{0})\s+)(?:{1})(\s*=)
The arguments you pass to it will be the lists of strings, joined with | using string.Join:
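// Requires: using System.Linq; using System.Text.RegularExpressions;
string regex = string.Format(
    @"^(\s*(?:{0})\s+)(?:{1})(\s*=)",
    // Regex.Escape makes any regex metacharacters in the names match literally
    string.Join("|", validDataTypes.Select(Regex.Escape)),
    string.Join("|", identifiersToReplace.Select(Regex.Escape))
);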
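The escaping step matters as soon as a name contains a regex metacharacter. The sketch below assumes C# 9 top-level statements; BuildPattern is a hypothetical helper, and the identifier Valve.Output is made up purely to show that the dot is matched literally once Regex.Escape has been applied.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: escapes every name, then fills the format from the answer above.
string BuildPattern(IEnumerable<string> types, IEnumerable<string> identifiers) =>
    string.Format(
        @"^(\s*(?:{0})\s+)(?:{1})(\s*=)",
        string.Join("|", types.Select(Regex.Escape)),
        string.Join("|", identifiers.Select(Regex.Escape)));

// "Valve.Output" is an invented identifier containing a metacharacter.
string pattern = BuildPattern(new[] { "R", "L" }, new[] { "Valve.Output" });
var regex = new Regex(pattern, RegexOptions.Multiline);

string result = regex.Replace(
    "L Valve.Output = 1\nL ValveXOutput = 2", "$1Deprecated$2");

// Only the first line contains the exact name, so only it is rewritten.
Console.WriteLine(result);
```

Without the escaping, the dot would act as a wildcard and the second line would be rewritten too.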