
Match regex pattern in a line of text without targeting the text within quotations

Stack Overflow has been very generous with answers to my regex questions so far, but with this one I'm blanking on what to do, and I can't find it answered here.

So I'm parsing a string, let's say for example's sake, a line of VB-esque code like either of the following:

 Call     Function  (    "Str ing 1   ", "String 2"    , "   String    3  ", 1000    )    As   Integer
      Dim    x   = "This    string  should not be affected    "

I'm trying to parse the text in order to eliminate all leading spaces, trailing spaces, and extra internal spaces (when two "words/chunks" are separated by two or more spaces, or when there are one or more spaces between a character and a parenthesis) using regex in C#. The result after parsing the above should look like:

Call Function("Str ing 1   ", "String 2", "   String    3  ", 1000) As Integer
Dim x = "This    string  should not be affected    "

The issue I'm running into is that I want to parse all of the line except any text contained within quotation marks (i.e. a string). Basically, if there are extra spaces inside a string, I want to assume they were intended and leave the string unchanged, but if there are extra spaces in the line outside of the quotation marks, I want to parse and adjust those accordingly.

So far I have the following regex, which does all of the parsing I mentioned above. The only issue is that it affects the contents of strings just like any other part of the line:

    var rx = new Regex(@"\A\s+|(?<=\s)\s+|(?<=.)\s+(?=\()|(?<=\()\s+(?=.)|(?<=.)\s+(?=\))|\s+\z");
    .
    .
    .
    lineOfText = rx.Replace(lineOfText, String.Empty);

Anyone have any idea how I can approach this, or know of a past question answering this that I couldn't find? Thank you!

Since you are reading the file line by line, you can use the following fix:

    ("[^"]*(?:""[^"]*)*")|^\s+|(?<=\s)\s+|(?<=\w)\s+(?=\()|(?<=\()\s+(?=\w)|(?<=\w)\s+(?=\))|\s+$

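Replace each match with $1. When the first alternative, ("[^"]*(?:""[^"]*)*"), matches a string literal (including VB-style doubled "" quotes inside it), $1 puts that literal back unchanged. For every other alternative, group 1 is empty, so the matched whitespace is simply removed. Because the string-literal alternative is tried first, whitespace inside quotes is consumed as part of the literal and never reaches the whitespace alternatives.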
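
For reference, here is a minimal sketch of how the pattern above might be wired into C#. The names Program, Rx and Normalize are made up for illustration and are not from the question. Inside a verbatim string every " in the pattern has to be doubled, which is why the literal looks noisier than the raw regex.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    // Group 1 captures a double-quoted literal ("" inside it is an escaped quote).
    // The remaining alternatives match whitespace that should be deleted.
    static readonly Regex Rx = new Regex(
        @"(""[^""]*(?:""""[^""]*)*"")|^\s+|(?<=\s)\s+|(?<=\w)\s+(?=\()|(?<=\()\s+(?=\w)|(?<=\w)\s+(?=\))|\s+$");

    // Hypothetical helper: "$1" restores a captured literal and expands to
    // an empty string when group 1 did not take part in the match.
    static string Normalize(string line) => Rx.Replace(line, "$1");

    static void Main()
    {
        string line = "      Dim    x   = \"This    string  should not be affected    \"";
        Console.WriteLine(Normalize(line));
        // Dim x = "This    string  should not be affected    "
    }
}
```

One thing to watch for: the parenthesis lookarounds here use \w rather than the . from the question's regex, so whitespace between ( and an opening quote is collapsed to a single space instead of being removed entirely. If that matters, those lookarounds would need to accept a quote as well.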

