Regular expression, match a partial word, C#

I'm trying to use a regular expression to find duplicated names in a file, where one instance of the name ends in _Id and the other instance is the same name ending in Id with no underscore.

Something along the lines of:

public Guid? Something_Id { get; set;}
public Guid? SomethingId {get; set;}

public Guid? AnotherProp_Id { get; set;}
public Guid? AnotherPropId { get; set; }

where this should return 2 matching instances.

The regex I'm using is: /^(\\S+) (?=(?s:.)*\\1.*).*

It would almost make more sense to strip all the underscores first, and then match duplicates with a forward/backward lookup?

I'm a bit stuck on how to do that.

Foreword

Whilst a comment has correctly pointed out that a regex might not be the best solution, this answer offers the regex solution the question asked for.
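For comparison, the route the question hints at (drop the underscore, then look for duplicates) could be sketched roughly as below. This is only an illustration under assumptions: DuplicateIdFinder and FindDuplicates are hypothetical names, and the sketch assumes every property is an auto-property whose name is immediately followed by an opening brace.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DuplicateIdFinder
{
    // Hypothetical helper, not part of the original answer.
    // Captures the identifier that sits directly before "{" on a property line.
    private static readonly Regex PropertyName = new Regex(@"\b(\w+)\s*\{");

    public static List<string> FindDuplicates(string source)
    {
        return PropertyName.Matches(source)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .Where(name => name.EndsWith("Id", StringComparison.Ordinal))
            // Normalise: "Something_Id" and "SomethingId" both become "Something".
            .GroupBy(name => name.Substring(0, name.Length - 2).TrimEnd('_'))
            // Keep only base names that appear in more than one spelling.
            .Where(g => g.Distinct().Count() > 1)
            .Select(g => g.Key)
            .ToList();
    }
}
```

On the sample from the question, FindDuplicates returns Something and AnotherProp. A small extraction regex is still involved, but the pairing itself is done with ordinary grouping rather than backreferences.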

Regular Expression

(?:
  ^[^\r\n]+?\b(\S+)_Id\b[^\r\n]+.*?
  ^[^\r\n]+?\b(?:\1)Id\b
|
  ^[^\r\n]+?\b(\S+)Id\b[^\r\n]+.*?
  ^[^\r\n]+?\b(?:\2)_Id\b
)

https://regex101.com/r/iC9qK5/1

Visualisation

[Image: regular expression visualisation]

Notes

The pattern looks repetitive because it has to match the pair in either order: *_Id then *Id, or *Id then *_Id.

It also allows any content on the lines in between the two.

Code

try {
    Regex regexObj = new Regex(
        @"(?:
          ^[^\r\n]+?\b(\S+)_Id\b[^\r\n]+.*?
          ^[^\r\n]+?\b(?:\1)Id\b
        |
          ^[^\r\n]+?\b(\S+)Id\b[^\r\n]+.*?
          ^[^\r\n]+?\b(?:\2)_Id\b
        )", 
        RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline | RegexOptions.Multiline);
    Match matchResults = regexObj.Match(subjectString);
    while (matchResults.Success) {
        // matched text: matchResults.Value
        // match start: matchResults.Index
        // match length: matchResults.Length
        matchResults = matchResults.NextMatch();
    } 
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
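To show what can be read out of each match, here is a minimal self-contained sketch that wraps the same pattern and options in a method and collects the shared base name of every pair. IdPairMatcher and FindPairedNames are hypothetical names introduced for illustration only.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class IdPairMatcher
{
    // Same pattern and options as the snippet above; the wrapper is hypothetical.
    private static readonly Regex PairRegex = new Regex(
        @"(?:
          ^[^\r\n]+?\b(\S+)_Id\b[^\r\n]+.*?
          ^[^\r\n]+?\b(?:\1)Id\b
        |
          ^[^\r\n]+?\b(\S+)Id\b[^\r\n]+.*?
          ^[^\r\n]+?\b(?:\2)_Id\b
        )",
        RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline | RegexOptions.Multiline);

    public static List<string> FindPairedNames(string subjectString)
    {
        var names = new List<string>();
        for (Match m = PairRegex.Match(subjectString); m.Success; m = m.NextMatch())
        {
            // Group 1 is set when the *_Id line comes first, group 2 when the *Id line does.
            names.Add(m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value);
        }
        return names;
    }
}
```

Called on the four sample properties from the question, FindPairedNames returns Something and AnotherProp, i.e. the two expected matches.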

Part 2: Remove the offenders

If this does the job for you, you could go a step further and remove the duplicate *Id line (keeping the *_Id one) by performing a replacement with a slightly modified version:

Regular Expression

(?:
  (^[^\r\n]+?\b(\S+)_Id\b[^\r\n]+(.*?))
  (^[^\r\n]+?\b(?:\2)Id\b)[^\r\n]+\r?\n
|
  (\r?\n^[^\r\n]+?\b(\S+)Id\b[^\r\n]+)(.*?)
  (^[^\r\n]+?\b(?:\6)_Id\b[^\r\n]+)
)

https://regex101.com/r/iC9qK5/2

Replacement

$1$7$8

Visualisation

[Image: regular expression visualisation]

Code

string resultString = null;
try {
    resultString = Regex.Replace(subjectString, 
        @"(?:
          (^[^\r\n]+?\b(\S+)_Id\b[^\r\n]+(.*?))
          (^[^\r\n]+?\b(?:\2)Id\b)[^\r\n]+\r?\n
        |
          (\r?\n^[^\r\n]+?\b(\S+)Id\b[^\r\n]+)(.*?)
          (^[^\r\n]+?\b(?:\6)_Id\b[^\r\n]+)
        )", 
        "$1$7$8", RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline | RegexOptions.Multiline);
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
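As a quick way to try the replacement on both orderings, the following sketch wraps it in a method. IdPairCleaner and RemoveDuplicates are hypothetical names. Note that in this sketch the last group of the second branch also captures the remainder of the *_Id line, so that $8 writes the whole line back.

```csharp
using System.Text.RegularExpressions;

public static class IdPairCleaner
{
    // Hypothetical wrapper: keeps each *_Id line and removes its *Id twin.
    public static string RemoveDuplicates(string subjectString)
    {
        return Regex.Replace(subjectString,
            @"(?:
              (^[^\r\n]+?\b(\S+)_Id\b[^\r\n]+(.*?))
              (^[^\r\n]+?\b(?:\2)Id\b)[^\r\n]+\r?\n
            |
              (\r?\n^[^\r\n]+?\b(\S+)Id\b[^\r\n]+)(.*?)
              (^[^\r\n]+?\b(?:\6)_Id\b[^\r\n]+)
            )",
            // $1 is set by the first branch, $7 and $8 by the second;
            // groups that did not take part in the match expand to nothing.
            "$1$7$8",
            RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline | RegexOptions.Multiline);
    }
}
```

Two limitations follow from the pattern itself: the first branch needs a line break after the *Id line, and the second needs a line break before it, so a duplicate sitting on the very last line without a trailing newline, or on the very first line, is left untouched.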
