
Use Regex To Extract Line Of Text Excluding Lines With Certain Phrases

I have a problem I've been struggling with for the past couple of hours. I couldn't figure it out even after looking at similar posts on Stack Overflow and researching, so I'm posting it here; I'm sure someone can figure it out in two seconds...

Here is the sample text:

 1) IF045196B LOREM-IPSEM,DOLOR1          G35311          12/07/2018  09/07/1985   FNL  91452SB=;*      TRANS TO HOLD ORDER
 2) IF045197B LOREM-IPSEM,DOLOR1          G35311          12/07/2018  09/07/1985   FNL  91377SB=;*      ALTERNATE LAB DRAW
 3) IF044770B LOREM-IPSEM,DOLOR1          G35311          09/26/2018  09/07/1985        3020SBX=;R      RANDOM TEXT
  RANDOM TEXT;*    LOREM IPSEM
 4) IF044445B LOREM-IPSEM,DOLOR16         G35311          07/18/2018  09/07/1985        3020SBX=;R      RANDOM TEXT
  RANDOM TEXT;*    LOREM IPSEM
 5) IF044446B LOREM-IPSEM,DOLOR17         G35311          07/18/2018  09/07/1985        10165SB=;S/R    MOVIE TITLE
  3020SBX=;R    RANDOM TEXT         RANDOM TEXT;*    LOREM IPSEM
 6) IF044447B LOREM-IPSEM,DOLOR18         G35311          07/18/2018  09/07/1985        10256SB=;S/R    MOVIE TITLE
  3020SBX=;R    RANDOM TEXT         RANDOM TEXT;*    LOREM IPSEM
 7) IF044449B LOREM-IPSEM,DOLOR19         G35311          07/18/2018  09/07/1985        10256SB=;S/R    MOVIE TITLE
  3020SBX=;R    RANDOM TEXT         RANDOM TEXT;*    LOREM IPSEM

Lines 1 and 2 are not matches because they say "TRANS TO HOLD ORDER" and "ALTERNATE LAB DRAW".

I need a Regular Expression that will return lines 3, 4, 5, 6, and 7 back to me. I need the entire line back and then I am going to manipulate those strings later in my program.

So just to be clear, I should receive 5 matches back.

3) IF044770B LOREM-IPSEM,DOLOR1          G35311          09/26/2018  09/07/1985        3020SBX=;R      RANDOM TEXTRANDOM TEXT;*    LOREM IPSEM
4) IF044445B LOREM-IPSEM,DOLOR16         G35311          07/18/2018  09/07/1985        3020SBX=;R      RANDOM TEXTRANDOM TEXT;*    LOREM IPSEM
5) IF044446B LOREM-IPSEM,DOLOR17         G35311          07/18/2018  09/07/1985        10165SB=;S/R    MOVIE TITLE 3020SBX=;R    RANDOM TEXT         RANDOM TEXT;*    LOREM IPSEM
6) IF044447B LOREM-IPSEM,DOLOR18         G35311          07/18/2018  09/07/1985        10256SB=;S/R    MOVIE TITLE 3020SBX=;R    RANDOM TEXT         RANDOM TEXT;*    LOREM IPSEM
7) IF044449B LOREM-IPSEM,DOLOR19         G35311          07/18/2018  09/07/1985        10256SB=;S/R    MOVIE TITLE 3020SBX=;R    RANDOM TEXT         RANDOM TEXT;*    LOREM IPSEM

I think the answer is going to involve some sort of negative look ahead/behind in combination with the below regex.

(?<=\s+\d+\)\s+).*

Here's the permalink if you want to test it out: Regex Permalink.

This regex matches the entire line, excluding the number at the beginning such as "1)" or "2)". Now I just need the regex to do a negative lookahead and exclude lines 1 and 2, since they have "TRANS TO HOLD ORDER" and "ALTERNATE LAB DRAW" in them.

Your help would be greatly appreciated!

Thank you,

Mark S.

You could use a negative lookahead to assert that the string does not contain TRANS TO HOLD ORDER or ALTERNATE LAB DRAW.

^(?!.*(?:TRANS TO HOLD ORDER|ALTERNATE LAB DRAW)).*$

Explanation

  • ^ Assert the start of the string (or of each line, when the multiline option is on)
  • (?! Negative lookahead: assert that what follows on the right is not
    • .* Any character, 0+ times
    • (?:TRANS TO HOLD ORDER|ALTERNATE LAB DRAW) Alternation that matches either of the two phrases
  • ) Close the negative lookahead
  • .*$ Match any character 0+ times, up to the end of the string
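
As a rough illustration of the pieces listed above, here is a minimal VB.NET sketch. The rows are shortened stand-ins for the sample text, joined with a bare line feed, and the module name is made up for the example. It assumes the input is tested line by line, which is why RegexOptions.Multiline is passed.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LookaheadSketch
    Sub Main()
        ' Shortened, hypothetical stand-ins for the sample rows, joined with LF only.
        Dim input As String = String.Join(vbLf, {
            " 1) IF045196B FNL  91452SB=;*      TRANS TO HOLD ORDER",
            " 2) IF045197B FNL  91377SB=;*      ALTERNATE LAB DRAW",
            " 3) IF044770B      3020SBX=;R      RANDOM TEXT"})

        Dim pattern As String = "^(?!.*(?:TRANS TO HOLD ORDER|ALTERNATE LAB DRAW)).*$"

        ' Multiline makes ^ and $ anchor at every line instead of the whole string.
        For Each m As Match In Regex.Matches(input, pattern, RegexOptions.Multiline)
            Console.WriteLine(m.Value.Trim())
        Next
    End Sub
End Module
```

Only the row numbered 3 should be printed; the lookahead rejects the other two before .* gets to consume anything. Note that with Windows line endings (CR LF) the trailing CR would end up inside each match in .NET, which is one reason the sketch trims the value.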

See the Regex demo

Note: If the value can be TRANS or TRANSFER, you could use TRANS(?:FER)? with an optional part to match FER.
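
A small sketch of that optional group, in case it helps. The three test strings are invented for the example; only the first spelling appears in the sample text.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module OptionalGroupSketch
    Sub Main()
        ' (?:FER)? is a non-capturing group that may appear zero or one time.
        Dim phrase As New Regex("TRANS(?:FER)? TO HOLD ORDER")

        Console.WriteLine(phrase.IsMatch("91452SB=;*      TRANS TO HOLD ORDER"))     ' True
        Console.WriteLine(phrase.IsMatch("91452SB=;*      TRANSFER TO HOLD ORDER"))  ' True
        Console.WriteLine(phrase.IsMatch("91452SB=;*      TRANSIT TO HOLD ORDER"))   ' False
    End Sub
End Module
```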

If the text should not appear anywhere in the string, you could test this VB.NET demo. If the text should only be rejected when it is at the end of the string, put a $ inside the lookahead, directly after the alternation, and test this demo.

If you only need 5 matches, you could match 1+ digits and a closing parenthesis \d+\) at the beginning:

^\d+\)(?!.*(?:TRANS TO HOLD ORDER|ALTERNATE LAB DRAW)).*$

Regex demo
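
To show why anchoring on the leading number changes the match count, here is a hedged VB.NET sketch that counts matches for both variants. The rows are shortened, hypothetical versions of the sample. Because the sample lines start with a space, the sketch allows optional leading spaces or tabs with [ \t]* before the digits; that part is an assumption and is not in the pattern above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module AnchoredSketch
    Sub Main()
        ' Shortened, hypothetical rows; rows 3 and 4 wrap onto a second line.
        Dim input As String = String.Join(vbLf, {
            " 1) IF045196B   TRANS TO HOLD ORDER",
            " 3) IF044770B   RANDOM TEXT",
            "  RANDOM TEXT;*    LOREM IPSEM",
            " 4) IF044445B   RANDOM TEXT",
            "  RANDOM TEXT;*    LOREM IPSEM"})

        Dim phrases As String = "(?!.*(?:TRANS TO HOLD ORDER|ALTERNATE LAB DRAW))"
        Dim anyLine As String = "^" & phrases & ".*$"
        ' Assumption: tolerate leading spaces or tabs before the row number.
        Dim numberedLine As String = "^[ \t]*\d+\)" & phrases & ".*$"

        ' The unanchored form also returns the wrapped continuation lines.
        Console.WriteLine(Regex.Matches(input, anyLine, RegexOptions.Multiline).Count)       ' 4
        Console.WriteLine(Regex.Matches(input, numberedLine, RegexOptions.Multiline).Count)  ' 2
    End Sub
End Module
```

Either way, a line-based pattern only returns the first physical line of a wrapped record; the continuation line has to be picked up separately.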

Edit:

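If all of the text is in a single string, you could use a tempered dot approach with a positive lookahead. For records that wrap across lines, the dot also has to match line breaks (RegexOptions.Singleline in .NET):

\d+\)(?:(?!TRANS TO HOLD ORDER|ALTERNATE LAB DRAW).)*?(?=\d+\) |$)

Regex demo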
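
A compact VB.NET sketch of the tempered dot idea, using shortened, hypothetical rows. Each dot is guarded by a negative lookahead, so a record that runs into one of the excluded phrases cannot be completed, and the lazy quantifier stops at the next row number or at the end of the input. Passing RegexOptions.Singleline and joining the wrapped lines afterwards are assumptions added for this example.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module TemperedDotSketch
    Sub Main()
        ' Shortened, hypothetical rows in one string; rows 2 and 3 wrap onto a second line.
        Dim input As String = String.Join(vbLf, {
            " 1) IF045196B   TRANS TO HOLD ORDER",
            " 2) IF044770B   RANDOM TEXT",
            "  RANDOM TEXT;*    LOREM IPSEM",
            " 3) IF044445B   MOVIE TITLE",
            "  3020SBX=;R    LOREM IPSEM"})

        Dim pattern As String = "\d+\)(?:(?!TRANS TO HOLD ORDER|ALTERNATE LAB DRAW).)*?(?=\d+\) |$)"

        ' Singleline lets "." match the line break inside a wrapped record.
        For Each m As Match In Regex.Matches(input, pattern, RegexOptions.Singleline)
            ' Join the wrapped lines with a single space, then trim both ends.
            Dim record As String = Regex.Replace(m.Value, "\s*\n\s*", " ").Trim()
            Console.WriteLine(record)
        Next
    End Sub
End Module
```

With this input, two records should come back (rows 2 and 3), each printed on one line.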

You are right, you need to use a negative lookahead:

^((?!TRANS(?:FER)? TO HOLD ORDER|ALTERNATE LAB DRAW).)*$

Update! @The Fourth Bird has provided the correct answer. The correct answer is...

\d+\)(?:(?!TRANS TO HOLD ORDER|ALTERNATE LAB DRAW).)*?(?=\d+\) |$)

You can see it in an online regex editor here: Online Regex Editor Example

And then, to round out this post, I'm pasting the VB.NET code that goes along with it. Copy and paste the code right into Visual Studio to try it yourself.

A special thank you to @The Fourth Bird. Phenomenal work, thank you!

Please rate this post up if it helps you.

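Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module Program
Sub Main(args As String())

    ' Pattern from the accepted answer. The leading (?s) turns on single-line mode so that "."
    ' also matches the line breaks inside a wrapped record (an addition to the pattern shown above).
    Dim RegexStringPattern As String = "(?s)\d+\)(?:(?!TRANS TO HOLD ORDER|ALTERNATE LAB DRAW).)*?(?=\d+\) |$)"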
    Dim StringToSearch As String = " 1) IF045196B LOREM-IPSEM,DOLOR1          G35311          12/07/2018  09/07/1985   FNL  91452SB=;*      TRANS TO HOLD ORDER
                                    2) IF045197B LOREM-IPSEM,DOLOR1          G35311          12/07/2018  09/07/1985   FNL  91377SB=;*      ALTERNATE LAB DRAW
                                    3) IF044770B LOREM-IPSEM,DOLOR1          G35311          09/26/2018  09/07/1985        3020SBX=;R      RANDOM TEXT
                                    RANDOM TEXT;*    LOREM IPSEM
                                    4) IF044445B LOREM-IPSEM,DOLOR16         G35311          07/18/2018  09/07/1985        3020SBX=;R      RANDOM TEXT
                                    RANDOM TEXT;*    LOREM IPSEM
                                    5) IF044446B LOREM-IPSEM,DOLOR17         G35311          07/18/2018  09/07/1985        10165SB=;S/R    MOVIE TITLE
                                    3020SBX=;R    RANDOM TEXT         RANDOM TEXT;*    LOREM IPSEM
                                    6) IF044447B LOREM-IPSEM,DOLOR18         G35311          07/18/2018  09/07/1985        10256SB=;S/R    MOVIE TITLE
                                    3020SBX=;R    RANDOM TEXT         RANDOM TEXT;*    LOREM IPSEM
                                    7) IF044449B LOREM-IPSEM,DOLOR19         G35311          07/18/2018  09/07/1985        10256SB=;S/R    MOVIE TITLE
                                    3020SBX=;R    RANDOM TEXT         RANDOM TEXT;*    LOREM IPSEM"

    Dim matches As MatchCollection = Regex.Matches(StringToSearch, RegexStringPattern)
    Dim listOfStrings As List(Of String) = New List(Of String)

    ' Each Match is one result; Trim drops whitespace left at either end of the match.
    For Each match As Match In matches
        Dim value As String = match.Value.Trim()
        Console.WriteLine(value)
        listOfStrings.Add(value)
    Next

    Console.ReadLine()

End Sub

End Module
