
Get all the text between two newline characters (\n) in raw text using Python regex

I have several examples of raw text from which I need to extract the characters after 'Terms'. The common pattern is that the word 'Terms' is followed by a '\n', and the text I want ends at the next '\n'. I want to extract all the characters (words, numbers, symbols) between these two '\n' characters that follow the keyword 'Terms'.

Some examples of text are given below:

1) \nTERMS \nDirect deposit; Routing #256078514, acct. #160935\n\n'
2) \nTerms\nDue on receipt\nDue Date\n1/31/2021
3) \nTERMS: \nNET 30 DAYS\n

The code I have written is given below:

def get_term_regex(s):
    raw_text = s
    term_regex1 = r'(TERMS\s*\\n(.*?)\\n)'

    try:
        if ('TERMS' or 'Terms') in raw_text:
            
            pattern1 = re.search(term_regex1,raw_text)
            #print(pattern1)
            return pattern1
    except:
        pass

But I am not getting any output, as there is no match.

The expected output is:

1) Direct deposit; Routing #256078514, acct. #160935
2) Due on receipt
3) NET 30 DAYS

Any help would be really appreciated.

Try the following:

import re

text = '''1) \nTERMS \nDirect deposit; Routing #256078514, acct. #160935\n\n'
2) \nTerms\nDue on receipt\nDue Date\n1/31/2021
3) \nTERMS: \nNET 30 DAYS\n''' # \n are real new lines

for m in re.finditer(r'(TERMS|Terms)\W*\n(.*?)\n', text):
    print(m.group(2))

  1. Note that your regex could not handle the third example because there is a colon (:) after TERMS, so I replaced \s with \W.

  2. ('TERMS' or 'Terms') in raw_text is probably not what you want. It does not raise a syntax error, but it is equivalent to 'TERMS' in raw_text: when Python evaluates the parenthesized part, both 'TERMS' and 'Terms' are truthy, and or returns the first truthy operand, i.e., 'TERMS'. As a result, 'Terms' is never checked by that condition!

    So you might instead want something like ('TERMS' in raw_text) or ('Terms' in raw_text), although it is quite verbose.
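
Both notes can be checked directly. The following is a minimal sketch rather than part of the original answer: extract_terms is a hypothetical helper name, and the third check (a raw-string \\n matches a literal backslash followed by n, not a newline) is an extra observation that assumes the input contains real newline characters, as the sample text in the answer does.

```python
import re

# Note 2: `or` returns its first truthy operand, so only 'TERMS' is tested.
assert ('TERMS' or 'Terms') == 'TERMS'
assert (('TERMS' or 'Terms') in '\nTerms\nDue on receipt\n') is False

# Note 1: \s* cannot skip the colon after TERMS, but \W* can.
third = '\nTERMS: \nNET 30 DAYS\n'
assert re.search(r'TERMS\s*\n(.*?)\n', third) is None
assert re.search(r'TERMS\W*\n(.*?)\n', third).group(1) == 'NET 30 DAYS'

# Extra observation: in a raw-string pattern, \\n matches a backslash
# followed by 'n', not a real newline character.
assert re.search(r'\\n', 'a\nb') is None
assert re.search(r'\n', 'a\nb') is not None


def extract_terms(raw_text):
    """Hypothetical helper: return the line after a TERMS/Terms heading, or None."""
    match = re.search(r'(?:TERMS|Terms)\W*\n(.*?)\n', raw_text)
    return match.group(1) if match else None


print(extract_terms('\nTerms\nDue on receipt\nDue Date\n1/31/2021'))  # Due on receipt
```

The separate in check is dropped in this sketch because re.search already returns None when no heading is found.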


 