Regex for New Line Can't Figure out

I am trying to capture everything that follows "per annum." and precedes "All taxes", but I can't figure out the regex for this.

I tried a couple of regexes, but they didn't work for some reason. Can anyone help? I tried both regexpal and Python, and neither worked.

> r'per annum\\.(.+)\nAll taxes are assessed'
> 
> r'per annum\\.\n(.+)\nAll taxes are assessed'

> r'per annum(.+)nAll taxes are assessed'

Sample text:

interest charges at 8.0 % per annum.

MCMAHON, DENISE M
%RDM PROPERTIES
PO BOX 653
GOFFSTOWN NH 03045
MCMAHON, RAYMOND J
All taxes are assessed as of April 1st of each year.  Unless 
directed otherwise, tax bills are mailed to the last known 
address of the first owner l

per annum\.\n([\S\s]*)All taxes

could work for you. [\S\s] matches any character, including a newline.
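As a rough illustration of why the character class helps, the sketch below compares a plain dot with [\S\s] on a short made-up string. The string and variable names are assumptions for the example, not the asker's data.

```python
import re

# Made-up two-line payload between the two markers.
sample = "per annum.\nLINE ONE\nLINE TWO\nAll taxes are assessed"

# Without flags, '.' never matches a newline, so (.*) cannot span two lines.
dot_match = re.search(r"per annum\.\n(.*)All taxes", sample)

# [\S\s] means "non-whitespace or whitespace", i.e. any character at all.
class_match = re.search(r"per annum\.\n([\S\s]*)All taxes", sample)

print(dot_match)                    # None
print(repr(class_match.group(1)))   # 'LINE ONE\nLINE TWO\n'
```

Note that the greedy * runs to the last occurrence of "All taxes" in the input, which matters if the marker can appear more than once.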

This is a Python solution:

import re
text = 'your text here'
match = re.search(r'\bper annum\.\s*(.+?)\nAll taxes are assessed', text, re.S)
if match:
  print(match.group(1))


The (.+?) captures any text between per annum. (after \s* has consumed the whitespace that follows it) and a newline followed by All taxes are assessed. Note that the dot after annum is escaped because it is a special regex character. Inside the group, . matches line endings thanks to the re.S flag.

Also, re.search finds the first regex match, and match.group(1) returns the text captured in Group 1.
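To make the effect of re.S concrete, here is a minimal sketch that runs the same pattern with and without the flag on a shortened copy of the text from the question. The helper name extract_block is hypothetical.

```python
import re

# Shortened copy of the sample text from the question.
text = (
    "interest charges at 8.0 % per annum.\n"
    "\n"
    "MCMAHON, DENISE M\n"
    "%RDM PROPERTIES\n"
    "PO BOX 653\n"
    "GOFFSTOWN NH 03045\n"
    "MCMAHON, RAYMOND J\n"
    "All taxes are assessed as of April 1st of each year.\n"
)

PATTERN = r"\bper annum\.\s*(.+?)\nAll taxes are assessed"


def extract_block(source, flags=0):
    """Return the text captured in Group 1, or None when nothing matches."""
    match = re.search(PATTERN, source, flags)
    return match.group(1) if match else None


without_flag = extract_block(text)      # '.' cannot cross a newline
with_flag = extract_block(text, re.S)   # '.' also matches newlines

print(without_flag)  # None
print(with_flag)
```

Without the flag the lazy group is stuck on a single line and the search fails; with re.S it expands line by line until the closing marker follows.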

You are confused about raw strings. In a raw Python string, a backslash simply represents a backslash. The regex engine then interprets those backslashes itself.

r'\\' as a regex matches a literal backslash.

r'\n' as a regex matches a newline.

r'\.' (or r'[.]' or '\\.' without the r prefix) matches a literal dot.

So your bug is the regex for matching a dot, not the one for matching a newline.
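These claims are easy to check in an interpreter. The sketch below uses one line from the question's sample text; the variable names are only for illustration.

```python
import re

line = "interest charges at 8.0 % per annum."

# In a raw string every backslash reaches the regex engine unchanged.
print(len(r"\\."))  # 3 characters: backslash, backslash, dot
print(len(r"\."))   # 2 characters: backslash, dot

# r'\\.' asks for a literal backslash followed by any character.
double_escaped = re.search(r"per annum\\.", line)

# r'\.' asks for a literal dot.
single_escaped = re.search(r"per annum\.", line)

# Without the r prefix, '\\.' is the same two-character pattern as r'\.'.
same_pattern = "\\." == r"\."

# The regex engine itself understands \n (newline) and \\ (backslash).
newline_match = re.search(r"\n", "a\nb")
backslash_match = re.search(r"\\", "a\\b")

print(double_escaped)           # None: the text contains no backslash
print(single_escaped.group(0))  # per annum.
```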

In addition, of course, if you want to match multiple lines, say so:

r'per annum\.((?:\n.*)+?)\nAll taxes are assessed'

The nongreedy +? says to match as few repetitions as possible, instead of as many as possible.
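To see what the lazy quantifier changes, here is a small sketch on a made-up text where the closing marker appears twice. The text, the marker END, and the simplified pattern are illustrative assumptions, not the tax-bill data from the question.

```python
import re

# Made-up text in which the closing marker END occurs twice.
text = "start\na\nb\nEND\nc\nEND\n"

# Greedy +: take as many lines as possible, then back off to the LAST "\nEND".
greedy = re.search(r"start((?:\n.*)+)\nEND", text)

# Lazy +?: take as few lines as possible, stopping at the FIRST "\nEND".
lazy = re.search(r"start((?:\n.*)+?)\nEND", text)

print(repr(greedy.group(1)))  # '\na\nb\nEND\nc'
print(repr(lazy.group(1)))    # '\na\nb'
```

When the closing marker occurs only once, both forms capture the same text; the lazy form simply reaches it with less backtracking.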

There are already other answers that work, but this one answers the question "Regex for new line" more precisely. In regex, the dot matches any character except line terminators, so you want to match and capture either any character or a newline. I put this alternation in a non-capturing group, but that's not strictly necessary: you could instead ignore the matches made by the inner group.

I assume that you don't want to capture the empty line, so I put another newline in front of the capture group.

r'per annum\.\n\n((?:.|\n)+)\nAll taxes'

The [\s\S] approach, as already mentioned, also works.
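The remark about the non-capturing group can be checked with a short sketch. It runs the pattern above, and a variant whose inner group captures, on a shortened copy of the question's text. The variable names are assumptions.

```python
import re

# Shortened copy of the sample text from the question.
text = (
    "interest charges at 8.0 % per annum.\n"
    "\n"
    "MCMAHON, DENISE M\n"
    "%RDM PROPERTIES\n"
    "PO BOX 653\n"
    "GOFFSTOWN NH 03045\n"
    "MCMAHON, RAYMOND J\n"
    "All taxes are assessed as of April 1st of each year.\n"
)

# Inner group is non-capturing: the match exposes a single group.
non_capturing = re.search(r"per annum\.\n\n((?:.|\n)+)\nAll taxes", text)

# Inner group captures: a second group appears, holding only the last
# single character the inner group matched, which can simply be ignored.
capturing = re.search(r"per annum\.\n\n((.|\n)+)\nAll taxes", text)

print(len(non_capturing.groups()))  # 1
print(len(capturing.groups()))      # 2
print(non_capturing.group(1) == capturing.group(1))  # True
```

Either way, Group 1 holds the address block; the non-capturing form just keeps the group numbering tidy.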
