Regex for New Line: Can't Figure It Out

I am trying to get everything that follows "per annum." and precedes "All taxes", but I can't figure out the regex for this.

I tried a couple of regexes, but none of them worked. Can anyone help? I tried both regexpal and Python, and neither worked.

> r'per annum\\.(.+)\nAll taxes are assessed'
> 
> r'per annum\\.\n(.+)\nAll taxes are assessed'

> r'per annum(.+)nAll taxes are assessed'

interest charges at 8.0 % per annum.

MCMAHON, DENISE M
%RDM PROPERTIES
PO BOX 653
GOFFSTOWN NH 03045
MCMAHON, RAYMOND J
All taxes are assessed as of April 1st of each year.  Unless 
directed otherwise, tax bills are mailed to the last known 
address of the first owner l

per annum.\n([\S\s]*)All taxes

This could work for you. [\S\s] matches any character, including a newline.
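As a rough check of this pattern, here is a minimal sketch that runs it from Python against an abbreviated copy of the sample text from the question. The variable names are illustrative only, and the pattern is used exactly as written above, so the unescaped dot after "annum" matches any single character rather than only a literal dot.

```python
import re

# Abbreviated copy of the sample text from the question.
text = (
    "interest charges at 8.0 % per annum.\n"
    "\n"
    "MCMAHON, DENISE M\n"
    "PO BOX 653\n"
    "All taxes are assessed as of April 1st of each year.\n"
)

# [\S\s] means "non-whitespace or whitespace", i.e. any character at all,
# so the group can run across line breaks without any flags.
pattern = re.compile(r"per annum.\n([\S\s]*)All taxes")

match = pattern.search(text)
captured = match.group(1) if match else None
print(repr(captured))
```

The capture keeps the newline of the blank line at the start and the newline before "All taxes" at the end, so calling captured.strip() may be useful in practice.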

This is a Python solution:

import re

text = 'your text here'  # replace with the text to search
# re.S (DOTALL) lets . match newlines, so (.+?) can span several lines
match = re.search(r'\bper annum\.\s*(.+?)\nAll taxes are assessed', text, re.S)
if match:
    print(match.group(1))


The (.+?) captures any text between per annum. and a newline followed by All taxes are assessed. Note that the dot after annum is escaped because it is a special regex character. The . matches line endings thanks to the re.S flag.

Also, re.search finds the first regex match, and match.group(1) returns the capture in Group 1.
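To illustrate the role of re.S, the following sketch runs the same pattern with and without the flag. The text is an abbreviated copy of the sample from the question, and the variable names are assumptions made for the example.

```python
import re

# Abbreviated copy of the sample text from the question.
text = (
    "interest charges at 8.0 % per annum.\n"
    "\n"
    "MCMAHON, DENISE M\n"
    "MCMAHON, RAYMOND J\n"
    "All taxes are assessed as of April 1st of each year."
)

pattern = r'\bper annum\.\s*(.+?)\nAll taxes are assessed'

# Without re.S the dot stops at every newline, so (.+?) cannot span two lines.
without_flag = re.search(pattern, text)

# With re.S the dot also matches newlines, so the lazy group can grow
# line by line until "\nAll taxes are assessed" follows.
with_flag = re.search(pattern, text, re.S)

print(without_flag)
print(repr(with_flag.group(1)))
```

Because \s* consumes the blank line before the group starts, the capture begins directly at the first name.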

You are confused about raw strings. In a raw Python string, a backslash simply represents a backslash. But the regex engine then interprets those backslashes itself.

r'\\' as a regex matches a literal backslash.

r'\n' as a regex matches a newline.

r'\.' (or r'[.]', or '\\.' without the r prefix) matches a literal dot.

So your bug is in the regex for matching a dot, not the one for matching a newline.
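A small sketch of the difference, using one line of the sample text. The variable names are made up for illustration; the patterns are the ones discussed above.

```python
import re

line = "interest charges at 8.0 % per annum."

# r'\\.' reaches the regex engine as \\. which means
# "a literal backslash followed by any one character".
asker_attempt = re.search(r'per annum\\.', line)

# r'\.' reaches the regex engine as \. which means "a literal dot".
fixed_attempt = re.search(r'per annum\.', line)

# Three spellings of "a literal dot": raw escape, character class,
# and a non-raw string with the backslash doubled.
dot_patterns = [r'\.', r'[.]', '\\.']
dot_hits = [bool(re.fullmatch(p, '.')) for p in dot_patterns]

# A literal backslash and a newline, each written as a raw-string regex.
backslash_hit = bool(re.fullmatch(r'\\', '\\'))
newline_hit = bool(re.fullmatch(r'\n', '\n'))

print(asker_attempt, fixed_attempt, dot_hits, backslash_hit, newline_hit)
```

The first search fails only because the text contains no backslash; the newline part of the original attempts was not the problem.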

In addition, of course, if you want to match multiple lines, say so:

r'per annum\.(\n.*)+?\nAll taxes are assessed'

The non-greedy +? says to match as few repetitions as possible, instead of as many as possible.
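The effect of the non-greedy quantifier only shows up when the closing text occurs more than once. The sketch below uses a made-up text with two "All taxes" lines, which is an assumption for illustration and not part of the question, and compares (.+) with (.+?) under re.S.

```python
import re

# Hypothetical text: the end marker appears twice.
text = "per annum.\nA\nAll taxes\nB\nAll taxes\n"

# Greedy: .+ grabs as much as possible, then backs off to the LAST marker.
greedy = re.search(r'per annum\.\n(.+)\nAll taxes', text, re.S).group(1)

# Non-greedy: .+? grows one character at a time and stops at the FIRST marker.
lazy = re.search(r'per annum\.\n(.+?)\nAll taxes', text, re.S).group(1)

print(repr(greedy))
print(repr(lazy))
```

With a single end marker, as in the question's sample, both forms would capture the same text.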

There are already other answers that will work, but this one answers the question 'Regex for new line' more precisely. In regex, the dot matches any character except line terminators. So you want to match and capture any character or a newline. I put this part in a non-capturing group, but that's not strictly necessary; you could instead use a capturing group and ignore the matches made by the inner group.

I assume that you don't want to capture the empty line, so I put another newline in front of the capture group.

r'per annum\.\n\n((?:.|\n)+)\nAll taxes'

The [\s\S] approach, as already mentioned, also works.
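For comparison, here is a sketch that runs the alternation form, the [\s\S] form, and a plain dot with re.S against an abbreviated copy of the sample text. The third variant and the variable names are additions for illustration; on this text all three should capture the same lines.

```python
import re

# Abbreviated copy of the sample text from the question.
text = (
    "interest charges at 8.0 % per annum.\n"
    "\n"
    "MCMAHON, DENISE M\n"
    "%RDM PROPERTIES\n"
    "MCMAHON, RAYMOND J\n"
    "All taxes are assessed as of April 1st of each year."
)

# Any character or a newline, wrapped in a non-capturing group.
alternation = re.search(r'per annum\.\n\n((?:.|\n)+)\nAll taxes', text)

# Character class that covers whitespace and non-whitespace alike.
char_class = re.search(r'per annum\.\n\n([\s\S]+)\nAll taxes', text)

# Plain dot, with re.S so that it also matches newlines.
dotall = re.search(r'per annum\.\n\n(.+)\nAll taxes', text, re.S)

captures = [m.group(1) for m in (alternation, char_class, dotall)]
print(captures[0])
```

The two explicit newlines after the escaped dot skip the blank line, so the capture starts at the first name.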
