Extract an unknown number of strings from a text using regular expressions
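I am using regular expressions to extract certain information from a text. For example, a name can consist of several given names and a surname (the number is unknown). The following example extracts 2 strings: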
Name:\s+([\w-äöü]+\s[\w-äöü]+)
How do I define a regular expression that extracts an unknown (!) number of strings, up to a defined next term (for example "Address:")?
Use
Name:\s+([\wäöü-]+(?:\s+[\wäöü-]+)*?)(?=\s*Address)
See proof.
Explanation
--------------------------------------------------------------------------------
  Name:                    'Name:'
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [\wäöü-]+                any character of: word characters (a-z,
                             A-Z, 0-9, _), 'ä', 'ö', 'ü', '-' (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the least amount
                             possible)):
--------------------------------------------------------------------------------
      \s+                      whitespace (\n, \r, \t, \f, and " ")
                               (1 or more times (matching the most
                               amount possible))
--------------------------------------------------------------------------------
      [\wäöü-]+                any character of: word characters
                               (a-z, A-Z, 0-9, _), 'ä', 'ö', 'ü', '-'
                               (1 or more times (matching the most
                               amount possible))
--------------------------------------------------------------------------------
    )*?                      end of grouping
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    Address                  'Address'
--------------------------------------------------------------------------------
  )                        end of look-ahead
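
As a quick way to try the pattern, here is a minimal sketch in Python. The question does not say which regex flavor is in use, so Python's re module is an assumption, and the helper name extract_name_parts and the sample text are made up for illustration. The pattern captures the whole name in group 1; splitting that capture on whitespace gives the individual strings.

```python
import re

# Pattern from the answer. The hyphen is placed last in the character class
# so that it is a literal '-' rather than part of a range.
NAME_RE = re.compile(r"Name:\s+([\wäöü-]+(?:\s+[\wäöü-]+)*?)(?=\s*Address)")


def extract_name_parts(text):
    """Return the words between 'Name:' and 'Address' (hypothetical helper).

    Returns an empty list when the pattern does not match.
    """
    match = NAME_RE.search(text)
    if match is None:
        return []
    # Group 1 holds the full name; split it into its parts on whitespace.
    return match.group(1).split()


# Illustrative input, not taken from the original question.
sample = "Name: Hans-Jürgen Karl Müller Address: Hauptstraße 1"
print(extract_name_parts(sample))
```

The lazy quantifier *? adds one more word at a time, and the look-ahead stops the match as soon as "Address" follows, so "Address" itself is never captured as part of the name.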