I'm using regular expressions to extract certain information from a text. A name, for example, can consist of several first names and a last name (the number is not known in advance). The following example extracts two words:
Name:\s+([\w-äöü]+\s[\w-äöü]+)
How can I define a regular expression that extracts an unknown (!) number of words, up to a defined next term (e.g. "Address:")?
Use a lazily repeated group followed by a lookahead for the next term:
Name:\s+([\wäöü-]+(?:\s+[\wäöü-]+)*?)(?=\s*Address)
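As a quick check, the pattern can be tried with Python's `re` module. This is only a minimal sketch: the question does not say which regex engine is in use, and the sample text and variable names below are made up for illustration.

```python
import re

# Pattern from the answer: one word, then lazily as many further words as
# needed, stopping where optional whitespace is followed by "Address".
NAME_RE = re.compile(r"Name:\s+([\wäöü-]+(?:\s+[\wäöü-]+)*?)(?=\s*Address)")

# Hypothetical sample input: three first names and a last name.
sample = "Name: Hans-Jürgen Karl Maria Müller Address: Hauptstraße 1"

match = NAME_RE.search(sample)
name = match.group(1) if match else None
print(name)  # Hans-Jürgen Karl Maria Müller
```

Note that in Python 3, `\w` on `str` patterns already matches letters such as ä, ö and ü, so listing them explicitly is harmless there; other engines may need them.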
Explanation
--------------------------------------------------------------------------------
  Name:                    'Name:'
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [\wäöü-]+                any character of: word characters (a-z,
                             A-Z, 0-9, _), 'ä', 'ö', 'ü', '-' (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the least amount
                             possible)):
--------------------------------------------------------------------------------
      \s+                      whitespace (\n, \r, \t, \f, and " ")
                               (1 or more times (matching the most
                               amount possible))
--------------------------------------------------------------------------------
      [\wäöü-]+                any character of: word characters
                               (a-z, A-Z, 0-9, _), 'ä', 'ö', 'ü', '-'
                               (1 or more times (matching the most
                               amount possible))
--------------------------------------------------------------------------------
    )*?                      end of grouping
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    Address                  'Address'
--------------------------------------------------------------------------------
  )                        end of look-ahead
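The breakdown above maps fairly directly onto a commented pattern. The sketch below assumes Python and its `re.VERBOSE` flag; the helper name `extract_field` and its parameters are hypothetical and only illustrate how the "next term" could be made configurable rather than hard-coded as "Address".

```python
import re


def extract_field(text, label, next_label):
    """Return the words between `label:` and `next_label`, or None.

    Hypothetical helper for illustration; not part of the original answer.
    """
    pattern = re.compile(
        rf"""
        {re.escape(label)}:\s+            # the field label, then whitespace
        (                                 # capture group 1: the value
          [\wäöü-]+                       #   first word
          (?:\s+[\wäöü-]+)*?              #   further words, as few as possible
        )
        (?=\s*{re.escape(next_label)})    # stop just before the next label
        """,
        re.VERBOSE,
    )
    m = pattern.search(text)
    return m.group(1) if m else None


text = "Name: Anna Maria Schmidt\nAddress: Berlin"
print(extract_field(text, "Name", "Address"))  # Anna Maria Schmidt
```

Because the lookahead does not consume anything, the whitespace and the next label stay out of the captured group. If the next label is missing from the text, the helper returns `None` instead of capturing to the end of the input.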