Extract an unknown number of strings from a text using regular expressions

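I'm using regular expressions to extract certain information from a text. For example, a name can consist of several first names and a last name, and the number of first names is unknown. The following pattern extracts exactly two strings: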

Name:\s+([\w-äöü]+\s[\w-äöü]+)
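A minimal sketch of that limitation, assuming Python's `re` module (the question names no language, and the sample string is invented). Python rejects `[\w-äöü]` outright because it reads `\w-ä` as a range that starts with a class, so the sketch moves the hyphen to the end of the set, which is the same fix the answer uses.

```python
import re

# Python's re parses "\w-ä" inside a class as a range and rejects it,
# so the question's character class cannot be compiled as written.
try:
    re.compile(r"[\w-äöü]")
except re.error as exc:
    print("rejected:", exc)

# The question's pattern with the hyphen moved to the end: the second
# [\wäöü-]+ is the last word it can capture, so only two words come out.
TWO_WORDS = re.compile(r"Name:\s+([\wäöü-]+\s[\wäöü-]+)")

text = "Name: Hans-Peter Karl Müller Address: Hauptstraße 1"
print(TWO_WORDS.search(text).group(1))  # Hans-Peter Karl
```

The surname "Müller" is cut off because the pattern hard-codes exactly one `\s` followed by one more word.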

How can I define a regular expression that extracts an unknown (!) number of strings up to the next defined term (e.g. "Address:")?

Name:\s+([\wäöü-]+(?:\s+[\wäöü-]+)*?)(?=\s*Address)
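A minimal Python sketch applying the answer's pattern verbatim, assuming Python's `re` module; the sample strings are invented. The `extract_until` function is a hypothetical helper, not part of the answer: it builds the same pattern for any label and stop term, using `re.escape` so that terms containing regex metacharacters are matched literally.

```python
import re
from typing import Optional

# The answer's pattern, unchanged.
NAME_RE = re.compile(r"Name:\s+([\wäöü-]+(?:\s+[\wäöü-]+)*?)(?=\s*Address)")

text = "Name: Hans-Peter Karl Müller Address: Hauptstraße 1"
print(NAME_RE.search(text).group(1))  # Hans-Peter Karl Müller


def extract_until(text: str, label: str, stop_term: str) -> Optional[str]:
    """Hypothetical helper: capture the words after '<label>:' up to stop_term."""
    pattern = (
        re.escape(label)
        + r":\s+([\wäöü-]+(?:\s+[\wäöü-]+)*?)(?=\s*"
        + re.escape(stop_term)
        + r")"
    )
    match = re.search(pattern, text)
    # No match means the label or the stop term is missing.
    return match.group(1) if match else None


print(extract_until("Name: Anna Maria Schmidt Phone: 123", "Name", "Phone"))
# Anna Maria Schmidt
```

Like the original, the lookahead only requires the stop term itself (here "Address" or "Phone"), not the colon after it.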

Explanation

--------------------------------------------------------------------------------
  Name:                    'Name:'
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [\wäöü-]+                any character of: word characters (a-z,
                             A-Z, 0-9, _), 'ä', 'ö', 'ü', '-' (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the least amount
                             possible)):
--------------------------------------------------------------------------------
      \s+                      whitespace (\n, \r, \t, \f, and " ")
                               (1 or more times (matching the most
                               amount possible))
--------------------------------------------------------------------------------
      [\wäöü-]+                any character of: word characters (a-
                               z, A-Z, 0-9, _), 'ä', 'ö', 'ü', '-' (1
                               or more times (matching the most
                               amount possible))
--------------------------------------------------------------------------------
    )*?                      end of grouping
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    Address                  'Address'
--------------------------------------------------------------------------------
  )                        end of look-ahead
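If the target language is Python (an assumption), the same pattern can be written in `re.VERBOSE` mode so that each piece carries the note from the explanation above. In verbose mode, unescaped whitespace outside character classes and `#` comments are ignored, so it should match exactly like the one-line form. The sample string is invented.

```python
import re

# The answer's pattern in re.VERBOSE form; whitespace and "#" comments
# are ignored by the engine.
NAME_RE = re.compile(
    r"""
    Name:               # literal 'Name:'
    \s+                 # one or more whitespace characters
    (                   # group 1: the captured name
      [\wäöü-]+         #   first word: word chars, ä, ö, ü or '-'
      (?:               #   non-capturing group, repeated lazily (*?)
        \s+             #     whitespace between words
        [\wäöü-]+       #     the next word
      )*?               #   as few repetitions as possible
    )                   # end of group 1
    (?=\s*Address)      # lookahead: optional whitespace, then 'Address'
    """,
    re.VERBOSE,
)

# The one-line form from the answer, for comparison.
COMPACT = re.compile(r"Name:\s+([\wäöü-]+(?:\s+[\wäöü-]+)*?)(?=\s*Address)")

sample = "Name: Hans-Peter Karl Müller\nAddress: Hauptstraße 1"
print(NAME_RE.search(sample).group(1))  # Hans-Peter Karl Müller
print(COMPACT.search(sample).group(1))  # Hans-Peter Karl Müller
```

In Python 3, `\w` in a str pattern is Unicode-aware and already matches ä, ö and ü, so listing them is redundant there but harmless. The explicit letters only matter where `\w` is ASCII-only, for example with the `re.ASCII` flag.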

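Disclaimer: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.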