Extract unknown amount of strings from a text using regular expressions

I'm using regular expressions to extract certain information from a text. A name, for example, can consist of several first names and a last name (quantity not known). The following example extracts 2 strings:

Name:\s+([\w-äöü]+\s[\w-äöü]+)

How can I define a regular expression that extracts an unknown (!) number of strings, up to a defined next term (e.g. "Address:")?

Use

Name:\s+([\wäöü-]+(?:\s+[\wäöü-]+)*?)(?=\s*Address)

See proof.
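To try the pattern locally, here is a minimal Python sketch. The helper name `extract_name` and the sample text are made up for illustration; only the pattern itself comes from the answer. Note that with Python 3 `str` patterns `\w` already matches ä, ö and ü, so listing them explicitly is harmless but not required there.

```python
import re

# Pattern taken from the answer; everything else here is illustrative.
PATTERN = re.compile(r"Name:\s+([\wäöü-]+(?:\s+[\wäöü-]+)*?)(?=\s*Address)")


def extract_name(text):
    """Return the text between 'Name:' and 'Address', or None if absent."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


# Hypothetical sample input, not from the original question.
sample = "Name: Hans-Jürgen Karl Müller Address: Hauptstraße 1"
print(extract_name(sample))  # Hans-Jürgen Karl Müller
```

Because the "Address" part sits inside a look-ahead, it is checked but not consumed, so group 1 holds only the name words.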

Explanation

--------------------------------------------------------------------------------
  Name:                    'Name:'
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [\wäöü-]+                any character of: word characters (a-z,
                             A-Z, 0-9, _), 'ä', 'ö', 'ü', '-' (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the least amount
                             possible)):
--------------------------------------------------------------------------------
      \s+                      whitespace (\n, \r, \t, \f, and " ")
                               (1 or more times (matching the most
                               amount possible))
--------------------------------------------------------------------------------
      [\wäöü-]+                any character of: word characters (a-
                               z, A-Z, 0-9, _), 'ä', 'ö', 'ü', '-' (1
                               or more times (matching the most
                               amount possible))
--------------------------------------------------------------------------------
    )*?                      end of grouping
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    Address                  'Address'
--------------------------------------------------------------------------------
  )                        end of look-ahead
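Since group 1 captures the whole name as one string, the individual names still have to be split afterwards. The following is a rough Python sketch of that step. It assumes, as in the question, that the last word is the last name, and the helper `extract_name_parts` and the sample records are hypothetical.

```python
import re

# Same pattern as in the answer above.
PATTERN = re.compile(r"Name:\s+([\wäöü-]+(?:\s+[\wäöü-]+)*?)(?=\s*Address)")


def extract_name_parts(text):
    """Return one list of name words per 'Name: ... Address' record."""
    return [m.group(1).split() for m in PATTERN.finditer(text)]


# Made-up input with a different number of first names per record.
records = (
    "Name: Anna Schmidt\nAddress: Berlin\n"
    "Name: Karl-Heinz Otto Maria Köhler\nAddress: Hamburg\n"
)

for parts in extract_name_parts(records):
    # Assumption: the final word is the last name, the rest are first names.
    *first_names, last_name = parts
    print(first_names, last_name)
```

The lazy `*?` together with the look-ahead makes each match stop at the first following "Address", so consecutive records do not run into each other.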

