Capturing all names in line with regular expression

I'm trying to capture all consecutive words that start with an uppercase letter and come directly before the word Inc. For example, I want to capture Test Alphabet from the line "Parent company Test Alphabet Inc. announced". I wrote this regular expression:

([A-Z]{1}[a-z]+)+

which matches every word starting with an uppercase letter. But it also grabs Parent, which is not wanted. When I try to restrict the match like this:

([A-Z]{1}[a-z]+)+ (?=(Inc))

it matches only Alphabet and misses Test, which I also need. How can I match all the capitalized words that follow one another and precede Inc? Thanks in advance!

You can use this lookahead regex to match:

[A-Z][a-zA-Z]*(?=\s*(?:[A-Z][a-zA-Z]*\s+)*Inc\.)

RegEx Demo

  • [A-Z][a-zA-Z]* matches a word that starts with an uppercase letter.
  • The lookahead inside (?=...) ensures that the current word is followed by zero or more capitalized words and then Inc., so only words in the run directly before Inc. are matched.
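
The question does not name a regex flavor, so as an assumption the pattern above can be tried with Python's re module. This is a minimal sketch: the sample text is the sentence from the question, and the variable names are only illustrative.

```python
import re

# Sample sentence taken from the question.
text = "Parent company Test Alphabet Inc. announced"

# Each match is a single capitalized word. The lookahead requires that only
# capitalized words (zero or more) stand between that word and "Inc.".
pattern = re.compile(r"[A-Z][a-zA-Z]*(?=\s*(?:[A-Z][a-zA-Z]*\s+)*Inc\.)")

words = pattern.findall(text)  # one list entry per matched word
name = " ".join(words)         # rejoin the words into the full name

print(words)  # ['Test', 'Alphabet']
print(name)   # Test Alphabet
```

Because the pattern matches one word at a time, the words have to be joined afterwards. On a line that mentions more than one company, the words of all of them would end up in the same list.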

Try

((?:[A-Z]\w*\s*)*\s?)(?=\sInc)

It captures the company name as one group. It takes one shortcut by using \w for the allowed characters in the name. This means names can be a mixture of upper and lower case letters, as well as digits and _. If this is unwanted behavior, change \w to [a-z] for lower case letters only, or to [A-Za-z] for mixed lower and upper case.

See it here at regex101.
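
For comparison, here is a small sketch of the same idea in Python (an assumed flavor, as the question does not specify one). The sample text comes from the question; the names pattern, strict, and name are illustrative only.

```python
import re

# Sample sentence taken from the question.
text = "Parent company Test Alphabet Inc. announced"

# Pattern from the answer: group 1 holds the whole run of capitalized words.
pattern = re.compile(r"((?:[A-Z]\w*\s*)*\s?)(?=\sInc)")

# search() returns the first match only. Every part of group 1 is optional,
# so findall() can also report an empty string right after the name.
match = pattern.search(text)
name = match.group(1) if match else None
print(name)  # Test Alphabet

# Variant suggested in the answer: letters only instead of \w.
strict = re.compile(r"((?:[A-Z][A-Za-z]*\s*)*\s?)(?=\sInc)")
strict_match = strict.search(text)
strict_name = strict_match.group(1) if strict_match else None
print(strict_name)  # Test Alphabet
```

Unlike the word-by-word lookahead pattern, this one returns the name as a single string, so no joining step is needed.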
