
Regex words between commas

I have the following types of sentences to filter:

Citizens of Poland, Sweden, United States require something

Citizens of Poland require something

Citizens of United States require something

I want to separate the names of the countries and save them later. I've built the following regex mechanism for that.

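// Requires java.util.regex.Matcher and java.util.regex.Pattern
String sentence;
[...]
// Captures a one- or two-word name directly before " require"
Pattern pattern = Pattern.compile("(?:Citizens of )? ([A-Z][a-z]+\\s*[A-Z]*[a-z]*) require");
Matcher matcher = pattern.matcher(sentence);
while (matcher.find()) {
    System.out.println(matcher.group(1));
}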

It works perfectly for 2 of the 3 cases:

  1. Citizens of Poland require something
  2. Citizens of United States require something

How can I build a regex pattern that captures every country when there is more than one?

You may try this regex in Java with \G and a lookahead:

(?:^Citizens of|(?!^)\G,) ([A-Z][a-z]+(?: [A-Z][a-z]+)*)(?=[a-zA-Z, ]*? require something$)


RegEx Details:

  • (?: Start non-capture group
    • ^Citizens of Match the text Citizens of at the start of the string
    • | OR
    • (?!^)\G, \G asserts the position at the end of the previous match, or at the start of the string for the first match. The negative lookahead (?!^) rules out the start of the string, so this branch only matches a comma directly after the previous country-name match
  • ) End non-capture group
  • (a single space) Match a space
  • ( Start capture group
  • [A-Z][a-z]+ Match a capitalized word: one uppercase letter followed by one or more lowercase letters
  • (?: Start non-capture group
    • (space)[A-Z][a-z]+ Match a space followed by a word whose first letter is uppercase
  • )* End non-capture group. * means match 0 or more of this group
  • ) End capture group
  • (?= Start lookahead condition
    • [a-zA-Z, ]*? require something$ Assert that 0 or more letters, commas, or spaces are followed by the text " require something" at the end of the input
  • ) End lookahead
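
For illustration, here is a minimal sketch of how the pattern above could be used from Java. The class name CountryExtractor and the method extractCountries are hypothetical and chosen only for this example. The regex is the one from the answer, with the backslash in \G doubled for a Java string literal. It assumes each sentence starts with "Citizens of" and ends with "require something", as in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
final class CountryExtractor {

    // The pattern from the answer; "\\G" is the Java-literal form of \G.
    private static final Pattern COUNTRY = Pattern.compile(
            "(?:^Citizens of|(?!^)\\G,) ([A-Z][a-z]+(?: [A-Z][a-z]+)*)"
            + "(?=[a-zA-Z, ]*? require something$)");

    // Collects capture group 1 of every successive match.
    static List<String> extractCountries(String sentence) {
        List<String> countries = new ArrayList<>();
        Matcher matcher = COUNTRY.matcher(sentence);
        while (matcher.find()) {
            countries.add(matcher.group(1));
        }
        return countries;
    }

    public static void main(String[] args) {
        String[] sentences = {
            "Citizens of Poland, Sweden, United States require something",
            "Citizens of Poland require something",
            "Citizens of United States require something"
        };
        for (String sentence : sentences) {
            System.out.println(extractCountries(sentence));
        }
    }
}
```

Run against the three sentences from the question, this should print [Poland, Sweden, United States], [Poland] and [United States]. A sentence that does not start with "Citizens of" yields an empty list, because \G can only continue from the end of a previous match.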
