Regex - Find words of certain length between two specified words

Good day,

I have recently started working with regex (in Java) and have stumbled onto a problem I need some assistance/guidance with.

I am looking to find words of a certain length (in this case 4 characters long or more) between the two words jack and james.

The following is the text I am using to test my regex against.

james was playing with jack yesterday (line 1)
jack was playing with james yersterday (line 2)
jack and james are best friends (line 3)
james will be helping jack with his homework (line 4)
yesterday, james come over jack's house (line 5)

What I hope to achieve is the following:

playing with (line 1)
playing with (line 2)
no matches (line 3)
will helping (line 4)
come over (line 5)

I have come up with the following:

(?<=james)(.*)(?=jack)|(?<=jack)(.*)(?=james)

But this particular regex returns all characters between the two words. I also tried the following, unsuccessfully (as well as many others before frustration started taking over). Also, I omitted

(?<=james)(\\b\w{4,}\\b)(?=jack)|(?<=jack)(\\b\w{4,}\\b)(?=james)

Any guidance would be greatly appreciated.

Sincerely

This seems to work as required.

  • (?<=...) positive lookbehind for the two names
  • (?=...) positive lookahead for the two names
  • \\w{4,} a word of four or more characters (the backslash is doubled because it sits inside a Java string literal)
  • .* used to gobble up the characters between the two zero-width assertions

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String[] lines = {"james was playing with jack yesterday (line 1)",
    "jack was playing with james yersterday (line 2)",
    "jack and james are best friends (line 3)",
    "james will be helping jack with his homework (line 4)",
    "yesterday, james come over jack's house (line 5)"};

// Capture any word of 4+ characters that has a name somewhere before it
// and a name somewhere after it on the same line.
Pattern p = Pattern.compile("(?<=(?:jack|james).*)(\\w{4,})(?=.*(?:jack|james))");

for (String line : lines) {
    Matcher m = p.matcher(line);
    // Tracks whether this line produced a match, so a newline is printed only then.
    boolean matched = false;
    while (m.find()) {
        matched = true;
        System.out.print(m.group(1) + " ");
    }
    if (matched) {
        System.out.println();
    }
}

Prints:

playing with 
playing with 
will helping 
come over 
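
The pattern above places .* inside a lookbehind, which not every Java version accepts (older releases are known to reject unbounded lookbehinds with a PatternSyntaxException), and it does not check that the two surrounding names differ. As a rough alternative, here is a two-pass sketch that avoids lookbehind: first capture the text between one name and the other name, then pull the long words out of that text. The class name BetweenNames and the method longWordsBetween are made up for illustration, and the sketch assumes only the first name pair on each line matters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BetweenNames {
    // Group 1: the first name. Group 2: the text up to the nearest occurrence
    // of the *other* name ((?!\1\b) rejects a repeat of the first name).
    private static final Pattern SPAN =
        Pattern.compile("\\b(jack|james)\\b(.*?)\\b(?!\\1\\b)(?:jack|james)\\b");

    // A whole word of four or more word characters.
    private static final Pattern LONG_WORD = Pattern.compile("\\b\\w{4,}\\b");

    // Hypothetical helper: returns the 4+ character words found between the
    // first name and the other name, or an empty list if there are none.
    static List<String> longWordsBetween(String line) {
        List<String> words = new ArrayList<>();
        Matcher span = SPAN.matcher(line);
        if (span.find()) {
            Matcher word = LONG_WORD.matcher(span.group(2));
            while (word.find()) {
                words.add(word.group());
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String[] lines = {
            "james was playing with jack yesterday",
            "jack was playing with james yersterday",
            "jack and james are best friends",
            "james will be helping jack with his homework",
            "yesterday, james come over jack's house"
        };
        for (String line : lines) {
            List<String> words = longWordsBetween(line);
            System.out.println(words.isEmpty() ? "no matches" : String.join(" ", words));
        }
    }
}
```

Splitting the work into two patterns keeps each one short and readable; the trade-off is that a line with several name pairs would need a loop over SPAN instead of a single find().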

Use

(?:\G(?<!^)|(jack|james))(?:\W+\w{1,3})*\W+(\w{4,})(?=(?:(?!\1).)*?(?!\1)(jack|james))

See proof. You need the values that are held inside Group 2.

Explanation

                         EXPLANATION
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    \G                       where the previous match ended
--------------------------------------------------------------------------------
    (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
      ^                        the beginning of the string
--------------------------------------------------------------------------------
    )                        end of look-behind
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    (                        group and capture to \1:
--------------------------------------------------------------------------------
      jack                     'jack'
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      james                    'james'
--------------------------------------------------------------------------------
    )                        end of \1
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    \W+                      non-word characters (all but a-z, A-Z, 0-
                             9, _) (1 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
    \w{1,3}                  word characters (a-z, A-Z, 0-9, _)
                             (between 1 and 3 times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
  )*                       end of grouping
--------------------------------------------------------------------------------
  \W+                      non-word characters (all but a-z, A-Z, 0-
                           9, _) (1 or more times (matching the most
                           amount possible))
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    \w{4,}                   word characters (a-z, A-Z, 0-9, _) (at
                             least 4 times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \2
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the least amount
                             possible)):
--------------------------------------------------------------------------------
      (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
        \1                       what was matched by capture \1
--------------------------------------------------------------------------------
      )                        end of look-ahead
--------------------------------------------------------------------------------
      .                        any character except \n
--------------------------------------------------------------------------------
    )*?                      end of grouping
--------------------------------------------------------------------------------
    (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
      \1                       what was matched by capture \1
--------------------------------------------------------------------------------
    )                        end of look-ahead
--------------------------------------------------------------------------------
    (                        group and capture to \3:
--------------------------------------------------------------------------------
      jack                     'jack'
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      james                    'james'
--------------------------------------------------------------------------------
    )                        end of \3
--------------------------------------------------------------------------------
  )                        end of look-ahead
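
The \G(?<!^) part is the least obvious piece of the explanation above. In Java, \G matches where the previous match ended, and on the first find() it also matches at the start of the input; the (?<!^) guard switches that start-of-input case off, so the chain can only begin at a name. The following is a small sketch of that behaviour with a deliberately simplified pattern (not the full pattern from this answer). The class GAnchorDemo, the method collect, and the sample input are invented for the demonstration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GAnchorDemo {
    // Hypothetical helper: collects group 1 of every successive match.
    static List<String> collect(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "- intro, jack was playing";

        // Guarded: the chain of words can only start right after "jack".
        System.out.println(collect("(?:\\G(?<!^)|jack)\\W+(\\w+)", input));
        // prints [was, playing]

        // Unguarded: \G also matches at index 0, so the chain starts too early.
        System.out.println(collect("(?:\\G|jack)\\W+(\\w+)", input));
        // prints [intro, jack, was, playing]
    }
}
```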

Java:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Ideone
{
    public static void main (String[] args)
    {
        String regex = "(?:\\G(?<!^)|(jack|james))(?:\\W+\\w{1,3})*\\W+(\\w{4,})(?=(?:(?!\\1).)*?(?!\\1)(jack|james))";
        String string = "james was playing with jack yesterday";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(string);
        List<String> results = new ArrayList<>();
        // Group 2 holds each word of four or more characters.
        while (matcher.find()) {
            results.add(matcher.group(2));
        }
        System.out.println(String.join(" ", results));
    }
}

Result: playing with
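
The code above runs only the first test line. As a sketch of how the same pattern could be checked against all five lines from the question, the snippet below wraps it in a helper. The class AllLinesCheck and the method extract are names chosen here for illustration, and the trailing "(line N)" labels are left out of the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AllLinesCheck {
    // Same pattern as in the answer; group 2 holds each 4+ character word.
    private static final Pattern PATTERN = Pattern.compile(
        "(?:\\G(?<!^)|(jack|james))(?:\\W+\\w{1,3})*\\W+(\\w{4,})"
        + "(?=(?:(?!\\1).)*?(?!\\1)(jack|james))");

    // Hypothetical helper: returns every group-2 value found in the line.
    static List<String> extract(String line) {
        List<String> results = new ArrayList<>();
        Matcher matcher = PATTERN.matcher(line);
        while (matcher.find()) {
            results.add(matcher.group(2));
        }
        return results;
    }

    public static void main(String[] args) {
        String[] lines = {
            "james was playing with jack yesterday",
            "jack was playing with james yersterday",
            "jack and james are best friends",
            "james will be helping jack with his homework",
            "yesterday, james come over jack's house"
        };
        for (String line : lines) {
            List<String> words = extract(line);
            System.out.println(words.isEmpty() ? "no matches" : String.join(" ", words));
        }
    }
}
```

Returning a list instead of printing inside the loop makes it easy to tell the "no matches" case (line 3) apart from the others.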
