Grouping sentences separated by specific word

I'm trying to group two sub-sentences of any reasonable length separated by a specific word ("AND" in the example), where the second one is optional. Some examples:

CASE1:

foo sentence A AND foo sentence B

shall give

"foo sentence A" --> matching group 1

"AND" --> matching group 2 (optionally)

"foo sentence B" --> matching group 3

CASE2:

foo sentence A

shall give

"foo sentence A" --> matching group 1
"" --> matching group 2 (optionally)
"" --> matching group 3

I tried the following regex:

(.*) (AND (.*))?$

It works, but in CASE2 only if I put a space at the end of the string; otherwise the pattern doesn't match. If I move the space before "AND" inside the optional group, then in CASE1 the matcher puts the whole string into the first group. I wondered about lookahead and lookbehind assertions, but I'm not sure they can help me. Any suggestions? Thanks.

How about just using

String split[] = sentence.split("AND");

That will split the sentence at your word and give you an array of the sub-parts.
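A minimal sketch of what the plain split returns for the two cases in the question. The class name `SplitDemo` and the helper `splitOnAnd` are hypothetical. Two things are worth noticing: the pieces keep the spaces that surrounded AND, and a bare "AND" pattern is not word-aware, so it also cuts inside a longer word such as "BRAND".

```java
import java.util.Arrays;

class SplitDemo {
    // Hypothetical helper: splits on the literal text AND, nothing more.
    static String[] splitOnAnd(String sentence) {
        return sentence.split("AND");
    }

    public static void main(String[] args) {
        // Case 1: two pieces; the spaces around AND stay attached to them.
        System.out.println(Arrays.toString(splitOnAnd("foo sentence A AND foo sentence B")));

        // Case 2: no AND, so the array holds only the whole sentence.
        System.out.println(Arrays.toString(splitOnAnd("foo sentence A")));

        // Caveat: the pattern also matches inside words.
        System.out.println(Arrays.toString(splitOnAnd("BRAND new")));
    }
}
```

Calling `trim()` on each piece removes the leftover spaces if they are not wanted.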

Description

This regex returns the requested string parts in the requested groups. The and is optional; if it's not found in the string, the entire string is placed into group 1. The \\s*? tokens are meant to trim white space from the captured groups automatically (note that group 3 in the Case 1 output below still has a leading space).

^\\s*?\\b(.*?)\\b\\s*?(?:\\b(and)\\b\\s*?\\b(.*?)\\b\\s*?)?$


Groups

0 gets the entire matching string

  1. gets the string before the separating word and; if there is no and, the entire string appears here
  2. gets the separating word, in this case and
  3. gets the second part of the string

Java Code Example:

Case 1

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Module1 {
  public static void main(String[] args) {
    String sourcestring = "foo sentence A AND foo sentence B";
    Pattern re = Pattern.compile(
        "^\\s*?\\b(.*?)\\b\\s*?(?:\\b(and)\\b\\s*?\\b(.*?)\\b\\s*?)?$",
        Pattern.CASE_INSENSITIVE);
    Matcher m = re.matcher(sourcestring);
    if (m.find()) {
      // Print group 0 (the whole match) through the last capturing group.
      for (int groupIdx = 0; groupIdx <= m.groupCount(); groupIdx++) {
        System.out.println("[" + groupIdx + "] = " + m.group(groupIdx));
      }
    }
  }
}

$matches Array:
(
    [0] => foo sentence A AND foo sentence B
    [1] => foo sentence A
    [2] => AND
    [3] =>  foo sentence B
)

Case 2, using the same regex

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Module1 {
  public static void main(String[] args) {
    String sourcestring = "foo sentence A";
    Pattern re = Pattern.compile(
        "^\\s*?\\b(.*?)\\b\\s*?(?:\\b(and)\\b\\s*?\\b(.*?)\\b\\s*?)?$",
        Pattern.CASE_INSENSITIVE);
    Matcher m = re.matcher(sourcestring);
    if (m.find()) {
      // Print group 0 (the whole match) through the last capturing group.
      for (int groupIdx = 0; groupIdx <= m.groupCount(); groupIdx++) {
        System.out.println("[" + groupIdx + "] = " + m.group(groupIdx));
      }
    }
  }
}

$matches Array:
(
    [0] => foo sentence A
    [1] => foo sentence A
)
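In Java, `Matcher.group(n)` returns `null` for a group that did not take part in the match, which is what happens to groups 2 and 3 in Case 2. The question asks for empty strings there, so here is a minimal sketch, assuming a hypothetical helper `parts` (and class name `GroupDefaults`) that wraps the same pattern, maps `null` to `""`, and trims each group.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDefaults {
    // Same pattern as in the answer above.
    private static final Pattern RE = Pattern.compile(
        "^\\s*?\\b(.*?)\\b\\s*?(?:\\b(and)\\b\\s*?\\b(.*?)\\b\\s*?)?$",
        Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns groups 1..3, trimmed,
    // with "" for any group that did not participate in the match.
    static String[] parts(String sentence) {
        String[] out = {"", "", ""};
        Matcher m = RE.matcher(sentence);
        if (m.find()) {
            for (int i = 1; i <= 3; i++) {
                String g = m.group(i);
                out[i - 1] = (g == null) ? "" : g.trim();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parts("foo sentence A AND foo sentence B")));
        System.out.println(Arrays.toString(parts("foo sentence A")));
    }
}
```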

I'd use this regex:

^(.*?)(?: (AND) (.*))?$

Explanation:

The regular expression:

(?-imsx:^(.*?)(?: (AND) (.*))?$)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
                             ' '
----------------------------------------------------------------------
    (                        group and capture to \2:
----------------------------------------------------------------------
      AND                      'AND'
----------------------------------------------------------------------
    )                        end of \2
----------------------------------------------------------------------
                             ' '
----------------------------------------------------------------------
    (                        group and capture to \3:
----------------------------------------------------------------------
      .*                       any character except \n (0 or more
                               times (matching the most amount
                               possible))
----------------------------------------------------------------------
    )                        end of \3
----------------------------------------------------------------------
  )?                       end of grouping
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
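The pattern explained above can be tried from Java as follows. This is only a sketch: the class name `AndSplitter` and the helper `groups` are made up for illustration, and it uses `matches()` so that the whole input has to fit the pattern.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AndSplitter {
    // Lazy first group, then an optional " AND " followed by the rest.
    private static final Pattern RE = Pattern.compile("^(.*?)(?: (AND) (.*))?$");

    // Hypothetical helper: returns groups 1..3; groups 2 and 3 are null
    // when the input contains no " AND ".
    static String[] groups(String sentence) {
        Matcher m = RE.matcher(sentence);
        if (!m.matches()) {
            // '.' does not match line terminators, so multi-line input ends up here.
            throw new IllegalArgumentException("no match: " + sentence);
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(groups("foo sentence A AND foo sentence B")));
        System.out.println(Arrays.toString(groups("foo sentence A")));
    }
}
```

Because the first group is lazy, an input with several separators is cut at the first " AND "; everything after it lands in group 3.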

Change your regex to make the space after the first sentence optional, and make the first group lazy so that it does not swallow the whole string in case 1:

(.*?\\S) ?(AND (.*))?$
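To see why the quantifier inside the first group matters, this sketch compares a greedy first group `(.*\\S)` with a lazy one `(.*?\\S)` on the CASE1 input. The class name `GreedyVsLazy` and the helper `firstGroup` are hypothetical. With the greedy form, group 1 can take the entire string because everything after it is optional; with the lazy form it stops before the AND.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Hypothetical helper: returns group 1 of the first match, or null if none.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "foo sentence A AND foo sentence B";

        // Greedy: .* runs to the end, and the optional AND part matches nothing.
        System.out.println(firstGroup("(.*\\S) ?(AND (.*))?$", input));

        // Lazy: .*? grows only as far as needed, so the AND part gets its turn.
        System.out.println(firstGroup("(.*?\\S) ?(AND (.*))?$", input));
    }
}
```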

Or you could use split() to consume the AND and any surrounding spaces:

String[] sentences = sentence.split("\\s*AND\\s*");
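A short sketch of how that split could feed the "second sentence is optional" requirement. The class name `SentencePair` and the helper `pair` are assumptions for illustration; the helper always returns two entries and uses an empty string when a part is missing.

```java
import java.util.Arrays;

class SentencePair {
    // Hypothetical helper: first and second sentence, "" when absent.
    static String[] pair(String sentence) {
        // The pattern swallows AND together with the spaces around it.
        String[] parts = sentence.split("\\s*AND\\s*");
        // split() can return an empty array (for example for the input "AND").
        String first = parts.length > 0 ? parts[0] : "";
        String second = parts.length > 1 ? parts[1] : "";
        return new String[] {first, second};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(pair("foo sentence A AND foo sentence B")));
        System.out.println(Arrays.toString(pair("foo sentence A")));
    }
}
```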

Your case 2 is a little strange...

but I would do

String[] parts = sentence.split("(?<=AND)|(?=AND)");

Then check parts.length. If the length is 1, it is case 2: the array holds just the sentence, and you could add empty strings as your "group 2/3".

In case 1 you get parts directly:

[foo sentence A , AND,  foo sentence B]
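The length check described above could look like the following sketch. The class name `LookaroundSplit` and the helper `threeParts` are hypothetical, and the code assumes the sentence does not start with AND. It pads the result to three entries and trims the spaces that the lookaround split leaves on the sentence parts.

```java
import java.util.Arrays;

class LookaroundSplit {
    // Hypothetical helper: always returns {first, separator, second},
    // using "" for the entries that are missing in case 2.
    static String[] threeParts(String sentence) {
        // Zero-width split points just before and just after AND,
        // so the separator itself is kept as its own element.
        String[] parts = sentence.split("(?<=AND)|(?=AND)");
        String[] out = {"", "", ""};
        for (int i = 0; i < out.length && i < parts.length; i++) {
            out[i] = parts[i].trim();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(threeParts("foo sentence A AND foo sentence B")));
        System.out.println(Arrays.toString(threeParts("foo sentence A")));
    }
}
```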
