Matching using regular expressions in Java?

I want to find whole words in a text string. The words are separated by spaces and newlines, so I used these two characters to find the beginning and end of each word. When the pattern is "\\s" or "\\n" alone, the program finds the indices correctly, but it does not when I try to match both characters. How can I fix this program?

import java.util.*;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class convertText{

    public static String findText(String text){

        String r = text.trim();

        // System.out.println(r);

        Pattern pattern = Pattern.compile("\\s+ | \\n");

        Matcher matcher = pattern.matcher(text);

        while (matcher.find()) {
            // System.out.println(matcher.start());
            System.out.println(text.substring(matcher.start()+1));
        }

        return text;
    }

    public static void main(String[] args) {
        // String test = " hi \n ok this. "; 
        String test = " hi ok this. "; 
        // System.out.println(test.substring(7));
        // System.out.println(test);
        findText(test);
    }


}

You can use [^\\s]+ to match runs of characters that are not whitespace (in other words, the words themselves) and print each match:

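// text is the input string, as in findText(String text) from the question
Pattern pattern = Pattern.compile("[^\\s]+");
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
    System.out.println(matcher.group()); // the word that was just matched
}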

[^\\s]+ can be broken down into:

  • \\s matches any whitespace character, including regular spaces and newlines (so we don't need to specify \\n separately)
  • [ and ] define a character set, which matches any single character listed inside the brackets
  • ^ means "not": as the first character inside the character set it inverts the match, so only characters not in the set are matched (anything but whitespace in this case)
  • + matches one or more of the previous token; here the previous token is the character set matching non-whitespace characters
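
Since the question asks for the indices of each word, the breakdown above can be sketched as a small helper that records where every match starts and ends. This is only an illustration: the class name WordFinder and the method findWords are made up for this example and do not come from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordFinder {
    // One or more consecutive non-whitespace characters (hypothetical constant name).
    private static final Pattern WORD = Pattern.compile("[^\\s]+");

    /** Returns each word with its start (inclusive) and end (exclusive) index. */
    public static List<String> findWords(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group() + " [" + matcher.start() + "," + matcher.end() + ")");
        }
        return found;
    }

    public static void main(String[] args) {
        // The commented-out test string from the question, with a newline in it.
        for (String word : findWords(" hi \n ok this. ")) {
            System.out.println(word);
        }
        // prints:
        // hi [1,3)
        // ok [6,8)
        // this. [9,14)
    }
}
```

Because the pattern matches the words rather than the separators, there is no need to add 1 to matcher.start() as the original loop does.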

You can do it using the Java 8 Stream API as follows:

String test = " hi ok this. ";
Pattern.compile("\\W+").splitAsStream(test.trim())
            .forEach(System.out::println);

Output:

hi
ok
this
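
Note that the output is "this" rather than "this.": \\W+ splits on every run of non-word characters, so punctuation is dropped as well as whitespace. Whether that is wanted depends on the use case. The sketch below compares it with splitting on \\s+; the class SplitCompare, the method splitOn and the extra word "don't" are assumptions added for illustration.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SplitCompare {
    /** Splits the trimmed text on the given regex and collects the pieces. */
    public static List<String> splitOn(String regex, String text) {
        return Pattern.compile(regex)
                .splitAsStream(text.trim())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String test = " hi ok don't this. ";
        System.out.println(splitOn("\\W+", test)); // [hi, ok, don, t, this]
        System.out.println(splitOn("\\s+", test)); // [hi, ok, don't, this.]
    }
}
```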

If you want to match all words in a text string you can use:

(?i)[a-z]+ Java escaped: "(?i)[a-z]+"

(?i) ... Turn on case-insensitive matching.
[a-z]+ ... Match any letter from a to z, as many times as possible.

or you could use:

\\w+ ... Matches ASCII letters, digits and underscores, as many times as possible.


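    // Needs java.util.regex.Matcher, Pattern and PatternSyntaxException, plus javax.swing.JOptionPane
    try {
        String subjectString = " hi ok this. ";
        // MULTILINE only changes how ^ and $ behave, so it has no effect on this pattern
        Pattern regex = Pattern.compile("(?i)[a-z]+", Pattern.MULTILINE);
        Matcher regexMatcher = regex.matcher(subjectString);
        while (regexMatcher.find()) {
            String word = regexMatcher.group();
            int start_pos = regexMatcher.start(); // index of the first character of the word
            int end_pos = regexMatcher.end();     // index just past the last character
            JOptionPane.showMessageDialog(null, word + " found from pos: " + start_pos + " to " + end_pos);
        }
    } catch (PatternSyntaxException ex) {
        // Only thrown by Pattern.compile if the pattern itself is malformed
    }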

\\s doesn't match only a single space. It matches the ASCII space, tab, line feed, carriage return, vertical tab and form feed. So you only need \\s+ to match all kinds of whitespace characters.
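
This also suggests why the pattern in the question, "\\s+ | \\n", finds nothing: unless Pattern.COMMENTS is set, the spaces around | are literal, so the pattern asks for whitespace followed by one more space, or a space followed by a newline. A minimal sketch of the difference follows; the class WhitespaceDemo and the method countMatches are hypothetical names used only here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WhitespaceDemo {
    /** Counts how many times the regex matches in the text. */
    public static int countMatches(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String test = " hi ok this. ";
        // The question's pattern: the literal spaces never line up with single separators.
        System.out.println(countMatches("\\s+ | \\n", test)); // 0
        // Plain \s+ finds all four whitespace runs.
        System.out.println(countMatches("\\s+", test));       // 4
        // A mixed run of space, tab, CR, LF and form feed is a single match.
        System.out.println(countMatches("\\s+", "a \t\r\n\fb")); // 1
    }
}
```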

Just split the string on the whitespace character class:

String[] words = yourString.split("\\s+");
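
One thing to watch for with split: if the string starts with whitespace, as the test string in the question does, the first element of the result is an empty string. Trimming first avoids that. A small sketch, where the class SplitWords and the method words are illustrative names rather than anything from the original answer:

```java
import java.util.Arrays;

public class SplitWords {
    /** Splits text into whitespace-separated words, ignoring leading and trailing whitespace. */
    public static String[] words(String text) {
        String trimmed = text.trim();
        // Splitting an empty string would return [""], so handle that case explicitly.
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String test = " hi ok this. ";
        System.out.println(Arrays.toString(test.split("\\s+"))); // [, hi, ok, this.]
        System.out.println(Arrays.toString(words(test)));         // [hi, ok, this.]
    }
}
```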
