
How can I capture part of a string using regular expressions?

(In Java) I want to create a function that extracts parts of a string using regular expressions:

public HashMap<Integer,String> extract(String sentence, String expression){
} 

// I need to pass in a sentence like this, for example:

HashMap<Integer,String> parts =extract("hello Jhon how are you", "(hello|hi) @1 how are @2");

// The expression validates the sentence: it must start with "hello" or "hi", followed by a word or group of words, then the words "how are", then some further words. And I want to get this:

parts.get(1) --> "Jhon"
parts.get(2) --> "you"

// But the function should return null if I give it this:

extract("any other words","hello @1 how are @2");

I was doing it without regular expressions, but the code became rather large. I'm not sure whether regular expressions would make the processing faster, or how I would do it with them.

Thanks to @ajb for the comment. I've modified my answer to meet Omar's requirement. It's more complicated than I thought, lol.

I assume Omar wants to use the expression he provides to capture specific words. He uses @1, @2 ... @n to mark what he wants to capture, and each integer is also the key used to retrieve the captured text from the map.
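As a rough illustration of the placeholder idea, the sketch below collects the number behind each @n in an expression, in order of appearance. The class name PlaceholderKeys and the method keysOf are hypothetical names chosen for this example, not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only for this illustration.
class PlaceholderKeys {
    // Returns the number behind each @n placeholder, in order of appearance.
    static List<Integer> keysOf(String expression) {
        List<Integer> keys = new ArrayList<>();
        // \d+ (rather than \d*) skips a bare "@" that has no digits after it.
        Matcher m = Pattern.compile("@(\\d+)").matcher(expression);
        while (m.find()) {
            keys.add(Integer.valueOf(m.group(1)));
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(keysOf("(hello|hi) @1 how are @2"));            // prints [1, 2]
        System.out.println(keysOf("whats the (number of @1|@1s number)")); // prints [1, 1]
    }
}
```

Keeping repeated numbers in the list means the i-th entry always describes the i-th placeholder, which is convenient once every placeholder is turned into its own capturing group.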

Edit: the OP wants to be able to put @n inside parentheses, so I preprocess the expression to replace "(" with "(?:". That way the group still takes effect for grouping and alternation, but it no longer captures.
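A small sketch of why this preprocessing matters, assuming the placeholder has already been turned into a (\w*) group. The class NonCapturingDemo and the helper firstGroup are hypothetical names used only for this demonstration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, used only for this illustration.
class NonCapturingDemo {
    // Returns the text of group 1 when the regex matches the whole input, otherwise null.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // With a plain group, "(hello|hi)" is group 1 and the word becomes group 2.
        System.out.println(firstGroup("(hello|hi) (\\w*)", "hello Jhon"));   // prints hello
        // With "(?:", the alternation still applies, but the word is now group 1.
        System.out.println(firstGroup("(?:hello|hi) (\\w*)", "hello Jhon")); // prints Jhon
        // The preprocessing step itself; the backslash in the replacement escapes "(".
        System.out.println("(hello|hi) @1".replaceAll("\\(", "\\(?:"));      // prints (?:hello|hi) @1
    }
}
```

Note that this simple replacement also rewrites an escaped literal parenthesis such as \( in the expression, so it assumes the expression only uses parentheses for grouping.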

import java.util.ArrayList;
import java.util.HashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String args[]){

        Test test = new Test();
        String sentence1 = "whats the number of apple";
        String expression1 = "whats the (number of @1|@1s number)";
        HashMap<Integer, String> map1 = test.extract(sentence1, expression1);
        System.out.println(map1);
        String sentence2 = "whats the bananas number";
        HashMap<Integer, String> map2 = test.extract(sentence2, expression1);
        System.out.println(map2);
        String sentence3 = "hello Jhon how are you";
        String expression3 = "(hello|hi) @1 how are @2";
        HashMap<Integer, String> map3 = test.extract(sentence3, expression3);
        System.out.println(map3);
    }

    public HashMap<Integer,String> extract(String sentence, String expression){
        // Make the user's own groups non-capturing, so that only the @n placeholders capture.
        expression = expression.replaceAll("\\(", "\\(?:");
        // Record the number behind each placeholder, in order of appearance.
        // \d+ (not \d*) avoids a NumberFormatException on a bare "@".
        ArrayList<Integer> keys = new ArrayList<Integer>();
        Matcher matcher4Expression = Pattern.compile("@(\\d+)").matcher(expression);
        while(matcher4Expression.find()){
            keys.add(Integer.valueOf(matcher4Expression.group(1)));
        }
        // Replace each @n with a capturing group that matches a single word.
        String regex = expression.replaceAll("@\\d+", "(\\\\w*)");
        HashMap<Integer, String> map = new HashMap<Integer, String>();
        Matcher matcher = Pattern.compile(regex).matcher(sentence);

        while(matcher.find()){
            // Group i was generated from the i-th placeholder, so its key is keys.get(i - 1).
            for(int i = 1; i <= matcher.groupCount(); i++){
                if(matcher.group(i) != null){
                    map.put(keys.get(i - 1), matcher.group(i));
                }
            }
        }
        // The map is empty (not null) when the sentence does not match.
        return map;
    }
}

The result is as follows:

{1=apple}
{1=banana}
{1=Jhon, 2=you}
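The question also asks for null when the sentence does not fit the expression, whereas extract above returns a map in every case. One possible variant is sketched below. It assumes the whole sentence must match (Matcher.matches() instead of find()) and that each placeholder stands for one non-empty word (\w+). The names StrictExtract and extractOrNull are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant, not the original answer's class.
class StrictExtract {
    // Returns the captured words keyed by placeholder number, or null if the sentence does not match.
    static Map<Integer, String> extractOrNull(String sentence, String expression) {
        // Make the user's own groups non-capturing.
        String prepared = expression.replaceAll("\\(", "\\(?:");
        // Remember which placeholder number each capturing group belongs to.
        List<Integer> keys = new ArrayList<>();
        Matcher placeholders = Pattern.compile("@(\\d+)").matcher(prepared);
        while (placeholders.find()) {
            keys.add(Integer.valueOf(placeholders.group(1)));
        }
        // Assumption: each placeholder stands for exactly one non-empty word.
        String regex = prepared.replaceAll("@\\d+", "(\\\\w+)");
        Matcher m = Pattern.compile(regex).matcher(sentence);
        if (!m.matches()) {
            return null; // the sentence does not fit the expression
        }
        Map<Integer, String> parts = new HashMap<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            if (m.group(i) != null) {
                parts.put(keys.get(i - 1), m.group(i));
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(extractOrNull("hello Jhon how are you", "(hello|hi) @1 how are @2")); // prints {1=Jhon, 2=you}
        System.out.println(extractOrNull("any other words", "hello @1 how are @2"));             // prints null
    }
}
```

Capturing a group of words rather than a single word would need a different placeholder pattern, which is left out of this sketch.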


