Using Regular Expressions to Extract a Value in Java

I have several strings of roughly this form:

[some text] [some number] [some more text]

I want to extract the text in [some number] using the Java Regex classes.

I know roughly what regular expression I want to use (though all suggestions are welcome). What I'm really interested in are the Java calls that take the regex string and apply it to the source data to produce the value of [some number].

EDIT: I should add that I'm only interested in a single [some number] (basically, the first instance). The source strings are short and I'm not going to be looking for multiple occurrences of [some number].

Full example:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExample {
    private static final Pattern p = Pattern.compile("^([a-zA-Z]+)([0-9]+)(.*)");

    public static void main(String[] args) {
        // create a matcher for pattern p and the given string
        Matcher m = p.matcher("Testing123Testing");

        // if an occurrence of the pattern was found in the given string...
        if (m.find()) {
            // ...then you can use the group() methods.
            System.out.println(m.group(0)); // whole matched expression (Testing123Testing)
            System.out.println(m.group(1)); // first parenthesized group (Testing)
            System.out.println(m.group(2)); // second one (123)
            System.out.println(m.group(3)); // third one (Testing)
        }
    }
}

Since you're looking for the first number, you can use a regexp such as:

^\D+(\d+).*

and m.group(1) will return the first number. Note that signed numbers can contain a minus sign:

^\D+?(-?\d+).*

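A caveat worth checking with a minus-aware pattern: \D also matches '-', so a greedy \D+ can swallow the sign before (-?\d+) gets a chance to see it. The sketch below (class and method names are only illustrative) compares the greedy prefix with a reluctant \D+? on a made-up input:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SignedNumberDemo {
    // Greedy \D+ takes as many non-digits as it can, including a '-' sign.
    static final Pattern GREEDY = Pattern.compile("^\\D+(-?\\d+).*");
    // Reluctant \D+? stops as early as possible, so -? can claim the sign.
    static final Pattern RELUCTANT = Pattern.compile("^\\D+?(-?\\d+).*");

    // Returns group 1 of the first match, or null if the pattern does not match.
    static String firstNumber(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "temp-42C";
        System.out.println(firstNumber(GREEDY, input));    // prints 42
        System.out.println(firstNumber(RELUCTANT, input)); // prints -42
    }
}
```

Both patterns still need at least one non-digit character before the number, so an input that starts with the sign loses it, and one that starts with a digit does not match at all.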
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Regex1 {
    public static void main(String[]args) {
        Pattern p = Pattern.compile("\\d+");
        Matcher m = p.matcher("hello1234goodboy789very2345");
        while(m.find()) {
            System.out.println(m.group());
        }
    }
}

Output:

1234
789
2345

Allain basically has the Java code, so you can use that. However, his expression only matches if your numbers are preceded only by a stream of word characters.

"(\\d+)"

should be able to find the first string of digits. You don't need to specify what's before it, if you're sure that it's going to be the first string of digits. Likewise, there is no use in specifying what's after it, unless you want that. If you just want the number, and are sure that it will be the first string of one or more digits, then that's all you need.
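As a minimal sketch of that advice (the helper name firstDigits is hypothetical), a single call to find() with the unanchored pattern returns only the first run of digits, which fits the question's "first instance only" requirement:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstDigits {
    // No anchors and nothing before/after: find() scans forward to the first run of digits.
    static final Pattern DIGITS = Pattern.compile("(\\d+)");

    static String firstDigits(String input) {
        Matcher m = DIGITS.matcher(input);
        // find() is called only once, so only the first occurrence is returned.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstDigits("some text 42 more text 7")); // prints 42
        System.out.println(firstDigits("no digits here"));           // prints null
    }
}
```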

If you expect the number to be surrounded by spaces, it is even more specific to use

"\\s+(\\d+)\\s+"

which might be better.

If you need all three parts, this will do:

"(\\D+)(\\d+)(.*)"
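A short sketch of pulling out all three parts with that pattern, using an illustrative input and a hypothetical helper name. Note that (\\D+) requires at least one non-digit before the number, so input that starts with a digit does not match:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreeParts {
    static final Pattern PARTS = Pattern.compile("(\\D+)(\\d+)(.*)");

    // Returns {text before, number, text after}, or null if there is no match.
    static String[] split3(String input) {
        Matcher m = PARTS.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        // Brackets make the leading/trailing spaces of each part visible.
        for (String part : split3("Order #123 shipped")) {
            System.out.println("[" + part + "]"); // [Order #], [123], [ shipped]
        }
    }
}
```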

EDIT: The expressions given by Allain and Jack suggest that you need to specify some subset of non-digits in order to capture digits. If you tell the regex engine you're looking for \\d, then it's going to ignore everything before the digits. If J's or A's expression fits your pattern, then the whole match equals the input string, and there's no reason to specify it. It probably slows down a clean match, if it isn't ignored entirely.

In addition to Pattern, the Java String class also has several methods that work with regular expressions. In your case the code would be:

"ab123abc".replaceFirst("\\D*(\\d*).*", "$1")

where \\D matches a non-digit character.
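To show how the replaceFirst approach behaves, here is a small sketch (the wrapper method extract is hypothetical). The whole input is replaced by group 1, so when the string contains no digits at all the result is an empty string rather than null:

```java
class ReplaceFirstDemo {
    // \D* skips leading non-digits, (\d*) captures the digits, .* consumes the rest;
    // the whole match is then replaced by the contents of group 1.
    static String extract(String input) {
        return input.replaceFirst("\\D*(\\d*).*", "$1");
    }

    public static void main(String[] args) {
        System.out.println(extract("ab123abc"));           // prints 123
        System.out.println(extract("12ab34"));             // prints 12
        System.out.println("[" + extract("abc") + "]");    // prints [] (empty string)
    }
}
```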

In Java 1.4 and up:

String input = "...";
Matcher matcher = Pattern.compile("[^0-9]+([0-9]+)[^0-9]+").matcher(input);
if (matcher.find()) {
    String someNumberStr = matcher.group(1);
    // if you need this to be an int:
    int someNumberInt = Integer.parseInt(someNumberStr);
}
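Two details are easy to miss in the snippet above: the pattern requires at least one non-digit on both sides of the number, and Integer.parseInt throws NumberFormatException when the digits do not fit in an int. A sketch that handles both by returning null (the helper parse and that null convention are assumptions, not part of the original answer):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParseFirstNumber {
    // Same pattern as the snippet above: at least one non-digit on BOTH sides of the number.
    static final Pattern SURROUNDED = Pattern.compile("[^0-9]+([0-9]+)[^0-9]+");

    // Returns the number, or null if nothing matches or the digits do not fit in an int.
    static Integer parse(String input) {
        Matcher matcher = SURROUNDED.matcher(input);
        if (!matcher.find()) {
            return null;
        }
        try {
            return Integer.parseInt(matcher.group(1));
        } catch (NumberFormatException e) {
            // e.g. eleven digits exceed Integer.MAX_VALUE (2147483647)
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("abc123def"));     // prints 123
        System.out.println(parse("abc123"));        // prints null: nothing after the number
        System.out.println(parse("x99999999999y")); // prints null: too large for an int
    }
}
```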

This function collects all matching sequences from a string. In this example, it extracts all email addresses from the string.

static final String EMAIL_PATTERN = "[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@"
        + "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})";

public List<String> getAllEmails(String message) {      
    List<String> result = null;
    Matcher matcher = Pattern.compile(EMAIL_PATTERN).matcher(message);

    if (matcher.find()) {
        result = new ArrayList<String>();
        result.add(matcher.group());

        while (matcher.find()) {
            result.add(matcher.group());
        }
    }

    return result;
}

For message = "adf@gmail.com, <another@osiem.osiem>>>> lalala@aaa.pl", it will create a List of 3 elements.
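A possible variant of getAllEmails, sketched under the assumption that callers would rather get an empty list than null when nothing matches. It also compiles the pattern once instead of on every call:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailExtractor {
    // Same expression as EMAIL_PATTERN above, compiled once and reused.
    static final Pattern EMAIL = Pattern.compile(
            "[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@"
            + "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})");

    // Returns every match in order; the list is empty (never null) if there are none.
    static List<String> getAllEmails(String message) {
        List<String> result = new ArrayList<>();
        Matcher matcher = EMAIL.matcher(message);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(getAllEmails("adf@gmail.com, <another@osiem.osiem>>>> lalala@aaa.pl"));
        // prints [adf@gmail.com, another@osiem.osiem, lalala@aaa.pl]
        System.out.println(getAllEmails("no addresses here")); // prints []
    }
}
```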

Simple Solution

// Regexplanation:
// ^       beginning of the input
// \\D+    1+ non-digit characters
// (\\d+)  1+ digit characters in a capture group
// .*      0+ of any character (except line terminators)
String regexStr = "^\\D+(\\d+).*";

// Compile the regex String into a Pattern
Pattern p = Pattern.compile(regexStr);

// Create a matcher with the input String
Matcher m = p.matcher(inputStr);

// If we find a match
if (m.find()) {
    // Get the String from the first capture group
    String someDigits = m.group(1);
    // ...do something with someDigits
}

Solution in a Util Class

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MyUtil {
    // A compiled Pattern is immutable and thread-safe, so it can be shared
    private static final Pattern pattern = Pattern.compile("^\\D+(\\d+).*");

    // Assumptions: inputStr is a non-null String
    public static String extractFirstNumber(String inputStr) {
        // Matcher is stateful and not thread-safe, so create a new one per call
        Matcher matcher = pattern.matcher(inputStr);

        // Check if there's a match
        if (matcher.find()) {
            // Return the number (in the first capture group)
            return matcher.group(1);
        } else {
            // Return some default value, if there is no match
            return null;
        }
    }
}

...

// Use the util function and print out the result
String firstNum = MyUtil.extractFirstNumber("Testing4234Things");
System.out.println(firstNum);

Try doing something like this:

// Reluctant .+? so that (\\d+) captures all of "123", not just the last digit "3"
Pattern p = Pattern.compile("^.+?(\\d+).+");
Matcher m = p.matcher("Testing123Testing");

if (m.find()) {
    System.out.println(m.group(1));
}
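Be careful with a greedy ^.+ prefix such as in "^.+(\\d+).+": .+ first consumes the whole string and then backtracks only until (\\d+) can match, which leaves a single digit. This small comparison (the helper group1 is hypothetical) shows the effect on the same input:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Returns group 1 of the first match of regex in input, or null.
    static String group1(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "Testing123Testing";
        // Greedy .+ backtracks only as far as needed, leaving one digit for \d+.
        System.out.println(group1("^.+(\\d+).+", input));   // prints 3
        // Reluctant .+? stops at the first position where (\d+) can match.
        System.out.println(group1("^.+?(\\d+).+", input));  // prints 123
        // Excluding digits from the prefix also works.
        System.out.println(group1("^\\D+(\\d+).+", input)); // prints 123
    }
}
```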

Alternatively, you can do it using StringTokenizer:

String str = "as:"+123+"as:"+234+"as:"+345;
StringTokenizer st = new StringTokenizer(str,"as:");

while(st.hasMoreTokens())
{
  String k = st.nextToken();    // first number, i.e. 123
  int kk = Integer.parseInt(k);
  System.out.println("k token as integer:  " + kk);

  String k1 = st.nextToken();   // second number, i.e. 234
  int kk1 = Integer.parseInt(k1);
  System.out.println("k1 token as integer: " + kk1);

  String k2 = st.nextToken();   // third number, i.e. 345
  int kk2 = Integer.parseInt(k2);
  System.out.println("k2 token as integer: " + kk2);
}

Since we store these numbers in three different variables, we can use them anywhere later in the code.
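The three-tokens-per-iteration loop only works when the token count is a multiple of three; otherwise nextToken() throws NoSuchElementException. Here is a sketch that reads one token per iteration and collects the numbers into a list instead (the method name numbers is illustrative). Note that StringTokenizer treats each character of "as:" as a separate delimiter, and its Javadoc describes it as a legacy class, recommending String.split or java.util.regex for new code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerNumbers {
    // Every character in "as:" (a, s and :) acts as its own delimiter.
    static List<Integer> numbers(String str) {
        List<Integer> result = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(str, "as:");
        // One token per iteration, so any number of tokens is handled.
        while (st.hasMoreTokens()) {
            result.add(Integer.parseInt(st.nextToken()));
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "as:" + 123 + "as:" + 234 + "as:" + 345;
        System.out.println(numbers(str)); // prints [123, 234, 345]
    }
}
```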

How about "[^\\d]*([0-9]+[\\s]*[.,]{0,1}[\\s]*[0-9]*).*"? I think it would take care of numbers with a fractional part. I included whitespace, and I included , as a possible separator. I'm trying to get the numbers, including floats, out of a string, taking into account that the user might make a mistake and include whitespace while typing the number.
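A sketch of how that pattern might be used, assuming the goal is a double: capture the number with matches(), drop the whitespace, and treat ',' as the decimal separator (the helper parseDecimal and this normalization are assumptions). Be aware that with this normalization an input such as "1,234" is read as 1.234, not as a thousands-separated value:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DecimalExtract {
    // The pattern from the answer above, written as a Java string literal.
    static final Pattern DECIMAL =
            Pattern.compile("[^\\d]*([0-9]+[\\s]*[.,]{0,1}[\\s]*[0-9]*).*");

    // Hypothetical helper: returns the first number as a double, or null if there is none.
    static Double parseDecimal(String input) {
        Matcher m = DECIMAL.matcher(input);
        if (!m.matches()) {
            return null;
        }
        // Remove stray whitespace inside the number and use '.' as the decimal point.
        String cleaned = m.group(1).replaceAll("\\s", "").replace(',', '.');
        return Double.parseDouble(cleaned);
    }

    public static void main(String[] args) {
        System.out.println(parseDecimal("price: 12 , 50 EUR")); // prints 12.5
        System.out.println(parseDecimal("total 3 items"));      // prints 3.0
        System.out.println(parseDecimal("no number"));          // prints null
    }
}
```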

Sometimes you can use the simple .split("REGEXP") method available in java.lang.String. For example:

String input = "first,second,third";

// To retrieve 'first'
String first = input.split(",")[0];
// second
String second = input.split(",")[1];
// third
String third = input.split(",")[2];
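split() also accepts a real regex, so it can pull the digit runs out of strings like the ones in the question. A sketch (the helper digitRuns is hypothetical): splitting on \\D+ yields an empty first element when the string starts with non-digits, which is why empty parts are filtered out:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitNumbers {
    // Splits on runs of non-digits and drops the empty pieces that split() can produce.
    static List<String> digitRuns(String input) {
        List<String> result = new ArrayList<>();
        for (String part : input.split("\\D+")) {
            if (!part.isEmpty()) {
                result.add(part);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "hello1234goodboy789very2345";
        System.out.println(Arrays.toString(s.split("\\D+"))); // prints [, 1234, 789, 2345]
        System.out.println(digitRuns(s));                     // prints [1234, 789, 2345]
    }
}
```

The first element of digitRuns(...) is then the first number, if the list is not empty.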

If you are reading from a file, then this can help you:

try (BufferedReader br = new BufferedReader(new InputStreamReader(
        (InputStream) mnpMainBean.getUploadedBulk().getInputStream()))) {
    String line;
    //Ref:03
    while ((line = br.readLine()) != null) {
        if (line.matches("[A-Z],\\d,(\\d*,){2}(\\s*\\d*\\|\\d*:)+")) {
            String[] splitRecord = line.split(",");
            //do something
        } else {
            //error: the line does not have the expected format
            return; // try-with-resources still closes br
        }
    }
} catch (IOException ioException) {
    logger.logDebug("Exception " + ioException);
}

Pattern p = Pattern.compile("(\\D+)(\\d+)(.*)");
Matcher m = p.matcher("this is your number:1234 thank you");
if (m.find()) {
    String someNumberStr = m.group(2);
    int someNumberInt = Integer.parseInt(someNumberStr);
}
