
Regex to extract text matching this pattern

I have lines of text in this form:

  • Introduction
  • Installation
  • 1.2 Windows Installation
  • 1.3 Linux Installation
  • 1.3.1............
  • 1.3.1.1..........

I want a regex to detect and extract the leading number, which can take forms such as X., XX, XXX, XX.X or X.XX...

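You can use [^.][0-9.]+ as the regex:

  1. [^.] matches a single character that is not a dot, so the match cannot start with a dot. It is not restricted to digits: a letter or a space also qualifies.
  2. [0-9.]+ matches one or more digits and dots.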

Demo:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String[] testStrs = { "Introduction", "1.2 Windows Installation", "1.3 Linux Installation",
                "1.3.1 ............", "1.3.1.1 .........." };
        // One non-dot character followed by one or more digits or dots
        Pattern pattern = Pattern.compile("[^.][0-9.]+");
        for (String str : testStrs) {
            Matcher matcher = pattern.matcher(str);
            if (matcher.find()) {
                System.out.println(matcher.group());
            }
        }
    }
}

Output:

1.2
1.3
1.3.1
1.3.1.1
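A caveat about this pattern: [^.] matches any single character other than a dot, not only a digit, and the demo adds a space after each number that the lines in the question do not have. The sketch below runs the same pattern on the question's line 1.3.1............ and on a hypothetical line Chapter 7, next to a digit-anchored variant \d+(\.\d+)* that is my own substitution rather than part of the answer. The class name PatternComparison and the helper firstMatch are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternComparison {
    // The pattern from the answer: one non-dot character, then digits and dots.
    static final Pattern ANSWER = Pattern.compile("[^.][0-9.]+");
    // Hypothetical variant: must start with a digit, dots only between digit groups.
    static final Pattern DIGIT_ANCHORED = Pattern.compile("\\d+(\\.\\d+)*");

    // Returns the first match of the pattern in the line, or null if there is none.
    static String firstMatch(Pattern pattern, String line) {
        Matcher matcher = pattern.matcher(line);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        // The first line is copied from the question (no space before the dots);
        // the second is a hypothetical heading with a word before the number.
        String[] lines = { "1.3.1............", "Chapter 7" };
        for (String line : lines) {
            System.out.println(firstMatch(ANSWER, line) + " | " + firstMatch(DIGIT_ANCHORED, line));
        }
    }
}
```

For the first line this should print 1.3.1............ | 1.3.1, so the original pattern keeps the trailing dots. For Chapter 7 the original pattern returns " 7" (with a leading space) while the variant returns 7. One trade-off: the variant never includes a trailing period, so a number written as X. comes back as X.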

The following expression matches the leading number in each of your numbered examples:

/(^[\d\.]+)/gm

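The g (global) flag returns every match instead of only the first, and the m (multiline) flag makes ^ match at the start of each line. You need both to run the expression against multi-line text and get all the matches.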
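For Java users, a rough equivalent of /(^[\d\.]+)/gm is sketched below, assuming the input arrives as one multi-line string. Java has no g or m suffix, so Pattern.MULTILINE stands in for m and a loop over Matcher.find() stands in for g. The capturing group is dropped because the whole match is already the number. MultilineExtract and extractAll are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultilineExtract {
    // MULTILINE plays the role of the m flag: ^ matches at the start of every line.
    static final Pattern LEADING_NUMBER = Pattern.compile("^[\\d.]+", Pattern.MULTILINE);

    // Looping over find() plays the role of the g flag: collect every match, not just the first.
    static List<String> extractAll(String text) {
        List<String> numbers = new ArrayList<>();
        Matcher matcher = LEADING_NUMBER.matcher(text);
        while (matcher.find()) {
            numbers.add(matcher.group());
        }
        return numbers;
    }

    public static void main(String[] args) {
        String toc = "Introduction\n1.2 Windows Installation\n1.3 Linux Installation\n"
                + "1.3.1 ............\n1.3.1.1 ..........";
        System.out.println(extractAll(toc)); // [1.2, 1.3, 1.3.1, 1.3.1.1]
    }
}
```

The "Introduction" line yields nothing because ^ is followed by a letter, so only the numbered lines contribute.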

Note that the expression also matches periods directly before or after a number, so the lines below match too, with their periods included:

.1.2 The numbers here will be matched
1.2. These numbers will also be matched

If this is a problem, I recommend removing these periods afterwards using replaceFirst(), substring(), or something similar.
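A minimal sketch of the replaceFirst() approach, assuming the match has already been extracted as a string; trimPeriods and TrimPeriods are hypothetical names:

```java
public class TrimPeriods {
    // Removes a run of periods at the start, then a run of periods at the end.
    static String trimPeriods(String match) {
        return match.replaceFirst("^\\.+", "").replaceFirst("\\.+$", "");
    }

    public static void main(String[] args) {
        System.out.println(trimPeriods(".1.2"));  // 1.2
        System.out.println(trimPeriods("1.2."));  // 1.2
        System.out.println(trimPeriods("1.3.1")); // 1.3.1
    }
}
```

Periods between digits are untouched, because both replacements are anchored to an end of the string.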

In my case, this is the right regex: [0-9.]+(\s*\w)+

Implementation in Java:

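// str holds one line of text, e.g. "1.2 Windows Installation"
Pattern p = Pattern.compile("[0-9.]+(\\s*\\w)+");
Matcher m = p.matcher(str);
boolean found = m.matches(); // true only if the entire line matches the pattern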
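Because matches() only tells you whether the whole line fits the pattern, one way to also get the number out is to wrap [0-9.]+ in a capturing group and read group(1) after a successful match. The group is my addition, and SectionNumber and sectionNumber are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SectionNumber {
    // Same pattern as above, with the leading number wrapped in a capturing group.
    static final Pattern HEADING = Pattern.compile("([0-9.]+)(\\s*\\w)+");

    // Returns the section number if the whole line matches, otherwise null.
    static String sectionNumber(String line) {
        Matcher m = HEADING.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(sectionNumber("1.2 Windows Installation")); // 1.2
        System.out.println(sectionNumber("Introduction"));             // null
    }
}
```

One edge case: a line that consists only of a number, such as 1.2, still matches, but group(1) comes back as 1. because the final digit is needed to satisfy (\s*\w)+.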
