
Using regex to check if start of a string matches a pattern

I'm reading lines in from a .txt file, and I need to check if each line is 'valid'.

A valid line starts with an integer between -2 and 2 inclusive, followed by a single whitespace character and then, optionally, text.

  • "-1 this is valid."
  • "0 so is this"
  • "-2 and this"
  • "2 and this"
  • "-1this isn't."
  • "-23 nor this"

I want to use regex for this, but am having trouble getting it working. I am quite unfamiliar with regex. Here's my code:

public static List<Sentence> readFile(String filename) {
    List<Sentence> sentences = new LinkedList<>();

    Pattern pattern = Pattern.compile("-[0-2] abc");
    Matcher matcher;
    try (BufferedReader br = new BufferedReader(new FileReader(filename))) {

        while (br.ready()) {
            matcher = pattern.matcher(br.readLine());
            if (matcher.matches()) {
                System.out.print("matches ");
            }
        }

    } catch (IOException e) {
        e.printStackTrace();
    }

    return sentences;
}

This isn't working (no surprise). Could someone help me get the correct regular expression?

You might be looking for a regex similar to the following:

^(0|-?[1-2]) .*

The ^ anchors the match at the beginning of the line. The group (0|...) matches either 0 or the expression after the |. In that expression, -? matches zero or one occurrence of -, and [1-2] matches 1 or 2. The literal space after the group matches a single space character, and .* matches zero or more characters other than a newline.
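As a rough sketch of how this pattern could be checked against the example lines from the question, the snippet below wraps it in a small helper. The class name LineValidator and the method isValid are hypothetical, chosen only for illustration. Because Matcher.matches() already requires the whole line to match, the leading ^ is redundant here but harmless.

```java
import java.util.regex.Pattern;

class LineValidator {
    // Pattern taken from the answer above; compiled once and reused.
    private static final Pattern VALID = Pattern.compile("^(0|-?[1-2]) .*");

    // matches() succeeds only if the entire line fits the pattern.
    static boolean isValid(String line) {
        return VALID.matcher(line).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "-1 this is valid.",
            "0 so is this",
            "-2 and this",
            "2 and this",
            "-1this isn't.",
            "-23 nor this"
        };
        for (String s : samples) {
            System.out.println(isValid(s) + "  " + s);
        }
    }
}
```

In the asker's readFile method, the same check could replace the body of the if statement's condition, for example by calling isValid on each line returned by readLine().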

If I understand your question correctly, you need something like this as your pattern (written as a Java string literal, hence the doubled backslashes):

-?[0-2]\\s[\\w\\d\\s]*

Try RegExr; it is a very good web-based tool for working out regex patterns.
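Since only the start of each line matters, one possible alternative is Matcher.lookingAt(), which matches from the beginning of the input without requiring the rest of the line to fit the pattern. The sketch below is an assumption about what the asker wants, not part of the answer: it keeps \\s for the whitespace but uses the alternation 0|-?[12] so that "-0" is rejected. The names PrefixCheck and startsValid are hypothetical.

```java
import java.util.regex.Pattern;

class PrefixCheck {
    // A number from -2 to 2 (excluding "-0") followed by one whitespace character.
    private static final Pattern PREFIX = Pattern.compile("(?:0|-?[12])\\s");

    // lookingAt() anchors at the start of the input but ignores whatever follows the match.
    static boolean startsValid(String line) {
        return PREFIX.matcher(line).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(startsValid("-1 this is valid."));
        System.out.println(startsValid("-1this isn't."));
        System.out.println(startsValid("-23 nor this"));
    }
}
```

With this approach the text after the whitespace can contain any characters, including punctuation, because it is never matched against the pattern.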

If the test strings are already split into lines, you can validate them one at a time:

foreach (string line in lines)
{
    // Declare the Match result; group 1 holds the whole valid line.
    Match match = Regex.Match(line, @"^(-?[1-2]\s.*|0\s.*)", RegexOptions.IgnoreCase);
    if (match.Success)
    {
        MessageBox.Show(match.Groups[1].Value);
    }
}

This validates each test string and captures the valid line in group 1.

As mentioned, it works only if the test strings are already split into lines.

To handle a single string whose lines are separated by "\n", the regular expression should be changed to:

string regExp = @"(-?[1-2]\s.+[\n]{1}|(?<!-)0\s.+[\n]{1})";
MatchCollection matches = Regex.Matches(longstr, regExp, RegexOptions.IgnoreCase);
foreach(Match match in matches)
{
    if (match.Success)
    {
        MessageBox.Show(match.Groups[1].Value);
    }
}

The catch with a full string is that ^ and $ no longer help: without RegexOptions.Multiline they match only at the start and end of the whole string, not at each line.

If ^ is removed from the expression, a false match occurs: the substring "0 is not valid" is captured from "-0 is not valid".

The negative lookbehind (?<!-) is therefore required, so that a 0 immediately preceded by - is not matched.
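Since the question itself is in Java, here is a rough Java counterpart for scanning a whole string at once. It takes a different route from the lookbehind above: with the Pattern.MULTILINE flag, ^ and $ match at every line boundary, so a line such as "-0 is not valid" is rejected by the anchor itself. The class name MultiLineScan and the method validLines are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultiLineScan {
    // MULTILINE makes ^ and $ match at the start and end of each line.
    private static final Pattern VALID_LINE =
            Pattern.compile("^(?:0|-?[12]) .*$", Pattern.MULTILINE);

    // Collects every line of the text that starts with -2..2 and a single space.
    static List<String> validLines(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = VALID_LINE.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "-1 this is valid.\n-0 is not valid\n0 so is this\n-23 nor this\n2 and this";
        for (String line : validLines(text)) {
            System.out.println(line);
        }
    }
}
```

Unlike the C# expression above, the matched text here does not include the trailing newline, because .* stops before the line terminator.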
