
Java using regex to match a pattern for quizzes

I am trying to do one of the 100 mega list projects. One of them is a quiz maker that parses a file of quiz questions, picks some of them at random, creates a quiz, and also grades quizzes.

I am working on the part that simply loads the quiz questions and parses them out individually (i.e. one question and its multiple-choice answers as an entity).

The format of the quiz is as follows:

Intro to Computer Science


    1. Which of the following accesses a variable in structure b?
    A. b->var
    B. b.var
    C. b-var
    D. b>var

    2. Which of the following accesses a variable in a pointer to a structure, *b?
    A. b->var
    B. b.var
    C. b-var
    D. b>var

    3. Which of the following is a properly defined struct?
    A. struct {int a;}
    B. struct a_struct {int a;}
    C. struct a_struct int a
    D. struct a_struct {int a;}

    4. Which properly declares a variable of struct foo?
    A. struct foo
    B. foo var
    C. foo
    D. int foo

Of course there are many of these questions, but they are all in the same format. I used BufferedReader to load these questions into a string and am attempting to use regex to parse them, but I am unable to match on any specific part. Below is my code:

    package myPackage;

    import java.io.*;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class QuizMaker {

        public static void main(String args[])
        {
            String file = "myfile/QuizQuestions.txt";
            StringBuilder quizLine = new StringBuilder();
            String line = null;

            try {
                FileReader reader = new FileReader(file);
                BufferedReader buffreader = new BufferedReader(reader);

                while ((line = buffreader.readLine()) != null)
                {
                    quizLine.append(line);
                    quizLine.append("\n");
                }

                buffreader.close();

            } catch (FileNotFoundException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            } catch (IOException e1) {
                e1.printStackTrace();
            }

            System.out.println(quizLine.toString());

            Pattern pattern = Pattern.compile("^[0-9]{1}.+\\?");
            Matcher matcher = pattern.matcher(quizLine.toString());

            boolean didmatch = matcher.lookingAt();
            System.out.println(didmatch);

            String mystring = quizLine.toString();

            int start = matcher.start();
            int end = matcher.end();

            System.out.println(start + " " + end);

            char a = mystring.charAt(0);
            char b = mystring.charAt(6);

            System.out.println(a + " " + b);
        }
    }

At this point, I am simply trying to match the questions themselves and will leave the multiple-choice answers until I solve this part. Is my regex pattern wrong? I even tried to match a single digit on its own (via "^[0-9]{1}") and even that failed.

Am I doing something completely wrong? Another problem is that this returns only one match, not all of them. How exactly would you iterate through the string to find all matches? Any help would be appreciated.

I personally wouldn't use a regex. I would just use a StringTokenizer on \n and check whether the first character of each line is a digit (since no other lines seem to start with a number).
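
A minimal sketch of that tokenizer approach, assuming every question line starts with a digit once its indentation is trimmed and no other line does. The class name QuestionLines and the method questionLines are made up for illustration and are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

final class QuestionLines {

    // Returns the lines whose first non-blank character is a digit.
    static List<String> questionLines(String text) {
        List<String> questions = new ArrayList<>();
        // Tokenize on newlines; StringTokenizer never returns empty tokens,
        // so completely empty lines are skipped automatically.
        StringTokenizer tokens = new StringTokenizer(text, "\n");
        while (tokens.hasMoreTokens()) {
            String line = tokens.nextToken().trim();
            if (!line.isEmpty() && Character.isDigit(line.charAt(0))) {
                questions.add(line);
            }
        }
        return questions;
    }

    public static void main(String[] args) {
        String text = "Intro to Computer Science\n\n"
                + "    1. Which of the following accesses a variable in structure b?\n"
                + "    A. b->var\n"
                + "    B. b.var\n\n"
                + "    2. Which properly declares a variable of struct foo?\n"
                + "    A. struct foo\n";
        for (String question : questionLines(text)) {
            System.out.println(question);
        }
    }
}
```

The answer lines could be collected the same way by checking for a leading letter instead of a digit.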

But to answer your question more specifically: you need to specify the MULTILINE flag on your pattern for ^ and $ to match the start and end of lines.

    Pattern pattern = Pattern.compile("^[0-9]{1}.+\\?", Pattern.MULTILINE);

This should allow your pattern to match lines within the text. Otherwise ^ and $ match only the start and end of the string as a whole.
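
To get every question rather than just the first, the usual idiom is to call Matcher.find() in a loop. The sketch below is one way this could look. It assumes question lines may be indented with spaces or tabs (hence the added [ \t]* prefix, which is not in the pattern above) and allows multi-digit question numbers; QuestionFinder and findQuestions are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuestionFinder {

    // MULTILINE makes ^ match at the start of every line, not only at the
    // start of the input. Group 1 is the question without its indentation.
    private static final Pattern QUESTION =
            Pattern.compile("^[ \\t]*([0-9]+\\..+\\?)", Pattern.MULTILINE);

    static List<String> findQuestions(String text) {
        List<String> questions = new ArrayList<>();
        Matcher matcher = QUESTION.matcher(text);
        // Each call to find() moves on to the next match, if there is one.
        while (matcher.find()) {
            questions.add(matcher.group(1));
        }
        return questions;
    }

    public static void main(String[] args) {
        String text = "Intro to Computer Science\n\n"
                + "    1. Which of the following accesses a variable in structure b?\n"
                + "    A. b->var\n\n"
                + "    2. Which properly declares a variable of struct foo?\n"
                + "    A. struct foo\n";
        for (String question : findQuestions(text)) {
            System.out.println(question);
        }
    }
}
```

Note that start(), end() and group() may only be called after a successful match; calling them after lookingAt() or find() has returned false throws an IllegalStateException.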

Description

This expression will capture each entire question followed by all of its possible answers, provided the string is roughly formatted like your sample text:

    ^\\s*(\\d+\\.\\s+.*?)(?=[\\r\\n]+^\\s*\\d+\\.|\\Z)

Example

Live Example: http://www.rubular.com/r/dcetgPsz5w

Given Sample Text

Intro to Computer Science


    1. Which of the following accesses a variable in structure b?
    A. b->var
    B. b.var
    C. b-var
    D. b>var

    2. Which of the following accesses a variable in a pointer to a structure, *b?
    A. b->var
    B. b.var
    C. b-var
    D. b>var



    3. Which of the following is a properly defined struct?
    A. struct {int a;}
    B. struct a_struct {int a;}
    C. struct a_struct int a
    D. struct a_struct {int a;}

    4. Which properly declares a variable of struct foo?
    A. struct foo
    B. foo var
    C. foo
    D. int foo

Capture Group 1 Matches

[0] => 1. Which of the following accesses a variable in structure b?
A. b->var
B. b.var
C. b-var
D. b>var
[1] => 2. Which of the following accesses a variable in a pointer to a structure, *b?
A. b->var
B. b.var
C. b-var
D. b>var
[2] => 3. Which of the following is a properly defined struct?
A. struct {int a;}
B. struct a_struct {int a;}
C. struct a_struct int a
D. struct a_struct {int a;}
[3] => 4. Which properly declares a variable of struct foo?
A. struct foo
B. foo var
C. foo
D. int foo
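
In Java, the expression above could be applied roughly as follows. This is a sketch under the assumption that both the MULTILINE and DOTALL flags are set, so that ^ matches at line starts and . can cross newlines, which the multi-line captures shown above require. The names QuizBlocks and splitQuestions are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuizBlocks {

    // Group 1 runs from a question number up to (but not including) the line
    // break that precedes the next numbered question, or the end of the input.
    private static final Pattern BLOCK = Pattern.compile(
            "^\\s*(\\d+\\.\\s+.*?)(?=[\\r\\n]+^\\s*\\d+\\.|\\Z)",
            Pattern.MULTILINE | Pattern.DOTALL);

    static List<String> splitQuestions(String text) {
        List<String> blocks = new ArrayList<>();
        Matcher matcher = BLOCK.matcher(text);
        while (matcher.find()) {
            // Each block still contains its answer lines and their indentation.
            blocks.add(matcher.group(1));
        }
        return blocks;
    }

    public static void main(String[] args) {
        String text = "Intro to Computer Science\n\n"
                + "    1. Which of the following accesses a variable in structure b?\n"
                + "    A. b->var\n"
                + "    B. b.var\n\n"
                + "    2. Which properly declares a variable of struct foo?\n"
                + "    A. struct foo\n";
        for (String block : splitQuestions(text)) {
            System.out.println("[" + block + "]");
        }
    }
}
```

Each captured block can then be split on line breaks to separate the question from its answers.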

If you use String.matches(), you need only a fraction of the code you are currently attempting to use.

To test if a line is a question:

    if (line.matches("\\s*\\d\\..*"))

To test if a line is an answer:

    if (line.matches("\\s*[A-Z]\\..*"))
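
Put together, the two tests could drive a small line-by-line parser. The following is only a sketch: QuizLineParser and parse are hypothetical names, the map-of-lists result is one possible representation, and the question test uses \\d+ instead of \\d on the assumption that question numbers may have more than one digit.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class QuizLineParser {

    // Maps each question line to its answer lines, preserving file order.
    static Map<String, List<String>> parse(String text) {
        Map<String, List<String>> quiz = new LinkedHashMap<>();
        List<String> currentAnswers = null;
        for (String line : text.split("\\r?\\n")) {
            if (line.matches("\\s*\\d+\\..*")) {
                // A new question starts a new, initially empty answer list.
                currentAnswers = new ArrayList<>();
                quiz.put(line.trim(), currentAnswers);
            } else if (currentAnswers != null && line.matches("\\s*[A-Z]\\..*")) {
                // Answer lines belong to the most recent question.
                currentAnswers.add(line.trim());
            }
        }
        return quiz;
    }

    public static void main(String[] args) {
        String text = "Intro to Computer Science\n\n"
                + "    1. Which of the following accesses a variable in structure b?\n"
                + "    A. b->var\n"
                + "    B. b.var\n";
        parse(text).forEach((question, answers) ->
                System.out.println(question + " -> " + answers));
    }
}
```
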

  1. In the code, quizLine is like "1. Which of the following accesses a variable in structure b?\nA. b->var\nB. b.var\n...". The pattern "^[0-9]{1}.+\\?" is applied to the whole string rather than to each line, which is not what you want.
  2. The simple way to do that is to split the text, e.g. quizLine.toString().split("\n"), and then match it line by line.
  3. Another way, as @Denomales and @Chase described, is to use multiline matching and get the match groups.
  4. As @Bohemian said, String#matches is a good shortcut to check whether a string matches, but it cannot give you match groups. If you need a Matcher, note that Matcher#lookingAt is a little different from Matcher#matches. Matcher#matches may be better in your case.
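
The difference between the Matcher methods mentioned in the last point can be seen in a small sketch. The class name MatcherModes and the sample strings are made up for illustration: matches() requires the whole input to match, lookingAt() requires a match starting at the beginning of the input, and find() accepts a match anywhere.

```java
import java.util.regex.Pattern;

final class MatcherModes {

    private static final Pattern QUESTION = Pattern.compile("[0-9]+\\..+\\?");

    // True only if the entire input is a question.
    static boolean wholeInput(String s) {
        return QUESTION.matcher(s).matches();
    }

    // True if the input starts with a question; trailing text is allowed.
    static boolean prefix(String s) {
        return QUESTION.matcher(s).lookingAt();
    }

    // True if a question occurs anywhere in the input.
    static boolean anywhere(String s) {
        return QUESTION.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "1. Which is correct?",
            "1. Which is correct? (pick one)",
            "Quiz: 1. Which is correct?"
        };
        for (String s : samples) {
            System.out.println(s + " -> matches=" + wholeInput(s)
                    + ", lookingAt=" + prefix(s) + ", find=" + anywhere(s));
        }
    }
}
```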

Note: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site or the original source.

 