
How to use DCG in Prolog

I'm trying to build something like the following tree from a text file that lists courses and the student IDs of everyone attending each one.

courses(
    [
     ('MATH2221',
      [
       201000001,
       201000002
      ]
     ),

     ('MATH2251',
      [
       201000002,
       201000003
      ]
     ),

     ('COMP2231',
      [
       201000003,
       201000001
      ]
     )
    ]
).

The text file I'm parsing from looks like this:

MATH2221
       201000001
       201000002

MATH2251
       201000002
       201000003

COMP2231
       201000003
       201000001

I read online that DCGs are probably the best way to go about this, since every student line starts with a tab (ASCII code 9) and courses are separated by two newline characters. I'm really lost in Prolog, so I'm only posting the part that currently works, because everything else is a mess. Does anyone have any advice, or can someone at least help me understand what a DCG is?

:- debug.
:- [library(dcg/basics)].

load:-
    open('courses.txt',read,Stream),
         read,
         close(Stream).

read:-
    open('courses.txt',read,In),
    repeat,
    read_line_to_codes(In,X),write(X), nl,
    (X=end_of_file,!,
    nl); fail.

While the idea of what you ask is simple and the translation to DCG seems relatively simple, in practice it takes experience and skill to know how to do it correctly and efficiently.
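For the "what is a DCG" part of the question, here is a minimal sketch that is separate from the full parser further down. A DCG rule (written with -->) describes a list, here a list of character codes, and phrase/2 runs the rule against such a list. The names student_id//1, digits//1, digit//1 and parse_student_id/2 are made up for this illustration, and the sketch assumes an ID is simply a run of decimal digits.

```prolog
% digit(-D)//: consume one code from the list and require it to be a digit.
digit(D) --> [D], { code_type(D, digit) }.

% digits(-Ds)//: one or more digits, trying the longest run first.
digits([D|Ds]) --> digit(D), digits(Ds).
digits([D])    --> digit(D).

% student_id(-Id)//: a run of digits converted to an integer.
% (Hypothetical rule, not part of the parser shown later.)
student_id(Id) --> digits(Codes), { number_codes(Id, Codes) }.

% parse_student_id(+String, -Id): convenience wrapper around phrase/2.
parse_student_id(String, Id) :-
    string_codes(String, Codes),
    once(phrase(student_id(Id), Codes)).
```

Calling parse_student_id("201000001", Id) should bind Id to the integer 201000001, while input containing a non-digit should simply fail, because phrase/2 requires the whole list to be consumed.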

The following works with SWI-Prolog (threaded, 64 bits, version 8.1.21) on Windows 10:

:- [library(dcg/basics)].

courses([Course|Courses]) -->
    course(Course),
    courses(Courses), !.
courses([]) --> [].

course(course(Course,Students)) -->
    string_without("\n", Course_codes),
    { string_codes(Course,Course_codes ) },
    "\n",
    students(Students),
    (
        empty_line
    ;
        []
    ).

students([Student|Students]) -->
    student(Student),
    students(Students).
students([]) --> [].

student(Student) -->
    "\t",
    (
        (
            string_without("\n",Student_codes),
            { string_codes(Student,Student_codes) },
            "\n"
        )
    ;
        remainder(Student_codes),
        { string_codes(Student,Student_codes) }
    ).

empty_line --> "\n".

load_courses :-
    Input = "\c
MATH2221\n\c
    \t201000001\n\c
    \t201000002\n\c
    \n\c
MATH2251\n\c
    \t201000002\n\c
    \t201000003\n\c
    \n\c
COMP2231\n\c
    \t201000003\n\c
    \t201000001\c
",
    string_codes(Input,Codes),
    DCG = courses(Courses),
    phrase(DCG,Codes,Rest),
    assertion( Rest == [] ),
    format('Courses: ~n',[]),
    print_term(Courses,[]).

Example run:

?- load_courses.
Courses: 
[ course("MATH2221",["201000001","201000002"]),
  course("MATH2251",["201000002","201000003"]),
  course("COMP2231",["201000003","201000001"])
]
true.

In your example you are reading the data from a file, but for this example I hard-coded the data into the code so that it can be reproduced anywhere without needing to copy a file. To keep the formatting of the input readable, Input makes use of \c (see: Character Escape Syntax).

When you load the data from a file and are not using library(dcg/basics), use phrase_from_file/2 or phrase_from_file/3. When you load the data from a file and are using library(dcg/basics), use read_file_to_codes/3. Also check out open_string/2, which might be of use.
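As a rough illustration of open_string/2, the sketch below turns a string into an input stream and reads it line by line with read_line_to_codes/2, much like the loop in the question. The predicate names string_lines/2 and read_lines/2 are invented for this example; it only splits lines and does not parse courses.

```prolog
:- use_module(library(readutil)).   % read_line_to_codes/2

% string_lines(+String, -Lines): read String through a stream, one
% string per line. The stream is closed even if reading fails.
string_lines(String, Lines) :-
    setup_call_cleanup(
        open_string(String, In),
        read_lines(In, Lines),
        close(In)).

% read_lines(+In, -Lines): collect lines until end_of_file.
read_lines(In, Lines) :-
    read_line_to_codes(In, Codes),
    (   Codes == end_of_file
    ->  Lines = []
    ;   string_codes(Line, Codes),     % codes -> string
        Lines = [Line|Rest],
        read_lines(In, Rest)
    ).
```

This can be handy for trying out stream-based code without creating a file first.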

You were right to use library(dcg/basics), but be very careful with it: its predicates expect the input to be character codes, not atoms or strings.

One non-terminal that is very commonly used when parsing text with DCGs is string_without//2, but as noted it works with character codes, so string_codes/2 is needed to convert the codes back into a string. Also, since string_codes/2 is a regular predicate and not a DCG non-terminal, it needs to be wrapped in {} to tell the DCG term-rewriting code not to translate it.
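A small sketch of that pattern on its own, assuming the default SWI-Prolog 7+ flags: string_without//2 collects codes up to a delimiter, and the conversion goals sit inside {}. The rule key_value//2 and the "key=value" input format are invented for this illustration and are not part of the course parser.

```prolog
:- use_module(library(dcg/basics)).   % string_without//2, remainder//1

% key_value(-Key, -Value)//: parse "key=value" from a list of codes.
key_value(Key, Value) -->
    string_without("=", KeyCodes),     % codes up to, not including, "="
    "=",
    remainder(ValueCodes),             % everything that is left
    { atom_codes(Key, KeyCodes),       % ordinary goals go inside {}
      string_codes(Value, ValueCodes)
    }.

% demo(-Key, -Value): run the rule on a fixed example input.
demo(Key, Value) :-
    string_codes("course=MATH2221", Codes),
    phrase(key_value(Key, Value), Codes).
```

Without the braces, the DCG translation would treat atom_codes/2 and string_codes/2 as non-terminals and add two extra arguments to each call.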

When creating the example I could have added a \n after the last student plus an extra empty line, which would have made the parser very simple. Instead I chose to follow the more real-world convention of not adding the final \n, which required adding the ; (or) parts: ; [] for the missing last empty line and ; remainder//1 for the missing \n after the last student.
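The "optional trailing newline" idea can be looked at in isolation. The following is only a sketch with invented names (line//1, parse_line/2): the first alternative consumes a newline when there is one, and ; [] lets the rule succeed at the very end of the input when there is none.

```prolog
:- use_module(library(dcg/basics)).   % string_without//2

% line(-Line)//: one line of text whose terminating newline is optional.
line(Line) -->
    string_without("\n", Codes),
    (   "\n"        % usual case: the line ends with a newline
    ;   []          % last line of input that lacks the final newline
    ),
    { string_codes(Line, Codes) }.

% parse_line(+Text, -Line): Text must consist of exactly one line.
parse_line(Text, Line) :-
    string_codes(Text, Codes),
    once(phrase(line(Line), Codes)).
```

Both parse_line("abc\n", L) and parse_line("abc", L) should give L = "abc".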

I don't know how much more you need to know to understand this, and I don't want to write a few chapters going over all the exact details, so just ask questions if you have them. I do expect you to work with the code, though, and to explain what you are not understanding by showing examples of what you tried, rather than just asking because you can.


I'm really struggling with simply just the I/O

Here is a modified version of the code which uses read_file_to_codes/3.
Note that read_file_to_codes/3 is one of the few predicates that takes a file path/name directly and does not require the use of open/3.

File : SO_question_163_courses.txt

MATH2221
       201000001
       201000002

MATH2251
       201000002
       201000003

COMP2231
       201000003
       201000001

:- [library(dcg/basics)].

courses([Course|Courses]) -->
    course(Course),
    courses(Courses), !.
courses([]) --> [].

course(course(Course,Students)) -->
    string_without("\n", Course_codes),
    { string_codes(Course,Course_codes ) },
    "\n",
    students(Students),
    (
        empty_line
    ;
        []
    ).

students([Student|Students]) -->
    student(Student),
    students(Students).
students([]) --> [].

student(Student) -->
    spaces_or_tabs_plus,
    (
        (
            string_without("\n",Student_codes),
            { string_codes(Student,Student_codes) },
            "\n"
        )
    ;
        remainder(Student_codes),
        { string_codes(Student,Student_codes) }
    ).

spaces_or_tabs_plus -->
    space_or_tab,
    spaces_or_tabs_star.

spaces_or_tabs_star -->
    space_or_tab,
    spaces_or_tabs_star.
spaces_or_tabs_star --> [].

space_or_tab -->
    (
        "\s"
    |
        "\t"
    ).

empty_line --> "\n".

example_01 :-
    Input = "\c
MATH2221\n\c
    \t201000001\n\c
    \t201000002\n\c
    \n\c
MATH2251\n\c
    \t201000002\n\c
    \t201000003\n\c
    \n\c
COMP2231\n\c
    \t201000003\n\c
    \t201000001\c
",
    string_codes(Input,Codes),
    parse_courses(Codes,Courses),
    display_courses(Courses).

example_02 :-
    File_name = "C:\\Users\\Groot\\Documents\\Projects\\Prolog\\SO_question_163_courses.txt",
    read_file_to_codes(File_name,Codes,[]),
    parse_courses(Codes,Courses),
    display_courses(Courses).

parse_courses(Codes,Courses) :-
    DCG = courses(Courses),
    phrase(DCG,Codes,Rest),
    assertion( Rest == [] ).

display_courses(Courses) :-
    format('Courses: ~n',[]),
    print_term(Courses,[]).

And some example runs:

?- example_01.
Courses: 
[ course("MATH2221",["201000001","201000002"]),
  course("MATH2251",["201000002","201000003"]),
  course("COMP2231",["201000003","201000001"])
]
true.

?- example_02.
Courses: 
[ course("MATH2221",["201000001","201000002"]),
  course("MATH2251",["201000002","201000003"]),
  course("COMP2231",["201000003","201000001"])
]
true.


Note with SWI-Prolog: The string type and its double quoted syntax

In SWI-Prolog version 7 or higher, the meaning of double quotes and back quotes changed, so Prolog DCG examples found on StackOverflow, in blogs, papers, etc. will sometimes work as presented and sometimes fail. To a beginner there will seem to be no reason for this, which is very frustrating.

The way to solve this is to be aware of the values of two Prolog flags:

double_quotes and back_quotes

double_quotes will typically be one of codes, chars, atom, string
back_quotes will typically be one of codes, chars, string

You will have to determine what to set them to for the code you are using by either gaining experience or just trial and error.
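To cut down on the trial and error, it can help to inspect the flags and to check what a quoted term actually became. The helpers below, show_quotes/0 and quote_kind/2, are hypothetical names written for this note; they are not library predicates.

```prolog
% show_quotes: print the current values of both flags.
show_quotes :-
    current_prolog_flag(double_quotes, DQ),
    current_prolog_flag(back_quotes, BQ),
    format("double_quotes = ~w, back_quotes = ~w~n", [DQ, BQ]).

% quote_kind(+Term, -Kind): classify what a piece of quoted text was
% read as. An empty list is reported as codes.
quote_kind(Term, Kind) :-
    (   string(Term)                           -> Kind = string
    ;   atom(Term)                             -> Kind = atom
    ;   is_list(Term), maplist(integer, Term)  -> Kind = codes
    ;   is_list(Term), maplist(atom, Term)     -> Kind = chars
    ;   Kind = other
    ).
```

For example, querying quote_kind("abc", K) in a module shows how that module currently reads double-quoted text.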

Also, when you create test cases in SWI-Prolog using

:- begin_tests(some_dcg).

:- end_tests(some_dcg).

This creates a module, and since these flags are scoped to a module, each module can have a different setting. So you also have to check or set the flags in the test-case module.

Flags take effect from the point where they are set to the end of the module. If you call set_prolog_flag/2 after the code you expect it to affect, it will not work; the flag has to be set before the code it needs to affect. So unless you have a particular need, put the set_prolog_flag/2 directives at the top of the module.
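A tiny sketch of the ordering effect, assuming the clauses are loaded from a file in the order shown. The predicates as_codes/1 and as_string/1 are made up for the demonstration; each fact is read under whatever flag value is in force at that point in the file.

```prolog
:- set_prolog_flag(double_quotes, codes).

% Read while double_quotes = codes: the argument is the list [104,105].
as_codes("hi").

:- set_prolog_flag(double_quotes, string).

% Read while double_quotes = string: the argument is a string object.
as_string("hi").
```

Swapping the order of a directive and the fact below it changes what that fact contains, which is why the directives belong before the code they are meant to affect.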

Now, to make it even more confusing, sometimes the setting needed in the DCG section is not the same as in the test cases, so be aware of this as well.

Following is an example of a DCG with test cases that sets both flags in each module and works.

:- module(course,
      [ courses//1,
        parse_courses/2,
        display_courses/1,
        test_course/0
      ]).

test_course :-
    run_tests([course]).

:- [library(dcg/basics)].

:- set_prolog_flag(double_quotes, string).
:- set_prolog_flag(back_quotes, codes).

courses([Course|Courses]) -->
    course(Course),
    courses(Courses), !.
courses([]) --> [].

course(course(Course,Students)) -->
    string_without("\n", Course_codes),
    { string_codes(Course,Course_codes ) },
    "\n",
    students(Students),
    (
        empty_line
    ;
        []
    ).

students([Student|Students]) -->
    student(Student),
    students(Students).
students([]) --> [].

student(Student) -->
    spaces_or_tabs_plus,
    (
        (
            string_without("\n",Student_codes),
            { string_codes(Student,Student_codes) },
            "\n"
        )
    ;
        remainder(Student_codes),
        { string_codes(Student,Student_codes) }
    ).

spaces_or_tabs_plus -->
    space_or_tab,
    spaces_or_tabs_star.

spaces_or_tabs_star -->
    space_or_tab,
    spaces_or_tabs_star.
spaces_or_tabs_star --> [].

space_or_tab -->
    (
        "\s"
    |
        "\t"
    ).

empty_line --> "\n".

parse_courses(Codes,Courses) :-
    DCG = courses(Courses),
    phrase(DCG,Codes,Rest),
    assertion( Rest == [] ).

display_courses(Courses) :-
    format('Courses: ~n',[]),
    print_term(Courses,[]).

:- begin_tests(course).

:- set_prolog_flag(double_quotes, string).
:- set_prolog_flag(back_quotes, codes).

test(001) :-
    Input = "\c
        MATH2221\n\c
            \t201000001\n\c
            \t201000002\n\c
            \n\c
        MATH2251\n\c
            \t201000002\n\c
            \t201000003\n\c
            \n\c
        COMP2231\n\c
            \t201000003\n\c
            \t201000001\c
        ",
    string_codes(Input,Codes),
    parse_courses(Codes,Courses),

    assertion( Courses ==
        [
            course("MATH2221",["201000001","201000002"]),
            course("MATH2251",["201000002","201000003"]),
            course("COMP2231",["201000003","201000001"])
        ]
    ).

test(002) :-
    File_name = "C:\\Users\\Groot\\Documents\\Projects\\Prolog\\SO_question_163_courses.txt",
    read_file_to_codes(File_name,Codes,[]),
    parse_courses(Codes,Courses),

    assertion( Courses ==
        [
            course("MATH2221",["201000001","201000002"]),
            course("MATH2251",["201000002","201000003"]),
            course("COMP2231",["201000003","201000001"])
        ]
    ).

:- end_tests(course).

Running the test cases:

?- run_tests.
% PL-Unit: course .. done
% All 2 tests passed
true.

or if you have multiple tests in multiple files and only need to test course

?- test_course.
% PL-Unit: course .. done
% All 2 tests passed
true.

Another thing that can be confusing when debugging with gtrace/0 is that a code list and a string are both displayed as a string with double quotes, e.g. "this is a string". The way to tell them apart is:

  1. The Bindings section shows the list of bound variables. Locate a variable and right-click on it.
  2. A popup dialog appears; select Details.
  3. This opens a window displaying the bound value. There are options at the top.
  4. Uncheck Portray.

Example code used for the following examples:

dcg_test :-
    String = "string",
    Codes = [65,66,67],
    Atom = 'abc',
    dcg_test(String,Codes,Atom).

dcg_test(String,Codes,Atom) :-
    true.

Bindings

[image]

String example

[image]

Codes example

[image]

If you are wondering why no one tells you these things about DCGs, I just did. Try learning DCGs without knowing this; it took me months to realize all of it.


Notes:

I tried to do this using phrase_from_file/3 with dcg/basics, but dcg/basics expects a closed list while phrase_from_file/3 creates a lazy list. As I massaged the code, it was turning into a rewrite of the predicates in dcg/basics and into dealing with end-of-stream issues, which are some of the biggest problems when learning DCGs.
