简体   繁体   中英

parsing with CHR (Constraint handling rules)

Trying to do sort of parsing with CHR (Constraint handling rules) I come up with this (still learning)

w(the),w(X) <=> w([the,X]).
w(L),w(on),w(R) <=> w(on(L,R)).
w(L),w(is),w(R) <=> w(is(L,R)).

here is what i got:

?- w(the),w(box),w(is),w(on),w(the),w(table).
w(table),
w([the, on(is, [the, box])]).

should be instead:

is([the,box],on([the,table]))

to make it work I have to figure how to do sort of late-binding on "on" and later "is", so the the the-X has been already "collected"

Second question is how to write CHR rules so that they parse a list, instead of making w(..) goals.

PS> also any info on mixing DCG with CHR would be welcome. Cant find anything !!

I don't think this is necessarily a good idea. I especially don't think it will be easier than using DCGs. You still need to write some sort of proper grammar that knows more about the input's structure than just "combine certain sequences of words". That said, there is some work on parsing with CHR, see CHR Grammars for example.

Your first problem is that CHR constraints are unordered. When you try to match w(L), w(on), w(R) , you have no guarantee that L is actually to the left of R in the input. This information is not recorded for you.

So what you can do is to record it yourself. You can replace each w(X) with a w(X, Start, End) constraint where Start and End are some sort of input position markers. For example, simply numbering tokens from left to right:

:- use_module(library(chr)).

:- chr_constraint w/3.

parse(Tokens) :-
    parse(Tokens, 0).

parse([], _Position).
parse([Token | Tokens], Position) :-
    NextPosition is Position + 1,
    w(Token, Position, NextPosition),
    parse(Tokens, NextPosition).

You can use this as follows:

?- parse([the, box, is, on, the, table]).
w(table, 5, 6),
w(the, 4, 5),
w(on, 3, 4),
w(is, 2, 3),
w(box, 1, 2),
w(the, 0, 1).

Given this format, you can enhance your matching rules to only combine adjacent inputs (where each component's end marker is the same as the "next" component's start marker):

w(the, S, M), w(X, M, E) <=> w([the,X], S, E).
w(L, S, M1), w(on, M1, M2), w(R, M2, E) <=> w(on(L,R), S, E).
w(L, S, M1), w(is, M1, M2), w(R, M2, E) <=> w(is_(L,R), S, E).

(I use is_ instead of is because is is a built-in operator, and is(X, Y) facts would be printed as X is Y , which is confusing in what follows.)

This allows you to make some progress:

?- parse([the, box, is, on, the, table]).
w([the, table], 4, 6),
w(is_([the, box], on), 0, 4).

We see that the and table at positions 4 to 5 and 5 to 6 were merged together into a phrase the, table at position 4 to 6. Also the words the , box , is , on , were merged together into one object covering positions 0 to 4. But this is not correct: "the box is on" is not a valid phrase. The phrase to the right of "is" must be a location or other situation. You need more information about the kinds of sub-phrases in a phrase. You need proper grammar information, the kind that you would encode in a DCG by defining rules with different names.

Here is an expanded version:

:- chr_constraint noun_phrase/3, situation/3, sentence/3.

w(the, S, M), w(X, M, E) <=>
    noun_phrase([the,X], S, E).
w(on, S, M), noun_phrase(R, M, E) <=>
    situation(on(R), S, E).
noun_phrase(L, S, M1), w(is, M1, M2), situation(R, M2, E) <=>
    sentence(is_(L,R), S, E).

And this works:

?- parse([the, box, is, on, the, table]).
sentence(is_([the, box], on([the, table])), 0, 6).

But at this point you are writing a DCG in ugly syntax and with lots of extra information you must track by hand. Why not write the DCG instead?

noun_phrase([the, X]) -->
    [the],
    [X].

situation(on(Place)) -->
    [on],
    noun_phrase(Place).

sentence(is_(Thing, Where)) -->
    noun_phrase(Thing),
    [is],
    situation(Where).

This works just as nicely:

?- phrase(sentence(Sentence), [the, box, is, on, the, table]).
Sentence = is_([the, box], on([the, table])).

If you want to get this information into a CHR constraint, you can do that by calling that constraint. For example, with the DCG above:

:- chr_constraint sentence/1.

parse(Tokens) :-
    phrase(sentence(Sentence), Tokens),
    sentence(Sentence).

This gets the same CHR result as above, but without the position markers which are no longer needed:

?- parse([the, box, is, on, the, table]).
sentence(is_([the, box], on([the, table]))).

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM