简体   繁体   中英

Reversable CSV parsing

Prolog newbie here. In SWI Prolog, I'm trying to figure out how to parse a simple line of CSV reversably, but I'm stuck. Here's what I've got:

csvstring1(S, L) :-
  split_string(S, ',', ',', T),
  maplist(atom_number, T, L).
   
csvstring2(S, L) :-
  atomic_list_concat(T, ',', S),
  maplist(atom_number, T, L).

% This one is the same except that maplist comes first. 
csvstring3(S, L) :-
  maplist(atom_number, T, L),
  atomic_list_concat(T, ',', S).

Now csvstring1 and csvstring2 work in a "forward" manner:

?- csvstring1('1,2,3,4', L).
L = [1, 2, 3, 4].

?- csvstring2('1,2,3,4', L).
L = [1, 2, 3, 4].

But not csvstring3:

?- csvstring3('1,2,3,4', L).
ERROR: Arguments are not sufficiently instantiated

Moreover csvstring3 works in reverse, but not the other two predicates:

?- csvstring3(L, [1,2,3,4]).
L = '1,2,3,4'.

?- csvstring1(L, [1,2,3,4]).
ERROR: Arguments are not sufficiently instantiated

?- csvstring2(L, [1,2,3,4]).
ERROR: Arguments are not sufficiently instantiated

How can I combine these into a single predicate?

I don't know of a particularly newbie friendly way to do it which doesn't compromise somewhere. This is the easiest:

csvString_list(String, List) :-
    ground(String),
    atomic_list_concat(Temp, ',', String),
    maplist(atom_number, Temp, List).

csvString_list(String, List) :-
    ground(List),
    maplist(atom_number, Temp, List),
    atomic_list_concat(Temp, ',', String).

but it makes and leaves spurious choicepoints, which is mildly annoying.

This cuts the choicepoints which is nice when using it, but poor practise to get into without being aware of what that means:

csvString_list(String, List) :-
    ground(String),
    atomic_list_concat(Temp, ',', String),
    maplist(atom_number, Temp, List),
    !.

csvString_list(String, List) :-
    ground(List),
    maplist(atom_number, Temp, List),
    atomic_list_concat(Temp, ',', String).

This uses if/else which is less code:

csvString_list(String, List) :-
  ground(String) ->
      (atomic_list_concat(Temp, ',', String), maplist(atom_number, Temp, List))
    ; (maplist(atom_number, Temp, List),      atomic_list_concat(Temp, ',', String)).

but is logically bad and you should reify the branching with if_ which isn't builtin to SWI Prolog and is less simple to use.

Or you could write a grammar with a DCG, which is not newbie territory:


:- set_prolog_flag(double_quotes, chars).
:- use_module(library(dcg/basics)).

csvTail([N|Ns]) --> [','], number(N), csvTail(Ns). 
csvTail([])     --> [].

csv([N|Ns]) --> number(N), csvTail(Ns).

eg

?- phrase(csv(Ns), "11,22,33,44,55").
Ns = [11, 22, 33, 44, 55]


?- phrase(csv([11, 22, 33, 44, 55]), String)
String = [49, 49, ',', 50, 50, ',', 51, 51, ',', 52, 52, ',', 53, 53]

but now you're back to it leaving spurious choicepoints while parsing and you have to deal with the historic split of strings/atoms/character codes in SWI Prolog; that list will unify with "11,22,33,44,55" because of the double_quotes flag but it doesn't look like it will.

split_string is not reversible. Can use DCG - here is a simple multi-line DCG parser for CSV:

% Nicer formatting
% https://www.swi-prolog.org/pldoc/man?section=flags
:- set_prolog_flag(answer_write_options, [quoted(true), portray(true), spacing(next_argument), max_depth(100), attributes(portray)]).

% Show lists of codes as text (if 3 chars or longer)
:- portray_text(true).

csv_lines([]) --> [].
% Newline after every line
csv_lines([H|T]) --> csv_fields(H), [10], csv_lines(T).

csv_fields([H|T]) --> csv_field(H), csv_field_end(T).

csv_field_end([]) --> [].
% Comma between fields
csv_field_end(T) --> [44], csv_fields(T).

csv_field([]) --> [].
csv_field([H|T]) -->
    [H],
    % Fields cannot contain comma, newline or carriage return
    { maplist(dif(H), [44, 10, 13]) },
    csv_field(T).

To demonstrate reversibility:

% Note: z is char 122
?- phrase(csv_lines([[`def`, `cool`], [`abc`, [122]]]), Lines).
Lines = `def,cool\nabc,z\n` ;
false.

?- phrase(csv_lines(Fields), `def,cool\nabc,z\n`).
Fields = [[`def`, `cool`], [`abc`, [122]]] ;
false.

To parse the field contents and maintain reversibility, can use eg atom_codes .

Others have given some advice and a lot of code. With SWI-Prolog, to parse comma-separated integers, you would use library(dcg/basics) and library(dcg/high_order) to do that trivially:

?- use_module(library(dcg/basics)),
   use_module(library(dcg/high_order)),
   portray_text(true).
true.

?- phrase(sequence(integer, ",", Ns), `1,2,3,4`).
Ns = [1, 2, 3, 4].

?- phrase(sequence(integer, ",", [-7,6,42]), S).
S = `-7,6,42`.

Of course, if you are trying to parse real CSV files, you should be using a CSV parser. Here is a minimal example of reading a CSV file and writing its output as a TSV (tab-separated) file. If this is your input in a file called example.csv :

$ cat example.csv
id,name,salary,department
1,john,2000,sales
2,Andrew,5000,finance
3,Mark,8000,hr
4,Rey,5000,marketing
5,Tan,4000,IT

You can read it from the file and write it with tabs as separators like this:

?- csv_read_file('example.csv', Data),
   csv_write_file('example.tsv', Data).
Data = [row(id, name, salary, department),
        row(1, john, 2000, sales),
        row(2, 'Andrew', 5000, finance),
        row(3, 'Mark', 8000, hr),
        row(4, 'Rey', 5000, marketing),
        row(5, 'Tan', 4000, 'IT')].

The library guesses the field separator from the filename extension. Here it correctly guessed that 'csv' means the comma "," and 'tsv' means the tab. We can make the tab explicitly visible with cat -t .

$ cat example.tsv 
id  name    salary  department
1   john    2000    sales
2   Andrew  5000    finance
3   Mark    8000    hr
4   Rey 5000    marketing
5   Tan 4000    IT
$ cat -t example.tsv 
id^Iname^Isalary^Idepartment^M
1^Ijohn^I2000^Isales^M
2^IAndrew^I5000^Ifinance^M
3^IMark^I8000^Ihr^M
4^IRey^I5000^Imarketing^M
5^ITan^I4000^IIT^M

How can I combine these into a single predicate?

csvstring(S, L) :-
  (  ground(S)
  -> atomic_list_concat(T, ',', S),
     maplist(atom_number, T, L)
  ;  maplist(atom_number, T, L),
     atomic_list_concat(T, ',', S)
  ).

... micro test...

?- csvstring('1,2,3,4', L).
L = [1, 2, 3, 4].

?- csvstring(L, [1,2,3,4]).
L = '1,2,3,4'.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM