To create two CSV files:
echo -e "123\n456" > t0.txt
echo -e '"foo","bar"\n"foo\"bar\"","baz"' > t1.txt
Now I want to append the columns in t1.txt to t0.txt, so that the result becomes this:
123,"foo","bar"
456,"foo\"bar\"","baz"
First try, using csvtool:
csvtool paste t0.txt t1.txt
Fatal error: exception Csv.Failure(2, 1, "Bad '"' in quoted field")
So csvtool doesn't seem to handle the escaped quotation marks in "foo\"bar\"".
My real-world use case has two CSV files with more than 150,000,000 rows and 11 columns, so I need a tool that can do the task without holding all the data in RAM at once.
Can csvtool be used with escaped quotation marks, or is there another tool that could solve this?
The final target for the CSV file is a MariaDB database, so an efficient import into MariaDB using t0.txt and t1.txt directly would be even better, but as far as I know LOAD DATA INFILE only works on a single CSV file.
I definitely prefer a ready-made tool, but if there is none, then some C, Perl or Python snippets would be appreciated too.
Here's a quick Perl script that reads your broken CSV files, merges them, and outputs properly escaped CSV, all in one pass:
#!/usr/bin/env perl
use warnings;
use strict;
use autodie;
# Install through your OS package manager or CPAN client.
# libtext-csv-xs-perl on Debian/Ubuntu and family.
use Text::CSV_XS;
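open my $file0, "<", $ARGV[0];
open my $file1, "<", $ARGV[1];
# Reader: treats backslash as the escape character inside quoted fields.
my $csv = Text::CSV_XS->new({ binary => 1, escape_char => "\\",
                              auto_diag => 2, strict => 0 });
# Writer: default settings, which escape quotes by doubling them.
my $out = Text::CSV_XS->new({ binary => 1 });
# Read one row from each file per iteration; stops when either file ends.
while ((my $row0 = $csv->getline($file0)) &&
       (my $row1 = $csv->getline($file1))) {
    push @$row0, @$row1;
    $out->say(\*STDOUT, $row0);
}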
Example:
$ perl mergecsv.pl t0.txt t1.txt
123,foo,bar
456,"foo""bar""",baz
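If Python is easier to deploy than Perl, roughly the same one-pass merge can be sketched with the standard csv module. This is only a sketch under a few assumptions: merge_csv is a hypothetical helper name, both files are assumed to have the same number of rows (zip stops silently at the shorter one), and the backslash is assumed to be the only quote-escaping mechanism in the input.

```python
import csv
import sys


def merge_csv(left, right, out):
    """Append the columns of `right` to the rows of `left`, one row at a time."""
    # The input escapes quotes with a backslash (\"), not by doubling them.
    read_opts = {"escapechar": "\\", "doublequote": False}
    left_rows = csv.reader(left, **read_opts)
    right_rows = csv.reader(right, **read_opts)
    # The default writer doubles embedded quotes and quotes only when needed.
    writer = csv.writer(out, lineterminator="\n")
    # zip() pulls one row from each reader per step, so only the current
    # pair of rows is held in memory; it stops at the shorter file.
    for row_left, row_right in zip(left_rows, right_rows):
        writer.writerow(row_left + row_right)


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], newline="") as f0, \
         open(sys.argv[2], newline="") as f1:
        merge_csv(f0, f1, sys.stdout)
```

Saved under a name such as mergecsv.py (the file name is an assumption), it would be run as python3 mergecsv.py t0.txt t1.txt. With the sample files from the question it should print the same two lines as the Perl version.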
CSV files generally escape double quotes by repetition ("" rather than \"), so your files could be considered invalid.
You could use a find-and-replace tool, such as sed on Unix, to convert the escaped quotes to this more common format.
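As a rough illustration of that conversion, here is a minimal Python sketch that rewrites a file line by line, so memory use does not grow with file size. It assumes the backslash is used purely as an escape character: \" becomes "" and any other escaped character (for example \\) is kept without its backslash. The helper name requote is hypothetical, and whether your data follows these rules is something you would need to check.

```python
import re
import sys

# Matches a backslash followed by any one character (the escaped character).
_ESCAPE = re.compile(r"\\(.)")


def requote(line):
    """Turn backslash escapes into the doubled-quote CSV convention."""
    # \" becomes ""; any other escaped character loses its backslash.
    return _ESCAPE.sub(
        lambda m: '""' if m.group(1) == '"' else m.group(1), line
    )


if __name__ == "__main__" and len(sys.argv) == 2:
    # Stream the file named on the command line to standard output.
    with open(sys.argv[1], newline="") as src:
        for line in src:
            sys.stdout.write(requote(line))
```

For the second sample row, requote turns "foo\"bar\"","baz" into "foo""bar""","baz", which is the form that tools expecting doubled quotes should be able to read.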
If you're looking for some other command line tool to work with CSV files, I've authored one that's available at https://github.com/pjshumphreys/querycsv