
Tools for appending column(s) to a large CSV file (merging CSV files by column)

To create two CSV files:

echo -e "123\n456" > t0.txt
echo -e '"foo","bar"\n"foo\"bar\"","baz"' > t1.txt

Now I want to append the columns in t1.txt to t0.txt, so that the result becomes this:

123,"foo","bar"
456,"foo\"bar\"","baz"

First attempt, using csvtool:

csvtool paste t0.txt t1.txt 
Fatal error: exception Csv.Failure(2, 1, "Bad '"' in quoted field")

So csvtool doesn't seem to handle the backslash-escaped quotation marks in "foo\"bar\"".

My real-world use case has two CSV files with more than 150,000,000 rows and 11 columns, so I need a tool that can do the task without holding all the data in RAM at once.

Can csvtool be used with escaped quotation marks, or is there another tool that could solve this?

The final destination for the data is a MariaDB database, so an efficient way to import t0.txt and t1.txt into MariaDB directly would be even better, but as far as I know LOAD DATA INFILE only works on a single CSV file.

I definitely prefer a ready-made tool, but if there is none, then some C, Perl or Python snippets would be appreciated too.

Here's a quick Perl script that reads your broken CSV files one row at a time, merges them, and outputs properly escaped CSV, all in one pass:

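#!/usr/bin/env perl
use warnings;
use strict;
use autodie;
# Install through your OS package manager or CPAN client.
# libtext-csv-xs-perl on Debian/Ubuntu and family.
use Text::CSV_XS;

die "usage: $0 FILE0 FILE1\n" unless @ARGV == 2;

open my $file0, "<", $ARGV[0];
open my $file1, "<", $ARGV[1];

# Input parsers: accept backslash-escaped quotes (\") inside quoted fields.
# One parser per input file, so each keeps its own read state.
my %in_opts = (binary => 1, escape_char => "\\", auto_diag => 2, strict => 0);
my $csv0 = Text::CSV_XS->new({ %in_opts });
my $csv1 = Text::CSV_XS->new({ %in_opts });
# Output writer: standard CSV, embedded quotes are doubled ("").
my $out = Text::CSV_XS->new({ binary => 1 });

# Read one row from each file per iteration, so memory use stays flat.
while (1) {
  my $row0 = $csv0->getline($file0);
  my $row1 = $csv1->getline($file1);
  last if !$row0 && !$row1;    # both files exhausted
  die "$ARGV[0] and $ARGV[1] have different row counts\n"
    unless $row0 && $row1;
  push @$row0, @$row1;         # append file1's columns to file0's row
  $out->say(\*STDOUT, $row0);
}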

Example:

$ perl mergecsv.pl t0.txt t1.txt
123,foo,bar
456,"foo""bar""",baz
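Since Python snippets were also welcome, here is a minimal sketch of the same streaming merge using only Python's standard csv module. It assumes that a backslash in the input only ever escapes the following character (hence escapechar="\\" on the readers) and that both files have the same number of rows; merge_csv_columns is a hypothetical name chosen for illustration, not an existing tool.

```python
import csv
import io
import sys
from itertools import zip_longest


def merge_csv_columns(left, right, out):
    """Write each row of `left` followed by the columns of the same row in `right`.

    Rows are read lazily, one pair at a time, so memory use does not grow
    with file size. Input fields may escape quotes with a backslash; the
    output uses the standard doubled-quote form.
    """
    # ASSUMPTION: a backslash in the input only ever escapes the next character.
    reader0 = csv.reader(left, escapechar="\\")
    reader1 = csv.reader(right, escapechar="\\")
    # Default writer settings double embedded quotes and quote only when needed.
    writer = csv.writer(out, lineterminator="\n")
    missing = object()  # sentinel: one input ran out before the other
    for row0, row1 in zip_longest(reader0, reader1, fillvalue=missing):
        if row0 is missing or row1 is missing:
            raise ValueError("input files have different row counts")
        writer.writerow(row0 + row1)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        # newline="" is what the csv module expects for files it reads.
        with open(sys.argv[1], newline="") as f0, open(sys.argv[2], newline="") as f1:
            merge_csv_columns(f0, f1, sys.stdout)
    else:
        # No file arguments: run on the sample data from the question.
        merge_csv_columns(io.StringIO("123\n456\n"),
                          io.StringIO('"foo","bar"\n"foo\\"bar\\"","baz"\n'),
                          sys.stdout)
```

Run as, for example, python mergecsv.py t0.txt t1.txt > merged.csv (the script name is arbitrary). On the sample files it should produce the same two lines as the Perl version; unlike a plain while loop over one file, zip_longest makes a row-count mismatch an error instead of silently dropping the extra rows.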

CSV files generally escape a double quote inside a quoted field by doubling it ("") rather than with a backslash (\"); RFC 4180 specifies the doubled form, so your files could be considered invalid.

You could use a find-and-replace tool, such as sed on Unix, to convert the escaped quotes to this more common format.
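The same find-and-replace can be sketched in Python as a line-by-line stream, so memory use stays constant regardless of file size. Like a simple sed substitution, this is a plain text replacement, not a CSV parser: it assumes a backslash never appears in the data except directly before an escaped quote (a field that legitimately ends in a backslash, such as "C:\", would be mangled). fix_escaped_quotes and the script name are hypothetical.

```python
import sys


def fix_escaped_quotes(line: str) -> str:
    """Rewrite backslash-escaped quotes as doubled quotes (RFC 4180 style).

    ASSUMPTION: a backslash only ever appears immediately before an escaped
    quote; any other backslash use is not handled.
    """
    return line.replace('\\"', '""')


if __name__ == "__main__":
    # Usage (hypothetical script name): python fixquotes.py t1.txt t1_fixed.txt
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
            for line in src:  # one line at a time: constant memory
                dst.write(fix_escaped_quotes(line))
```

After conversion, the second line of t1.txt becomes "foo""bar""","baz", which is the doubled-quote form that standard CSV readers expect.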

If you're looking for some other command line tool to work with CSV files, I've authored one that's available at https://github.com/pjshumphreys/querycsv
