How to pad CSV file missing columns

I have some CSV files produced by a piece of software, and I want to import them into PostgreSQL (with COPY ... FROM ... CSV). The problem is that the last column is missing on some lines, like this (letters for headers, numbers for values, _ for the TAB delimiter):

a_b_c_d
1_2_3_4
5_6_7       <- last column missing
8_9_0_1
2_6_7       <- last column missing

Running COPY in_my_table FROM file.csv fails with: ERROR: missing data for column "d"

Sample of a file that imports correctly:

a_b_c_d
1_2_3_4
5_6_7_       <- null column but not missing
8_9_0_1
2_6_7_       <- null column but not missing

My question: is there a bash / Linux shell command that adds the missing TAB delimiters, so that I get a correct, complete, padded CSV file with all columns?

Thanks for your help.

OK, in fact I found this:

awk -F'\t' -v OFS='\t' 'NF=50' input.csv > output.csv 

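where 50 is the number of columns, i.e. the number of TABs in a complete line + 1. Assigning to NF makes awk rebuild each line with OFS, so short lines get empty trailing fields; lines with more than 50 fields would be truncated.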
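If the column count is not known in advance, it can be taken from the header line instead of being hardcoded. The following is only a sketch, assuming the first line is a complete header and the file is TAB-delimited; pad_columns is a hypothetical helper name, not part of any tool.

```shell
# pad_columns: hypothetical helper, reads TAB-delimited text on stdin
# and pads short lines with empty fields up to the header width.
pad_columns() {
  awk -F'\t' -v OFS='\t' '
    NR == 1 { n = NF }                          # column count comes from the header
    { for (i = NF + 1; i <= n; i++) $i = "" }   # append empty fields up to n
    { print }
  '
}

# Small demonstration on inline data (third line lacks its last column)
printf 'a\tb\tc\td\n1\t2\t3\t4\n5\t6\t7\n' | pad_columns
```

On a real file this would be used as pad_columns < input.csv > output.csv. Unlike assigning NF directly, this loop never truncates lines that are longer than the header.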

I don't know much about Linux, but this can easily be done in PostgreSQL with a simple command:

copy tableName from '/filepath/name.csv' delimiter '_' csv WITH NULL AS 'null';

You can use a combination of sed and regular expressions:

sed -r 's/^[0-9](_[0-9]){2}$/\0_/g' file.csv

You only need to replace _ with your delimiter (\t). Note that the pattern as written only matches lines made of three single-digit values.
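For a real TAB-delimited file with values of any length, the same idea can be sketched as below. This assumes a sed that supports -E (GNU and BSD sed both do) and that exactly one column is missing from a four-column file; pad_three_to_four is a hypothetical helper name.

```shell
# pad_three_to_four: hypothetical helper, reads TAB-delimited text on stdin
# and appends one TAB to every line that has exactly three fields.
pad_three_to_four() {
  tab=$(printf '\t')   # a literal TAB character, portable across shells
  # Match: one field, then exactly two more TAB-prefixed fields, then end of line
  sed -E "s/^[^${tab}]*(${tab}[^${tab}]*){2}\$/&${tab}/"
}

# Small demonstration on inline data (last line lacks its fourth column)
printf 'a\tb\tc\td\n10\t20\t30\t40\n50\t60\t70\n' | pad_three_to_four
```

Lines that already have four fields do not match the pattern and pass through unchanged.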

Awk is good for this.

awk -F"\t" '{     # Tell awk we are working with tabs
if (NF == 3)      # If the fourth (last) field is missing
    print $0"\t"  # print the whole line with a trailing tab
else
    print $0      # Otherwise just print the line
}' your.csv > your.fixed.csv
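Before and after padding, it can be useful to check which lines do not have the same number of fields as the header. A minimal sketch, assuming a TAB-delimited file whose first line is a complete header; report_short_lines is a hypothetical helper name.

```shell
# report_short_lines: hypothetical helper, reads TAB-delimited text on stdin
# and reports every line whose field count differs from the header.
report_short_lines() {
  awk -F'\t' '
    NR == 1 { n = NF; next }   # remember the header width, skip the header itself
    NF != n { printf "line %d: %d fields, expected %d\n", NR, NF, n }
  '
}

# Small demonstration on inline data (line 3 lacks its last column)
printf 'a\tb\tc\td\n1\t2\t3\t4\n5\t6\t7\n' | report_short_lines
```

An empty report after the fix suggests that every line now has the column count COPY expects.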

Perl has a CSV module, Text::CSV, which can be handy for fixing even more complicated CSV errors. On my Ubuntu test system it is part of the package libtext-csv-perl.

This fixes your problem:

#! /usr/bin/perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ binary => 1, eol => $/, sep_char => '_' });

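open my $broken, '<', 'broken.csv' or die "broken.csv: $!";
open my $fixed, '>', 'fixed.csv' or die "fixed.csv: $!";

while (my $row = $csv->getline ($broken)) {
  $#{$row} = 3;                 # force exactly 4 fields (last index 3)
  $csv->print ($fixed, $row);
}

close $fixed or die "fixed.csv: $!";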

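Change sep_char to "\t" if you have a tab-delimited file, and keep in mind that Perl treats "\t" and '\t' differently: only the double-quoted form is a TAB character, while the single-quoted form is a literal backslash followed by t.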
