Add a column to a csv file using a bash script

Question

I would like to append a column to a csv file using a bash script given a condition. The condition is that the column in file1.csv must have more than one unique value to be added to newfile.csv. These are not the real files. The original file has a lot more columns/rows.

Something like this:

file1.csv

1, ah, th, ab, a
2, ah, jk, ab, b
3, ah, lk, ab, c
4, ah, hh, ab, d

newfile.csv should be:

1, th, a
2, jk, b
3, lk, c
4, hh, d

This is the script I tried. However, it does not append the new columns. The output is just a csv with the last column of file1.csv that had more than one unique value.

#!/bin/bash
cut -d, -f1 file1.csv > newfile.csv
limit=1
for i in $(seq 2 5); do
   value=$(cat file1.csv | cut -d, -f$i | uniq -u | wc -l)
   if [ $value -gt $limit ]; then
        paste -d, newfile.csv <(cut -d, -f$i file1.csv) > newfile.csv
   else echo "Column $i not appended."
   fi
done

I suspect it may have something to do with the fact that newfile.csv appears twice in one line. I tried creating a new file newfile2.csv for each iteration, but that did not work. I am new to Bash.

Answer 1

You may use this two-pass awk solution:

awk 'BEGIN {FS=OFS=", "} FNR==NR {for (i=1; i<=NF; ++i) if (!seen[i,$i]++) ++fq[i]; next} {s=""; for (i=1; i<=NF; ++i) if (fq[i] > 1) s = (s == "" ? "" : s OFS ) $i; print s}' file{,}

1, th, a
2, jk, b
3, lk, c
4, hh, d

Expanded form:

awk 'BEGIN {
   FS = OFS = ", "
}
FNR == NR {
   for (i=1; i<=NF; ++i)
      if (!seen[i,$i]++)
         ++fq[i]
   next
}
{
   s = ""
   for (i=1; i<=NF; ++i)
      if (fq[i] > 1)
         s = (s == "" ? "" : s OFS ) $i
   print s
}' file{,}
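
To see what the first pass decides before anything is filtered, here is a small sketch that reuses the same counting logic but prints the number of distinct values per column. The scratch directory and the `workdir`, `counts` and `nf` names are my own; the data is the sample from the question.

```shell
# Scratch copy of the sample input, so no real file is touched.
workdir=$(mktemp -d)
printf '%s\n' '1, ah, th, ab, a' '2, ah, jk, ab, b' \
              '3, ah, lk, ab, c' '4, ah, hh, ab, d' > "$workdir/file1.csv"

# Same counting as the first pass above, but the counts are printed
# instead of being used as a filter. nf remembers the widest row seen.
counts=$(awk 'BEGIN { FS = ", " }
{
    for (i = 1; i <= NF; ++i)
        if (!seen[i,$i]++)
            ++fq[i]
    if (NF > nf) nf = NF
}
END {
    for (i = 1; i <= nf; ++i)
        print "column " i ": " fq[i]
}' "$workdir/file1.csv")
echo "$counts"
```

For the sample this reports 4, 1, 4, 1, 4, so only columns 1, 3 and 5 pass the `fq[i] > 1` test in the second pass.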

Answer 2

Another similar awk approach that scans the file twice:

$ awk -F', ' 'NR==FNR {for(i=1;i<=NF;i++) if(c[i]<2 && !f[i,$i]++) c[i]++; next}
              FNR==1  {for(i=1;i<=NF;i++) if(c[i]>1) a[++k]=i}
                      {for(i=1;i<=k;i++) printf "%s%s",$(a[i]),(i==k?ORS:FS)}' file{,}

1, th, a
2, jk, b
3, lk, c
4, hh, d

It short-circuits columns that already have more than one unique value, and when printing it only visits the columns that are kept.

The file{,} notation is bash brace expansion for "file file". It passes the input file twice because the algorithm scans it twice.
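
A quick way to check what the brace expansion turns into is to echo it. This sketch calls bash explicitly, since brace expansion is a bash feature and not part of plain POSIX sh; `file1.csv` is the sample name from the question and `expanded` is just a variable I picked.

```shell
# Brace expansion happens before the command runs, so the command
# receives the same file name twice.
expanded=$(bash -c 'echo file1.csv{,}')
echo "$expanded"    # prints: file1.csv file1.csv
```

In a shell without brace expansion, writing the file name out twice has the same effect.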

Answer 3

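Using any awk in any shell on every Unix box, this will work efficiently and use minimal memory. Note that it keeps a column only if none of its values repeat, which produces the expected output for the sample input: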

$ cat tst.awk
BEGIN { FS=OFS=", " }
NR==FNR {
    if ( NR == 1 ) {
        split($0,uniq)
    }
    for (inFldNr in uniq) {
        if ( seen[inFldNr,$inFldNr]++ ) {
            delete seen[inFldNr,$inFldNr]
            delete uniq[inFldNr]
        }
    }
    next
}
FNR==1 {
    for (inFldNr=1; inFldNr<=NF; inFldNr++) {
        if (inFldNr in uniq) {
            out2inFldNr[++numOutFlds] = inFldNr
        }
    }
}
{
    for (outFldNr=1; outFldNr<=numOutFlds; outFldNr++) {
        inFldNr = out2inFldNr[outFldNr]
        printf "%s%s", $inFldNr, (outFldNr<numOutFlds ? OFS : ORS)
    }
}

$ awk -f tst.awk file1.csv file1.csv
1, th, a
2, jk, b
3, lk, c
4, hh, d
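
The script above removes a column from `uniq` as soon as one of its values is seen a second time, so it keeps columns whose values are all distinct. That is a stricter test than "more than one unique value", and the two only disagree when a column mixes repeated and different values. A minimal sketch of the difference, using a made-up column `x, x, y` (the variable names are mine):

```shell
# A hypothetical column with one repeated value and one different value.
col=$(printf '%s\n' x x y)

# Number of different values in the column.
distinct=$(printf '%s\n' "$col" | sort -u | wc -l)

# Number of values that occur more than once.
repeated=$(printf '%s\n' "$col" | sort | uniq -d | wc -l)

echo "distinct values: $distinct, repeated values: $repeated"
```

Here `distinct` is 2, so the column has more than one unique value, while `repeated` is 1, so a "no value repeats" test would drop it. For the sample data in the question both tests select the same columns.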

Answer 4

I solved the problem by writing the paste output to a second file inside the script and then copying it back over newfile.csv:

#!/bin/bash
cut -d, -f1 file1.csv > newfile.csv
limit=1
for i in $(seq 2 5); do
   value=$(cat file1.csv | cut -d, -f$i | uniq -u | wc -l)
   if [ $value -gt $limit ]; then
        cut -d, -f$i file1.csv > column.csv
        paste -d, newfile.csv column.csv > newfile2.csv
        cp newfile2.csv newfile.csv
   else echo "Column $i not appended."
   fi
done
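
The reason the second file helps is that the shell opens and truncates the target of `>` before `paste` starts, so `paste ... newfile.csv > newfile.csv` reads an already empty file. Below is a sketch of the same loop that writes to a temporary file and then moves it into place. It is not the exact script above: it works in a scratch directory, feeds the column to `paste` on standard input, and counts distinct values with `sort -u` instead of `uniq -u`, which is a substitution on my part. The names `workdir`, `src`, `out` and `tmp` are mine.

```shell
# Scratch directory so the sketch does not touch real files.
workdir=$(mktemp -d)
src="$workdir/file1.csv"
out="$workdir/newfile.csv"
printf '%s\n' '1, ah, th, ab, a' '2, ah, jk, ab, b' \
              '3, ah, lk, ab, c' '4, ah, hh, ab, d' > "$src"

cut -d, -f1 "$src" > "$out"
for i in 2 3 4 5; do
    # sort -u keeps one copy of each value, so wc -l counts distinct values.
    count=$(cut -d, -f"$i" "$src" | sort -u | wc -l)
    if [ "$count" -gt 1 ]; then
        # Write to a temporary file first: redirecting straight to $out
        # would empty it before paste had a chance to read it.
        tmp="$workdir/newfile.tmp"
        cut -d, -f"$i" "$src" | paste -d, "$out" - > "$tmp" && mv "$tmp" "$out"
    else
        echo "Column $i not appended."
    fi
done
cat "$out"
```

With the sample input this reports that columns 2 and 4 are not appended and leaves the four expected lines (`1, th, a` through `4, hh, d`) in the output file.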

4 answers

solution1
2 2021-03-06 11:23:04

solution2
2 2021-03-06 15:53:28

solution3
2 2021-03-07 00:14:25

solution4
0 2021-03-09 07:59:13
