How to separate csv columns by awk, with a comma being the field separator?

Question

My regex does not work as the field separator given on the awk command line when processing a CSV file.

My CSV is separated by commas (,), but some fields also contain commas.

The data.csv looks like this:

t1,t2,t3,t4
field without comma,f02,f03,f04
field, with comma,f12,f13,f14
field without comma,f22,f23,f24
field without comma,f22,f23,f34

Looking at field, with comma,f12,f13,f14, there are two kinds of commas:

a comma that is part of the data (inside the field), as in field, with comma; and
a comma that separates fields, as in ,f12,f13,f14.

So I tried awk with -F and a regex:

awk -F'/\B\,/\B/' '!seen[$2]++' data.csv > resulted.csv

My strategy was: the field separator must be a comma \, at a non-word boundary \B.

However, my command did not produce the expected resulted.csv. Instead it printed these warnings:

gawk: warning: escape sequence `\B' treated as plain `B'
gawk: warning: escape sequence `\,' treated as plain `,'

The desired resulted.csv has the repeated lines removed, like:

t1,t2,t3,t4
field without comma,f02,f03,f04
field, with comma,f12,f13,f14
field without comma,f22,f23,f24

Answer 1

Without relying on GNU awk, and with your data, you can use gsub to replace each ", " string with a placeholder that does not occur in the data, such as "__", let awk split the fields on "," as normal, and then restore the comma within the field (", ") using gsub again. For example:

awk -F, -v OFS=, '
    { gsub(/, /,"__"); for (i = 1; i <= NF; i++) gsub(/__/,", ", $i) }
    !seen[$2]++
' data.csv

Above, gsub(/, /,"__") replaces all occurrences of ", " with two underscores in the input record, which makes awk re-split the record on ",". Then, looping over each field, any "__" is replaced with ", ", restoring the original comma in the field. Finally, !seen[$2]++ prints a line only the first time its second column is seen, which is the same key as in your attempt. (Keying on the whole record with !seen[$0]++ would keep the last line as well, because f34 differs from f24.)

Example Use/Output

Given your data, the above results in:

$ awk -F, -v OFS=, '
>     { gsub(/, /,"__"); for (i = 1; i <= NF; i++) gsub(/__/,", ", $i) }
>     !seen[$2]++
> ' data.csv
t1,t2,t3,t4
field without comma,f02,f03,f04
field, with comma,f12,f13,f14
field without comma,f22,f23,f24
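
The placeholder idea above can also be sketched with a placeholder that is less likely to collide with real data. This is only a minimal sketch, assuming the control character held in awk's built-in SUBSEP variable ("\034" by default) never occurs in the file. The names dedupe_by_column and col are hypothetical, introduced here for illustration; col selects the key column, and 0 means the whole record.

```shell
# Hypothetical helper: dedupe_by_column COL [FILE...]
# Assumption: the data never contains the SUBSEP character ("\034").
dedupe_by_column() {
    col=$1
    shift
    awk -F, -v OFS=, -v col="$col" '
        {
            gsub(/, /, SUBSEP)              # hide in-field commas; $0 is re-split on ","
            for (i = 1; i <= NF; i++)
                gsub(SUBSEP, ", ", $i)      # restore the commas field by field
        }
        !seen[$col]++                       # print a line the first time its key is seen
    ' "$@"
}

# Usage (reads the named files, or standard input if none are given):
#   dedupe_by_column 2 data.csv
```

Passing the column as an argument keeps the splitting logic separate from the choice of key, so the same function can be tried with different keys.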

Answer 2

With GNU awk, use a comma that is not followed by a space as the field separator:

awk -F ',[^ ]' '!seen[$2]++' data.csv

Output:

t1,t2,t3,t4
field without comma,f02,f03,f04
field, with comma,f12,f13,f14
field without comma,f22,f23,f24
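
To see what this separator does to the fields, the following sketch prints the field count and the first two fields of each line. The function name show_fields is hypothetical. Note that the regex ',[^ ]' matches the comma together with the character after it, so that character is consumed as part of the separator. That is harmless here because the command prints the whole line and only needs $2 to be distinct, but it would matter if the fields themselves were printed, and empty fields (two adjacent commas) would be split incorrectly.

```shell
# Hypothetical helper: print NF, $1 and $2 as seen with the ',[^ ]' separator.
show_fields() {
    awk -F ',[^ ]' '{ printf "%s|%s|%s\n", NF, $1, $2 }' "$@"
}

# Usage:
#   printf '%s\n' 'field, with comma,f12,f13,f14' | show_fields
# prints:
#   4|field, with comma|12
```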

Answer 3

If the intent is to use the t2 column as a key value then this is how you'd do it:

$ awk -F, '!seen[$(NF-2)]++' data.csv
t1,t2,t3,t4
field without comma,f02,f03,f04
field, with comma,f12,f13,f14
field without comma,f22,f23,f24
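
The reason $(NF-2) works is that the extra comma only ever appears in the first column, so counting from the end of the line always lands on t2. A small sketch to illustrate, with the hypothetical function name show_key_from_end:

```shell
# Hypothetical helper: print the field count and the third-from-last field.
show_key_from_end() {
    awk -F, '{ print NF ": " $(NF-2) }' "$@"
}

# Usage:
#   printf '%s\n' 'field without comma,f02,f03,f04' \
#                 'field, with comma,f12,f13,f14' | show_key_from_end
# prints:
#   4: f02
#   5: f12
```

The line with the embedded comma splits into five fields instead of four, yet the key is still the t2 value.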

If it's to use the t1 column as the key instead then this is how you'd do that:

$ awk '{key=$0; sub(/(,[^,]+){3}$/,"",key)} !seen[key]++' data.csv
t1,t2,t3,t4
field without comma,f02,f03,f04
field, with comma,f12,f13,f14
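
Some older awk implementations do not support interval expressions such as {3}. As a hedged alternative for that case, the same key can be built by stripping one trailing field at a time. The function name dedupe_by_t1 is hypothetical, and the sketch assumes, like the command above, that the last three fields are never empty.

```shell
# Hypothetical helper: dedupe on everything before the last three fields.
dedupe_by_t1() {
    awk '
        {
            key = $0
            for (i = 0; i < 3; i++)
                sub(/,[^,]+$/, "", key)     # drop one trailing field per pass
        }
        !seen[key]++                        # print a line the first time its key is seen
    ' "$@"
}

# Usage:
#   dedupe_by_t1 data.csv
```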

If it's something else then please clarify your question and update the example.

3 answers

solution1 2 2019-08-16 00:29:23

solution2 1 ACCEPTED 2019-08-15 22:37:25

solution3 1 2019-08-16 01:21:31
