
Fixing quote escaping with bash script and sed

I have a bash script that processes some CSVs. Some of the input CSVs are not formatted properly, so I want to fix them with sed. The quotes are escaped like \" and not like "", so I call sed to change this. On the command line this works perfectly:

sed -i 's/\\"/""/gi' input.csv

But inside a bash script this seems to do nothing. I guess it has something to do with quotes and escape sequences, but what is the solution?

You need to escape the escape character \ for that to work:

$ echo 'bla;\"bli bli\";otherbla' | sed -e 's/\\\"/""/g'
bla;""bli bli"";otherbla
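
How the expression is quoted in the script decides what sed actually receives. The sketch below is only an illustration (the variable names `single` and `double` are made up for it): it builds the same sed expression once in single quotes and once in double quotes, where the shell strips one level of backslashes before sed sees anything.

```shell
#!/bin/bash
# Single quotes: the shell passes the expression to sed untouched.
single='s/\\"/""/g'

# Double quotes: the shell consumes one level of backslashes first, so every
# backslash and double quote meant for sed has to be escaped once more.
double="s/\\\\\"/\"\"/g"

# Both lines print the same sed expression: s/\\"/""/g
printf '%s\n' "$single" "$double"
```

If a script wraps the expression in double quotes without the extra escaping, sed gets a different pattern than the one typed on the command line.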

For bash scripts, you need to make sure that the line you read from the CSV file is properly quoted when it is passed to sed. Can you provide an example of the CSV file, and show how you read from the file?

Using cat file | while read, here is an example of the problem:

$ cat test.csv
bla;\"bli bli\";otherbla
ble;""bli bli"";otherbla
bli;\"blo\";otherbla

$ cat test.sh
#!/bin/bash

cat test.csv | while read line;
do echo "$line" | sed -e 's/\\\"/""/g'
done

$ ./test.sh
bla;"bli bli";otherbla
ble;""bli bli"";otherbla
bli;"blo";otherbla
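
The backslashes are already gone before sed runs: `read` without `-r` treats a backslash as an escape character and removes it. A minimal sketch of the same loop with `read -r` follows; the function name `fix_quotes` is hypothetical and only there to make the loop reusable.

```shell
#!/bin/bash
# Hypothetical helper: reads CSV lines on stdin, writes fixed lines to stdout.
fix_quotes() {
  # IFS= keeps leading/trailing whitespace, -r keeps backslashes literal.
  while IFS= read -r line; do
    # printf avoids any echo implementation that interprets backslashes.
    printf '%s\n' "$line" | sed -e 's/\\"/""/g'
  done
}
```

Called as `fix_quotes < test.csv`, this should leave the `\"` sequences intact until sed replaces them. Starting one sed per line is slow on large files, so this is mainly useful when the loop has to do other per-line work as well.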

One solution is not to loop over the lines with read and echo in the script, but to run sed directly on the file and store the resulting CSV in a new file:

$ sed -e 's/\\\"/""/ig' test.csv > test-tmp.csv
$ cat test-tmp.csv
bla;""bli bli"";otherbla
ble;""bli bli"";otherbla
bli;""blo"";otherbla
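
Inside a script, the same idea can be wrapped so that the original file is only overwritten when sed succeeds. This is a sketch under assumptions, not part of the original answer: the function name `fix_csv` is hypothetical, and `mktemp` is assumed to be available.

```shell
#!/bin/bash
# Hypothetical helper: fix the quote escaping of one CSV file in place,
# going through a temporary file.
fix_csv() {
  local src=$1 tmp
  tmp=$(mktemp) || return 1
  if sed -e 's/\\"/""/g' "$src" > "$tmp"; then
    # Copy the content back instead of moving the temp file, so the
    # original file keeps its permissions and ownership.
    cat "$tmp" > "$src"
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

Usage would be `fix_csv test.csv`, or a loop over several input files.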

Then, as pointed out in the comments, to avoid clobbering quoted fields that end in \, we can use two sed expressions and include the field separator, so that only a \" directly before or after the separator is replaced (in my example, the field separator is ;). This still doesn't handle a field wrapped in a single " (rather than "") whose last character is \, such as the blo line:

$ cat test.csv
bla;\"bli bli\";otherbla
ble;""bli bli"";otherbla
bli;\"blo\";otherbla
blo;"bli bli\";otherbla
blu;""bli bli\"";otherbla

$ sed -e 's/;\\\"/;""/ig' -e 's/\\\";/"";/ig' test.csv
bla;""bli bli"";otherbla
ble;""bli bli"";otherbla
bli;""blo"";otherbla
blo;"bli bli"";otherbla
blu;""bli bli\"";otherbla

If you have several sed commands, you can put them in a script file; it works the same way:

$ cat s.sed 
s/\\\"/""/g

Using it:

$ echo 'bla;\"bli bli\";otherbla' | sed -f s.sed 
bla;""bli bli"";otherbla

$ sed -f s.sed test.csv > test-tmp.csv

Have you considered the case where one of the fields legitimately ends in a \ character? The quoted representation in the CSV file will end with a backslash followed by a quote; sed solutions such as yours and Thomas's will clobber it.

This is why sed is the wrong tool for working with quoted CSV; some problems can only be solved recursively in a proper language (awk, Perl or whatever).
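
As an illustration of what a scanning approach could look like, here is a rough awk sketch wrapped in a shell function. It rests on a strong assumption that the thread does not confirm: that the input dialect writes a literal backslash as \\, so a field ending in a backslash appears as \\" and can be told apart from an escaped quote \". The function name `unescape_csv` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: convert backslash-escaped quotes to doubled quotes.
# ASSUMPTION: a literal backslash is written as \\ in the input.
# The line is scanned left to right, so an escaped backslash is consumed as
# a pair and is never mistaken for the start of an escaped quote.
unescape_csv() {
  awk '{
    out = ""
    n = length($0)
    for (i = 1; i <= n; i++) {
      c = substr($0, i, 1)
      if (c == "\\" && i < n) {
        d = substr($0, i + 1, 1)
        i++                                    # consume the escaped character
        if (d == "\"")      out = out "\"\""   # backslash-quote becomes ""
        else if (d == "\\") out = out "\\"     # double backslash becomes one
        else                out = out c d      # any other pair is kept as-is
      } else {
        out = out c
      }
    }
    print out
  }' "$@"
}
```

With that assumption, `unescape_csv test.csv > test-tmp.csv` would turn `\"` into `""` while leaving a closing quote that follows `\\` alone. If the input does not escape backslashes at all, as in the blo line above, a trailing backslash stays ambiguous and no line-based rewrite can resolve it reliably.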
