
sed and regex to replace ',' except inside a string

I have input in the following format:

10,0,'string1_string2,_string3','',8,0,0,0.59,'20140101205216','20140128074836',584266915,5934

and I would like to replace all comma characters (",") with tabs using sed. The constraint is that commas inside text strings must not be replaced (i.e. the comma in 'string1_string2,_string3' should not be replaced with a tab). A regex to do this is ,(?!,_).

However, the following sed command does not work, and I've tried every escaping permutation I could think of.

sed s/",\(\?\!,_\)"/"\t"/g 

Is there a way to do this?

On Mac OS X 10.9.1, you can use:

sed -E -e "s/('[^']*'|[^,]*),/\1X/g"

except that you'd replace the X with an actual tab. For your input line, that yields:

10X0X'string1_string2,_string3'X''X8X0X0X0.59X'20140101205216'X'20140128074836'X584266915X5934

which has X's where you want tabs. With GNU sed, you can use -r in place of -E (though it also recognizes -E). Mac sed will not expand \t to a tab; GNU sed will. With Bash, you can use the ANSI-C quoting mechanism to have the shell embed a tab in the string passed to sed:

sed -E -e "s/('[^']*'|[^,]*),/\1"$'\t'"/g"

Without the extended regular expressions (activated by -r or -E), it isn't worth trying in sed; use awk instead.
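As a rough sketch of the awk route (not part of the original answer), the function below walks each line character by character and tracks whether it is inside a single-quoted string. The name commas_to_tabs is made up for illustration, and the sketch assumes quotes are never escaped or nested:

```shell
#!/bin/sh
# Hypothetical helper: turn field-separating commas into tabs while leaving
# commas inside single-quoted strings untouched. Reads stdin or named files.
commas_to_tabs() {
    # The quote character is passed in with -v so the awk program itself
    # contains no single quotes.
    awk -v q="'" '
    {
        out = ""
        in_quote = 0                      # reset the state for every line
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (c == q) {
                in_quote = !in_quote      # toggle on each quote character
            }
            if (c == "," && !in_quote) {
                c = "\t"                  # separator comma becomes a tab
            }
            out = out c
        }
        print out
    }' "$@"
}

# Example: prints the fields separated by tabs, quoted comma preserved.
printf '%s\n' "10,0,'string1_string2,_string3','',8" | commas_to_tabs
```

Because this keeps explicit state instead of relying on a regex, it does not care whether the quoted string is followed by a comma or by the end of the line.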

The sed regex looks for either a single-quoted string (a single quote, zero or more non-quote characters, and a closing single quote) or a run of zero or more non-commas, followed by a comma. It replaces the match with the remembered either/or text plus a tab (the example output uses X in place of the tab because it is more visible).


devnull points out that the command above still replaces a comma inside a quoted string when that string is the last field on the line, because no comma follows the closing quote. There's a workaround for that:

sed -E -e "s/('[^']*'|[^,]*)(,|$)/\1"$'\t'"/g; s/"$'\t'"$//"

The s///g before the semicolon adds a tab to the end of each line; the s/// after the semicolon removes the tab that was just added.
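To see the difference between the two forms, here is a small sketch that runs both on a line whose last field is a quoted string containing a comma. The function names first_form and workaround_form are hypothetical; the tab is produced with printf so that it works in any POSIX shell and not only Bash, and tr turns the tabs back into X for visibility:

```shell
#!/bin/sh
tab=$(printf '\t')   # a literal tab character, no ANSI-C quoting needed

# First form from the answer: every field must be followed by a comma.
first_form() {
    sed -E "s/('[^']*'|[^,]*),/\1${tab}/g"
}

# Workaround form: a field may be followed by a comma or by end of line;
# the second command strips the tab that this appends to the line.
workaround_form() {
    sed -E "s/('[^']*'|[^,]*)(,|\$)/\1${tab}/g; s/${tab}\$//"
}

printf '%s\n' "1,'a,b'" | first_form      | tr '\t' 'X'   # 1X'aXb'
printf '%s\n' "1,'a,b'" | workaround_form | tr '\t' 'X'   # 1X'a,b'
```

In the first form the quoted alternative cannot match the last field, since nothing follows the closing quote, so the non-comma alternative matches up to the comma inside the string instead.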

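I would suggest using Perl if it is available, because it supports lookarounds. The lookahead below accepts a comma only when an even number of single quotes follows it on the line, that is, when the comma is outside a quoted string: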

s="10,0,'string1_string2,_string3','',8,0,0,0.59,'20140101205216','20140128074836',584266915,5934"

perl -pe "s/,(?=(([^']*'){2})*[^']*$)/\t/g" <<< "$s"

10\t0\t'string1_string2,_string3'\t''\t8\t0\t0\t0.59\t'20140101205216'\t'20140128074836'\t584266915\t5934

PS: The output shows \t instead of actual tabs for readability only.

You could use Text::ParseWords :

perl -MText::ParseWords -n -l -e 'print join("\t", parse_line(",", 1, $_));' filename

For your input, it'd result in:

10      0       'string1_string2,_string3'      ''      8       0       0       0.59    '20140101205216'        '20140128074836'        584266915       5934

This seems to work if I understand your question correctly:

sed -E 's/,([^_])/\t\1/g'

Output:

10  0   'string1_string2,_string3'  ''  8   0   0   0.59    '20140101205216'    '20140128074836'    584266915   5934
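One caveat, sketched below on the assumption that other inputs may not look like the sample: this command keys on the underscore after the comma rather than on the quotes, so it only protects commas that happen to be followed by _. It also consumes the character after each comma, so the second of two adjacent commas is skipped. The tab variable and the function name underscore_form are illustrative; since \t in the replacement is a GNU sed extension, the sketch embeds a literal tab instead:

```shell
#!/bin/sh
tab=$(printf '\t')   # literal tab, so the command does not depend on GNU \t

# Same substitution as above, with the tab embedded by the shell.
underscore_form() {
    sed -E "s/,([^_])/${tab}\1/g"
}

# Sample-like input: the quoted comma is followed by _, so it survives.
printf '%s\n' "'a,_b',1" | underscore_form | tr '\t' 'X'      # 'a,_b'X1

# Different input: the quoted comma is replaced, and one of the two
# adjacent commas is left behind.
printf '%s\n' "1,'a,b',,2" | underscore_form | tr '\t' 'X'    # 1X'aXb'X,2
```

For data where quoted strings can contain arbitrary commas, one of the quote-aware approaches above is the safer choice.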
