I have input in the following format:
10,0,'string1_string2,_string3','',8,0,0,0.59,'20140101205216','20140128074836',584266915,5934
and I would like to replace all comma (,) characters with tabs using sed. The constraint is not to replace a comma inside text strings (i.e. the comma in 'string1_string2,_string3' should not be replaced with a tab). A regex to do this is ,(?!,_).
However, the following sed command does not work, and I've tried all the escaping permutations too:
sed s/",\(\?\!,_\)"/"\t"/g
Is there a way to do this?
On Mac OS X 10.9.1, you can use:
sed -E -e "s/('[^']*'|[^,]*),/\1X/g"
except that you'd replace the X with an actual tab. For your input line, that yields:
10X0X'string1_string2,_string3'X''X8X0X0X0.59X'20140101205216'X'20140128074836'X584266915X5934
which has X's where you want tabs. With GNU sed, you can use -r in place of -E (though it also recognizes -E). Mac sed will not expand \t to a tab; GNU sed will. With Bash, you can use the ANSI-C quoting mechanism to have the shell embed a tab in the string passed to sed:
sed -E -e "s/('[^']*'|[^,]*),/\1"$'\t'"/g"
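In a shell without ANSI-C quoting, $'\t' is not available. A minimal sketch, assuming only a POSIX shell and a sed that accepts -E, takes the tab from printf instead; the function name commas_to_tabs is hypothetical:

```shell
# Portable replacement for Bash's $'\t': printf emits a literal tab character.
tab=$(printf '\t')

# Hypothetical helper: same substitution as above, with the tab taken from $tab.
# \1 re-emits the field (a '...' string or a run of non-commas); the comma becomes a tab.
commas_to_tabs() {
  sed -E "s/('[^']*'|[^,]*),/\1${tab}/g"
}

printf '%s\n' "10,0,'string1_string2,_string3','',8" | commas_to_tabs
```

Piping the result through od -c makes the tab characters visible.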
Without the extended regular expressions (activated by -r or -E), it isn't worth trying in sed; use awk instead.
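A minimal awk sketch of that route, assuming quotes are always balanced and never escaped inside a string: it walks each line one character at a time, toggles an in-string flag on every single quote, and turns only the commas seen outside a string into tabs. The function name commas_to_tabs_awk is hypothetical.

```shell
# Hypothetical helper: character-by-character scan, no regex needed.
commas_to_tabs_awk() {
  awk '{
    out = ""; inq = 0                 # inq is 1 while inside a quoted string
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      if (c == "\047") {              # \047 is the single quote
        inq = !inq
      } else if (c == "," && !inq) {
        c = "\t"                      # comma outside a string becomes a tab
      }
      out = out c
    }
    print out
  }'
}

printf '%s\n' "10,0,'string1_string2,_string3','',8" | commas_to_tabs_awk
```

Because it tracks quote state explicitly, this version also leaves a comma alone when the quoted string is the last field on the line.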
The regex matches either a quoted string (a single quote, zero or more non-quotes, and a closing single quote) or a run of zero or more non-commas, followed by a comma, and replaces it with the remembered string and a tab (X represents the tab above because it is more visible).
devnull points out that the answer above still replaces the comma inside a quoted string at the end of a line, because that string is not followed by a comma. There's a workaround for that:
sed -E -e "s/('[^']*'|[^,]*)(,|$)/\1"$'\t'"/g; s/"$'\t'"$//"
The s///g before the semicolon turns each field's terminating comma, or the end of the line, into a tab, which leaves an extra tab at the end of each line; the s/// after the semicolon removes that trailing tab.
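A small check of the difference, using a made-up line whose last field is a quoted string containing a comma; naive and fixed are illustrative function names, and the tab comes from printf so the sketch does not depend on Bash:

```shell
tab=$(printf '\t')

# First command: a trailing '...' string is not followed by a comma,
# so the [^,]* alternative matches up to the comma inside it.
naive() { sed -E "s/('[^']*'|[^,]*),/\1${tab}/g"; }

# Workaround: accept end of line as a field terminator, then drop the extra tab.
fixed() { sed -E "s/('[^']*'|[^,]*)(,|\$)/\1${tab}/g; s/${tab}\$//"; }

line="1,'a,b'"
printf '%s\n' "$line" | naive   # 1<TAB>'a<TAB>b'
printf '%s\n' "$line" | fixed   # 1<TAB>'a,b'
```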
I would suggest using Perl, if it is available, because it supports lookarounds:
s="10,0,'string1_string2,_string3','',8,0,0,0.59,'20140101205216','20140128074836',584266915,5934"
perl -pe "s/,(?=(([^']*'){2})*[^']*$)/\t/g" <<< "$s"
10\t0\t'string1_string2,_string3'\t''\t8\t0\t0\t0.59\t'20140101205216'\t'20140128074836'\t584266915\t5934
PS: The output shows \t in place of the actual tab characters only for readability.
You could use Text::ParseWords:
perl -MText::ParseWords -n -l -e 'print join("\t", parse_line(",", 1, $_));' filename
For your input, it'd result in:
10 0 'string1_string2,_string3' '' 8 0 0 0.59 '20140101205216' '20140128074836' 584266915 5934
This seems to work if I understand your question correctly:
sed -E 's/,([^_])/\t\1/g'
Output:
10 0 'string1_string2,_string3' '' 8 0 0 0.59 '20140101205216' '20140128074836' 584266915 5934
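This substitution assumes that every comma inside a string is followed by _ and that no two commas are adjacent: the character after each comma is consumed by the match, so in an empty field the second comma of the pair is left alone. A quick check, with the tab supplied by printf so it also runs where sed does not expand \t in the replacement (the function name is hypothetical):

```shell
tab=$(printf '\t')

# Same substitution as above; ${tab} replaces the GNU-only \t escape.
underscore_rule() { sed -E "s/,([^_])/${tab}\1/g"; }

printf '%s\n' "1,'x,_y',2" | underscore_rule   # 1<TAB>'x,_y'<TAB>2
printf '%s\n' "a,,b" | underscore_rule         # a<TAB>,b  (second comma kept)
```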