How can I write a sed script (or awk, though I'm not familiar with it) to strip commas from the innards of a double-quoted CSV field? I can remove a single comma using the following sed one-liner:
sed 's/"\([^"]*\),\([^"]*\)"/\1\2/g' file > file2
But if the field has two commas, only one of them is stripped:
"ALOHA, INC., A CONDOMINIUM ASSOCIATION"
becomes
"ALOHA, INC. A CONDOMINIUM ASSOCIATION"
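To see why only one comma disappears, the sed expression can be reproduced with Python's `re` module. This is a sketch, not sed itself: sed uses POSIX leftmost-longest matching while Python backtracks, but for this particular pattern both should pick the same match. Because `[^"]*` is greedy, the first group swallows everything up to the *last* comma inside the quotes, so each quoted field loses exactly one comma per pass. Note also that the replacement `\1\2` does not put the quotes back.

```python
import re

# Python translation of the question's sed command:
#   sed 's/"\([^"]*\),\([^"]*\)"/\1\2/g'
SED_PATTERN = re.compile(r'"([^"]*),([^"]*)"')


def sed_one_pass(line: str) -> str:
    """Apply the substitution across the line once, like sed's s///g."""
    return SED_PATTERN.sub(r"\1\2", line)


field = '"ALOHA, INC., A CONDOMINIUM ASSOCIATION"'
match = SED_PATTERN.fullmatch(field)
# The greedy first group stops at the LAST comma, so only that comma is removed.
print(repr(match.group(1)))  # 'ALOHA, INC.'
print(repr(match.group(2)))  # ' A CONDOMINIUM ASSOCIATION'
print(sed_one_pass(field))   # ALOHA, INC. A CONDOMINIUM ASSOCIATION  (quotes dropped too)
```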
Alternatively, if someone can explain why I can't seem to get the OPTIONALLY ENCLOSED BY '"' option to work when loading CSVs into MySQL, that would make life a hell of a lot easier. I've been trying to use sed to strip the commas because they destroy my columnar data, even when I use the OPTIONALLY ENCLOSED BY option and my fields are double-quoted. Excel exports with quotes only around fields that contain commas. If every field is double-quoted, I don't have a problem, but with selective quoting I start banging my shoe on the desk.
Update: The file includes multiple fields.
"ALOHA, INC., A CONDOMINIUM ASSOCIATION", 900, VENICE, FL, 34293-5112,,VENICE,FL,34285,ALOHA
I'm even concerned there might be rows that have multiple quoted fields, which seems like it could be a serious problem. As far as I can tell it's not that common, though.
One thing I was just thinking is that I could eliminate all instances of ', INC', but that wouldn't handle other cases, such as ', LLC'.
I want to remove all commas from within a field.
I'm worried about cases such as:
"ALOHA, INC., A CONDOMINIUM ASSOCIATION", 900, VENICE, FL, 34293-5112,,VENICE,FL,34285,"ALOHA, Inc., A CONDOMINIUM ASSOCIATION"
Wouldn't the commas between the first instance of Aloha and the last instance be eliminated with
sed 's/"\([^"]*\),\([^"]*\)"/\1\2/g' file > file2
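The same Python translation can be used to check that worry (again a sketch, not sed itself). With the `g` flag, scanning resumes after each match's closing quote, so for the exact line above the two quoted fields are handled separately: each loses one comma and its quotes, and nothing between them is touched. The pattern can bridge two fields, though, when an earlier quoted field contains no comma: the match then starts at that field's closing quote and ends at the next field's opening quote. The `ACME` line below is a made-up example of that case.

```python
import re

# Python translation of: sed 's/"\([^"]*\),\([^"]*\)"/\1\2/g'
SED_PATTERN = re.compile(r'"([^"]*),([^"]*)"')


def sed_one_pass(line: str) -> str:
    """Apply the substitution across the line once, like sed's s///g."""
    return SED_PATTERN.sub(r"\1\2", line)


two_quoted = ('"ALOHA, INC., A CONDOMINIUM ASSOCIATION", 900, VENICE, FL, '
              '34293-5112,,VENICE,FL,34285,"ALOHA, Inc., A CONDOMINIUM ASSOCIATION"')
# Each quoted field is matched on its own: one comma and both quotes removed per field.
print(sed_one_pass(two_quoted))

# Hypothetical line: the first quoted field has no comma, so the match runs
# from its closing quote to the next opening quote and eats a field separator.
bridged = '"ACME", 900, "ALOHA, INC."'
print(sed_one_pass(bridged))  # "ACME, 900 ALOHA, INC."
```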
If you want to remove all commas on lines that contain a double-quoted string, the following may help you:
sed '/".*"/s/,//g' Input_file
Use sed's -i option if you want to save the output into Input_file itself.
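To make the behavior of that command concrete, here is a rough Python equivalent (a sketch, not sed). In sed, `/".*"/` is an address, not part of the substitution: it only selects which lines `s/,//g` runs on. On a selected line every comma is deleted, including the ones that separate fields, so this only suits data where commas never appear outside quotes.

```python
import re


def address_then_strip(line: str) -> str:
    """Rough equivalent of: sed '/".*"/s/,//g'"""
    # The address /".*"/ selects lines that contain at least two double quotes.
    if re.search(r'".*"', line):
        # s/,//g then deletes every comma on that line, separators included.
        return line.replace(",", "")
    # Lines without a quoted string are printed unchanged.
    return line


row = ('"ALOHA, INC., A CONDOMINIUM ASSOCIATION", 900, VENICE, FL, '
       '34293-5112,,VENICE,FL,34285,ALOHA')
print(address_then_strip(row))
print(address_then_strip("1,2,3"))  # 1,2,3
```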
You can use GNU awk for this:
$ gawk -v FPAT='"[^"]*"|[^,]*' -v OFS=, '{for(i=1; i<=NF; i++) gsub(/,/, "", $i)} 1' ip.txt
"ALOHA INC. A CONDOMINIUM ASSOCIATION", 900, VENICE, FL, 34293-5112,,VENICE,FL,34285,"ALOHA Inc. A CONDOMINIUM ASSOCIATION"
-v FPAT='"[^"]*"|[^,]*'
define a field as either a double-quoted string or a run of non-comma characters
-v OFS=,
use comma as the output field separator
for(i=1; i<=NF; i++)
loop over all input fields
gsub(/,/, "", $i)
delete all commas in the field
1
print the contents of $0 (rebuilt with OFS, since fields were modified)
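The FPAT idea can be approximated in Python to see how a line is cut into fields. This is a sketch assuming well-formed input (quoted fields contain no embedded quotes and are followed directly by a comma or the end of the line); it is not a faithful reimplementation of gawk's FPAT rules. At each field start it tries the same alternation, `"[^"]*"` first and `[^,]*` otherwise, then skips the separating comma. Since an unquoted field cannot contain a comma, deleting commas from every field only changes the quoted ones.

```python
import re

# Same alternation as FPAT='"[^"]*"|[^,]*': a quoted field, else a run of non-commas.
FIELD = re.compile(r'"[^"]*"|[^,]*')


def split_fields(line: str) -> list:
    """Cut a line into fields, trying the quoted alternative first at each field start."""
    fields = []
    pos = 0
    while True:
        m = FIELD.match(line, pos)  # never None: [^,]* can match the empty string
        fields.append(m.group(0))
        pos = m.end()
        if pos >= len(line):
            return fields
        if line[pos] != ",":
            raise ValueError(f"expected a comma at position {pos}: {line!r}")
        pos += 1  # skip the separator


def strip_inner_commas(line: str) -> str:
    """Like the gawk loop: delete commas inside each field, then rejoin with OFS=','."""
    return ",".join(f.replace(",", "") for f in split_fields(line))


line = ('"ALOHA, INC., A CONDOMINIUM ASSOCIATION", 900, VENICE, FL, 34293-5112,,'
        'VENICE,FL,34285,"ALOHA, Inc., A CONDOMINIUM ASSOCIATION"')
print(split_fields(line)[:3])
print(strip_inner_commas(line))
```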
If gawk is not available, you can use:
perl -pe 's/"[^"]+"/$&=~tr|,||dr/ge' ip.txt
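For readers who don't use Perl, this is roughly what the one-liner does, sketched with Python's `re.sub` and a replacement function. In the Perl version, `s/"[^"]+"/.../ge` finds each double-quoted run (`g`) and evaluates the replacement as code (`e`); `$&` is the whole match, and `tr|,||dr` deletes (`d`) its commas and returns (`r`) the modified copy. Matching proceeds left to right, so, as long as there are no stray quotes, an opening quote always pairs with the next quote and the text between two quoted fields is left alone.

```python
import re

QUOTED = re.compile(r'"[^"]+"')


def strip_commas_in_quotes(line: str) -> str:
    """Rough equivalent of: perl -pe 's/"[^"]+"/$&=~tr|,||dr/ge'"""
    # The function receives each quoted run (quotes included) and returns it without commas.
    return QUOTED.sub(lambda m: m.group(0).replace(",", ""), line)


print(strip_commas_in_quotes('"ACME", 900, "ALOHA, INC."'))  # "ACME", 900, "ALOHA INC."
```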
Note: this won't work if a field contains a double quote, a newline, etc. For such input, use the CSV parsers available in Perl, Python, etc.
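Following that advice, here is a sketch with Python's standard `csv` module, assuming Excel-style CSV where a literal quote inside a field is written as `""` (the sample row and the file names in the comment are made up). The reader handles quoted commas, doubled quotes and line breaks inside quoted fields; commas are removed field by field and the rows are written back out. With the default `QUOTE_MINIMAL`, the writer only quotes fields that still need it, so fields that no longer contain a comma come out unquoted.

```python
import csv
import io


def strip_field_commas(src, dst) -> None:
    """Copy CSV rows from src to dst, deleting commas inside every field."""
    reader = csv.reader(src)  # default dialect is 'excel'
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        writer.writerow([field.replace(",", "") for field in row])


# In-memory demo; for real files, open both with newline="" as the csv docs recommend,
# e.g. open("input.csv", newline="") and open("output.csv", "w", newline="").
sample = '"ALOHA, INC., A ""CONDO"" ASSOCIATION",900,"LINE ONE,\nLINE TWO",FL\n'
out = io.StringIO()
strip_field_commas(io.StringIO(sample), out)
print(out.getvalue())
```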