
Strip variable number of commas from double-quoted csv field

How can I write a sed script (or awk, though I'm not familiar with it) to strip commas from inside a double-quoted CSV field? I can remove a single comma using the following sed one-liner:

sed 's/"\([^"]*\),\([^"]*\)"/\1\2/g' file > file2

But if the field contains two commas, only one of them is stripped:

"ALOHA, INC., A CONDOMINIUM ASSOCIATION"

becomes

"ALOHA, INC. A CONDOMINIUM ASSOCIATION"
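The behaviour above can be reproduced outside sed. The following is a minimal Python sketch, not part of the original question, that applies the same pattern with `re`. One assumption: the replacement here puts the quotes back so the result matches the output shown in the question (the sed command as written would drop them).

```python
import re

field = '"ALOHA, INC., A CONDOMINIUM ASSOCIATION"'

# Same pattern as the sed one-liner. Assumption: the replacement re-adds
# the quotes so the result is comparable with the output in the question.
pattern = re.compile(r'"([^"]*),([^"]*)"')

# The first group is greedy, so it extends to the LAST comma in the field.
# The whole quoted field is consumed by this single match, so a global
# substitution finds nothing more to replace and only one comma is removed.
once = pattern.sub(r'"\1\2"', field)
print(once)  # "ALOHA, INC. A CONDOMINIUM ASSOCIATION"
```

In other words, the global flag repeats the substitution for each separate match, not for each comma inside one match.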

Alternatively, if someone can explain why I can't get the OPTIONALLY ENCLOSED BY '"' option to work when loading CSVs into MySQL, that would make life a hell of a lot easier. I've been trying to use sed to strip the commas because they destroy my columnar data even when I use the optionally-enclosed option and my fields are double-quoted. Excel exports with quotes only around fields that contain commas. If everything is double-quoted, I don't have a problem, but with selectively quoted fields, I start banging my shoe on the desk.

Update: The file includes multiple fields.

"ALOHA, INC., A CONDOMINIUM ASSOCIATION", 900, VENICE, FL, 34293-5112,,VENICE,FL,34285,ALOHA

I'm even concerned there might be rows that have multiple quoted fields, which seems like it could be a serious problem. As far as I can tell it's not that common, though.

One thing I was just thinking is I could eliminate all instances of ', INC' but that wouldn't eliminate other examples, like ', LLC', etc.

I want to remove all commas from within a field.

I'm worried about cases such as:

"ALOHA, INC., A CONDOMINIUM ASSOCIATION", 900, VENICE, FL, 34293-5112,,VENICE,FL,34285,"ALOHA, Inc., A CONDOMINIUM ASSOCIATION"

Wouldn't the commas between the first instance of ALOHA and the last instance also be eliminated by the following?

sed 's/"\([^"]*\),\([^"]*\)"/\1\2/g' file > file2

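The following removes every comma on any line that contains a double-quoted string. Note that this includes the commas that separate fields, so it only gives the desired result when the line consists of a single quoted field.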

sed '/".*"/s/,//g'   Input_file

Use sed's -i option if you want to save the output back into Input_file itself.

You can use GNU awk for this case

$ gawk -v FPAT='"[^"]*"|[^,]*' -v OFS=, '{for(i=1; i<=NF; i++) gsub(/,/, "", $i)} 1' ip.txt
"ALOHA INC. A CONDOMINIUM ASSOCIATION", 900, VENICE, FL, 34293-5112,,VENICE,FL,34285,"ALOHA Inc. A CONDOMINIUM ASSOCIATION"

If gawk is not available, you can use perl:

perl -pe 's/"[^"]+"/$&=~tr|,||dr/ge' ip.txt
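For comparison, the same idea (match each quoted field, then delete the commas inside that match only) can be sketched in Python. This is an illustrative equivalent rather than something from the original answer; the function name `strip_commas_in_quotes` is hypothetical, and the same limitation applies: it assumes no field contains an embedded double quote or newline.

```python
import re

# A quoted field: an opening quote, any run of non-quote characters, a closing quote.
QUOTED = re.compile(r'"[^"]*"')

def strip_commas_in_quotes(line: str) -> str:
    """Remove commas inside each double-quoted field, leaving separators alone."""
    # The callback receives one quoted field at a time and returns it
    # with its commas deleted; text outside the quotes is never touched.
    return QUOTED.sub(lambda m: m.group(0).replace(",", ""), line)

row = ('"ALOHA, INC., A CONDOMINIUM ASSOCIATION", 900, VENICE, FL, '
       '34293-5112,,VENICE,FL,34285,"ALOHA, Inc., A CONDOMINIUM ASSOCIATION"')
print(strip_commas_in_quotes(row))
```

Because matches are found left to right and do not overlap, the closing quote of the first field is never paired with the opening quote of the last one, so the separators in between are preserved.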

Note: This won't work if a field contains a double quote, a newline, etc. In that case, use one of the CSV parsers available in Perl, Python, etc.
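As a rough illustration of the parser route, here is a minimal sketch using Python's standard `csv` module. The function name `strip_commas_csv` is hypothetical, and the sketch assumes the default dialect (comma delimiter, double-quote quoting with `""` as the escaped quote). Since the commas are gone after cleaning, the writer only re-quotes a field when it still needs it, for example when it contains a quote or a newline.

```python
import csv
import io

def strip_commas_csv(text: str) -> str:
    """Parse CSV text, delete commas inside every field, and re-serialise it."""
    # newline="" lets the csv module see embedded newlines in quoted fields.
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for record in reader:
        # Each item is an already-unquoted field, so any comma left in it
        # was data rather than a separator.
        writer.writerow([field.replace(",", "") for field in record])
    return out.getvalue()

sample = 'a,"He said ""hi"", ok",b\n'
print(strip_commas_csv(sample), end="")
```

For a file on disk, the same loop would read from `open(path, newline="")` instead of an in-memory string.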

