How to remove extra double quotes (but not the opening and closing double quotes) in a line of text using a bash script

I have a text file that I want to convert into a CSV file and then copy that CSV file into a PostgreSQL table.

My input text file (old_sample.txt) is:

SVCOP,"12980","2019"0627","1DEX","LUBE, OIL & FILTER - DEXOS "1"","I","0.4","0.4","15.95","10.80","0.00","0.00","0.00","0.00","0.00","0.00","38.03","30.17","53.98","40.97","FULL SYNTHETIC MOTOR OIL.","LUBE, OIL & FILTER - DEXOS ''1''","91","LANE","LANE","L","LA MERE","125.00","125.00","","0.00","0.00","0","0","0","||||||||||||||||||||||||","N"

I am using the code below:

cat old_sample.txt
printf "\n"
echo "____________________________________"
printf "\n"
cat old_sample.txt | sed ': again
s/\("[^",]*\)"\([^",]*"\)/\1\2/g
t again
s/""/"/g' 

The output is:

SVCOP,"12980","2019"0627","1DEX","LUBE, OIL & FILTER - DEXOS "1"","I","0.4","0.4","15.95","10.80","0.00","0.00","0.00","0.00","0.00","0.00","38.03","30.17","53.98","40.97","FULL SYNTHETIC MOTOR OIL.","LUBE, OIL & FILTER - DEXOS ''1''","91","LANE","LANE","L","LA MERE","125.00","125.00","","0.00","0.00","0","0","0","||||||||||||||||||||||||","N"
SVCOP,"12980","20190627","1DEX","LUBE, OIL & FILTER - DEXOS "1","I","0.4","0.4","15.95","10.80","0.00","0.00","0.00","0.00","0.00","0.00","38.03","30.17","53.98","40.97","FULL SYNTHETIC MOTOR OIL.","LUBE, OIL & FILTER - DEXOS ''1''","91","LANE","LANE","L","LA MERE","125.00","125.00",","0.00","0.00","0","0","0","||||||||||||||||||||||||","N"

The problem is "LUBE, OIL & FILTER - DEXOS "1""

"1" this double quotes not removed due to comma is present inside the double quotes but "2019"0627" this works fine so I want to remove all double quotes inside string enclosed in open and closed double-quotes.otherwise it will show a database error.

This is my code

nl -ba -nln -s, < old_sample.txt | sed ': again
                                      s/\("[^",]*\)"\([^",]*"\)/\1\2/g
                                      t again' | grep 'SVCPTS' > old_sample.csv
psql_local <<SQL || die "Failed to import parts data"
        \copy sample_table from 'old_sample.csv' with (format csv, header false)
SQL 

My target output is

SVCOP,"12980","20190627","1DEX","LUBE, OIL & FILTER - DEXOS 1","I","0.4","0.4","15.95","10.80","0.00","0.00","0.00","0.00","0.00","0.00","38.03","30.17","53.98","40.97","FULL SYNTHETIC MOTOR OIL.","LUBE, OIL & FILTER - DEXOS ''1''","91","LANE","LANE","L","LA MERE","125.00","125.00","","0.00","0.00","0","0","0","||||||||||||||||||||||||","N"

Personally, if I were doing this, I would reach for a utility program. I think you may be able to achieve it by finding the right RegEx - but it might end up being quite complex.

Using something like csvkit - specifically, the csvformat command - seems a lot easier. It would also be more reliable if you need to re-use this script with other data in the future (which could have newlines in some fields, or other situations you may need to account for).
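
A minimal sketch of what that could look like, assuming csvkit is installed; the flag is from memory and should be checked against csvformat --help, and whether csvformat's parser copes with the stray inner quotes in this input would need testing:

# Assumes csvkit is available, e.g.: pip install csvkit
# -U 1 asks csvformat to quote every output field (QUOTE_ALL)
csvformat -U 1 old_sample.txt > old_sample.csv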

Would you please try the following:

while IFS= read -r str; do          # assign a variable "str" to a line
    while true; do                  # infinite loop
        str2=$(sed 's/\([^,]\)"\([^,]\)/\1\2/g' <<< "$str")
        [[ "$str2" = "$str" ]] && break
                                    # if there is no change, exit the loop
        str="$str2"                 # update "str" for next iteration
    done
    echo "$str"
done < "old_sample.txt"

Output:

SVCOP,"12980","20190627","1DEX","LUBE, OIL & FILTER - DEXOS 1","I","0.4","0.4","15.95","10.80","0.00","0.00","0.00","0.00","0.00","0.00","38.03","30.17","53.98","40.97","FULL SYNTHETIC MOTOR OIL.","LUBE, OIL & FILTER - DEXOS ''1''","91","LANE","LANE","L","LA MERE","125.00","125.00","","0.00","0.00","0","0","0","||||||||||||||||||||||||","N"
  • The regex \([^,]\)"\([^,]\) matches a double quote which is surrounded by non-comma characters.
  • It loops until all extra double quotes are removed (the same convergence loop can also be written inside a single sed call, as sketched after this list).
  • The script above will work for the provided example but may not be robust enough for arbitrary inputs. It is recommended to use a tool which is able to parse CSV files for reliable results, as chrisputnam9 suggests.
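
For what it's worth, the same repeat-until-stable substitution can also be folded into a single sed invocation, which avoids spawning a sed process per line and iteration. This is only a sketch checked against the sample line:

# Same substitution as the loop above, but the "repeat until no change" loop
# lives inside sed itself: `t again` jumps back to the `:again` label only
# when the s command actually changed something on the current line.
sed ':again
s/\([^,]\)"\([^,]\)/\1\2/g
t again' old_sample.txt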

[EDIT] If your file has CR+LF line endings, please try instead:

while IFS= read -r str; do      # assign a variable "str" to a line
    while true; do              # infinite loop
        str2=$(sed 's/\([^,]\)"\([^,]\)/\1\2/g' <<< "$str")
        [[ "$str2" = "$str" ]] && break
                                # if there is no change, exit the loop
        str="$str2"             # update "str" for next iteration
    done
#   echo "$str"                 # add LF at the end of the output line
    echo -ne "$str\r\n"         # add CR+LF at the end of the output line
done < <(tr -d "\r" < "VehSer_NEWM11_test.txt")
                                # remove CR code

BTW, if perl is an option for you, the following code will run much faster:

perl -pe '1 while s/([^,])"([^,\r])/$1$2/g' VehSer_NEWM11_test.txt

I can't do it in one command, so I've made this:

 $ sed "s/['\"]//g; s/,/\",\"/g; s/\",\" /, /g; s/,,/,\"\",/g; s/$/\"/; s/\"//" file
SVCOP,"12980","20190627","1DEX","LUBE, OIL & FILTER - DEXOS 1","I,0.4","0.4","15.95","10.80","0.00","0.00","0.00","0.00","0.00","0.00","38.03","30.17","53.98","40.97","FULL SYNTHETIC MOTOR OIL.","LUBE, OIL & FILTER - DEXOS 1","91","LANE","LANE","L,LA MERE","125.00","125.00,"",0.00","0.00","0,0","0,||||||||||||||||||||||||","N"

Or this, if you need to keep ''1'':

$ sed 's/"//g; s/,/","/g; s/"," /, /g; s/,,/,"",/g; s/$/"/; s/"//' file
SVCOP,"12980","20190627","1DEX","LUBE, OIL & FILTER - DEXOS 1","I","0.4","0.4","15.95","10.80","0.00","0.00","0.00","0.00","0.00","0.00","38.03","30.17","53.98","40.97","FULL SYNTHETIC MOTOR OIL.","LUBE, OIL & FILTER - DEXOS ''1''","91","LANE","LANE","L","LA MERE","125.00","125.00","","0.00","0.00","0","0","0","||||||||||||||||||||||||","N"
