
How to remove extra double quotes, other than the opening and closing double quotes, in a line of text using a bash script

I have a text file that I want to copy into a CSV file, and then copy that CSV file into a PostgreSQL table.

My input text file (old_sample.txt) is:

SVCOP,"12980","2019"0627","1DEX","LUBE, OIL & FILTER - DEXOS "1"","I","0.4","0.4","15.95","10.80","0.00","0.00","0.00","0.00","0.00","0.00","38.03","30.17","53.98","40.97","FULL SYNTHETIC MOTOR OIL.","LUBE, OIL & FILTER - DEXOS ''1''","91","LANE","LANE","L","LA MERE","125.00","125.00","","0.00","0.00","0","0","0","||||||||||||||||||||||||","N"

I am using the code below:

cat old_sample.txt
printf "\n"
echo "____________________________________"
printf "\n"
cat old_sample.txt | sed ': again
s/\("[^",]*\)"\([^",]*"\)/\1\2/g
t again
s/""/"/g' 

The output is:

SVCOP,"12980","2019"0627","1DEX","LUBE, OIL & FILTER - DEXOS "1"","I","0.4","0.4","15.95","10.80","0.00","0.00","0.00","0.00","0.00","0.00","38.03","30.17","53.98","40.97","FULL SYNTHETIC MOTOR OIL.","LUBE, OIL & FILTER - DEXOS ''1''","91","LANE","LANE","L","LA MERE","125.00","125.00","","0.00","0.00","0","0","0","||||||||||||||||||||||||","N"
SVCOP,"12980","20190627","1DEX","LUBE, OIL & FILTER - DEXOS "1","I","0.4","0.4","15.95","10.80","0.00","0.00","0.00","0.00","0.00","0.00","38.03","30.17","53.98","40.97","FULL SYNTHETIC MOTOR OIL.","LUBE, OIL & FILTER - DEXOS ''1''","91","LANE","LANE","L","LA MERE","125.00","125.00",","0.00","0.00","0","0","0","||||||||||||||||||||||||","N"

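The problem is "LUBE, OIL & FILTER - DEXOS "1"".

The double quotes around 1 are not removed because there is a comma inside the enclosing double quotes, whereas "2019"0627" is handled correctly. I therefore want to remove every double quote that appears inside a string enclosed by opening and closing double quotes. Otherwise the import fails with a database error.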
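To see why the comma matters, the substitution can be run on two cut-down fields. This is only an illustrative sketch: the helper name `try_fix` and the shortened sample strings are assumptions made for the example and are not part of the original script.

```shell
#!/bin/sh
# try_fix is a hypothetical helper wrapping the sed loop from the script above.
try_fix() {
    sed -e ':again' \
        -e 's/\("[^",]*\)"\([^",]*"\)/\1\2/g' \
        -e 't again'
}

# No comma inside the field: the stray quote is removed.
printf '%s\n' '"12980","2019"0627","1DEX"' | try_fix
# prints: "12980","20190627","1DEX"

# Comma inside the field: [^",]* cannot cross the comma, so the pattern never
# starts at the opening quote and the quote in front of 1 survives.
printf '%s\n' '"1DEX","LUBE, OIL - DEXOS "1"","I"' | try_fix
# prints: "1DEX","LUBE, OIL - DEXOS "1","I"
```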

Here is my code:

nl -ba -nln -s, < old_sample.txt | sed ': again
                                      s/\("[^",]*\)"\([^",]*"\)/\1\2/g
                                      t again' | grep 'SVCPTS' > old_sample.csv
psql_local <<SQL || die "Failed to import parts data"
        \copy sample_table from 'old_sample.csv' with (format csv, header false)
SQL

My desired output is:

SVCOP,"12980","20190627","1DEX","LUBE, OIL & FILTER - DEXOS 1","I","0.4","0.4","15.95","10.80","0.00","0.00","0.00","0.00","0.00","0.00","38.03","30.17","53.98","40.97","FULL SYNTHETIC MOTOR OIL.","LUBE, OIL & FILTER - DEXOS ''1''","91","LANE","LANE","L","LA MERE","125.00","125.00","","0.00","0.00","0","0","0","||||||||||||||||||||||||","N"

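Personally, if I were doing this, I would look for a utility. I think you could achieve it by finding the right regex, but it would probably end up being quite complicated.

Using something like csvkit, specifically its csvformat command, seems much easier. It would also be more reliable if you need to reuse this script with other data in the future (fields that contain newlines, for example, or other cases you may need to account for).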

Would you please try the following:

while IFS= read -r str; do          # assign a variable "str" to a line
    while true; do                  # infinite loop
        str2=$(sed 's/\([^,]\)"\([^,]\)/\1\2/g' <<< "$str")
        [[ "$str2" = "$str" ]] && break
                                    # if there is no change, exit the loop
        str="$str2"                 # update "str" for next iteration
    done
    echo "$str"
done < "old_sample.txt"

Output:

SVCOP,"12980","20190627","1DEX","LUBE, OIL & FILTER - DEXOS 1","I","0.4","0.4","15.95","10.80","0.00","0.00","0.00","0.00","0.00","0.00","38.03","30.17","53.98","40.97","FULL SYNTHETIC MOTOR OIL.","LUBE, OIL & FILTER - DEXOS ''1''","91","LANE","LANE","L","LA MERE","125.00","125.00","","0.00","0.00","0","0","0","||||||||||||||||||||||||","N"
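  • The regex \([^,]\)"\([^,]\) matches a double quote that has a non-comma character on both sides.
  • It loops until all the extra double quotes have been removed.
  • The script above works for the provided example but may not be robust enough for arbitrary input. For reliable results it is advisable to use a tool that can parse CSV files, as chrisputnam9 suggests.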
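The same loop-until-nothing-changes idea can also be expressed inside sed itself, which avoids starting a new sed process on every iteration. The following is a minimal sketch, assuming LF line endings; `strip_inner_quotes` is a hypothetical name and the input line is a shortened version of the sample.

```shell
#!/bin/sh
# strip_inner_quotes is a hypothetical name for a sed-only version of the loop.
# ':again' marks the loop start; 't again' jumps back while the substitution
# keeps succeeding, so sed repeats until no inner quote is left.
strip_inner_quotes() {
    sed -e ':again' \
        -e 's/\([^,]\)"\([^,]\)/\1\2/g' \
        -e 't again'
}

printf '%s\n' 'SVCOP,"12980","2019"0627","LUBE, OIL - DEXOS "1"","I","","N"' |
    strip_inner_quotes
# prints: SVCOP,"12980","20190627","LUBE, OIL - DEXOS 1","I","","N"
```

With the file from the question this could be run as `strip_inner_quotes < old_sample.txt > old_sample.csv`. The caveat about arbitrary input applies here as well.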

[EDIT] If your file has CR+LF line endings, please try:

while IFS= read -r str; do      # assign a variable "str" to a line
    while true; do              # infinite loop
        str2=$(sed 's/\([^,]\)"\([^,]\)/\1\2/g' <<< "$str")
        [[ "$str2" = "$str" ]] && break
                                # if there is no change, exit the loop
        str="$str2"             # update "str" for next iteration
    done
#   echo "$str"                 # add LF at the end of the output line
    printf '%s\r\n' "$str"      # add CR+LF at the end of the output line
done < <(tr -d "\r" < "VehSer_NEWM11_test.txt")
                                # remove CR code
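Before choosing between the two variants, it may help to check whether the input really contains CR+LF line endings. This is a minimal sketch, assuming a POSIX shell and grep; `has_crlf` is a hypothetical helper name.

```shell
#!/bin/sh
# has_crlf is a hypothetical helper: it succeeds (exit status 0) if at least
# one line of the given file ends with a carriage return.
has_crlf() {
    cr=$(printf '\r')            # a literal CR character
    grep -q "${cr}\$" "$1"       # CR immediately before the end of a line
}
```

It could be used as `if has_crlf old_sample.txt; then ...; fi` to pick the CR+LF variant only when it is needed.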

By the way, if perl is an option for you, the code below will run much faster:

perl -pe '1 while s/([^,])"([^,\r])/$1$2/g' VehSer_NEWM11_test.txt

It can't be done in one command, so I came up with this:

 $ sed "s/['\"]//g; s/,/\",\"/g; s/\",\" /, /g; s/,,/,\"\",/g; s/$/\"/; s/\"//" file
SVCOP,"12980","20190627","1DEX","LUBE, OIL & FILTER - DEXOS 1","I,0.4","0.4","15.95","10.80","0.00","0.00","0.00","0.00","0.00","0.00","38.03","30.17","53.98","40.97","FULL SYNTHETIC MOTOR OIL.","LUBE, OIL & FILTER - DEXOS 1","91","LANE","LANE","L,LA MERE","125.00","125.00,"",0.00","0.00","0,0","0,||||||||||||||||||||||||","N"

Or, if you need to keep ''1'':

$ sed 's/"//g; s/,/","/g; s/"," /, /g; s/,,/,"",/g; s/$/"/; s/"//' file
SVCOP,"12980","20190627","1DEX","LUBE, OIL & FILTER - DEXOS 1","I","0.4","0.4","15.95","10.80","0.00","0.00","0.00","0.00","0.00","0.00","38.03","30.17","53.98","40.97","FULL SYNTHETIC MOTOR OIL.","LUBE, OIL & FILTER - DEXOS ''1''","91","LANE","LANE","L","LA MERE","125.00","125.00","","0.00","0.00","0","0","0","||||||||||||||||||||||||","N"
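The second chain of substitutions can be read step by step as follows. This is a sketch with comments added for explanation; `requote_fields` is a hypothetical wrapper name and the input is a shortened version of the sample line. It relies on two assumptions that hold for the sample but not for CSV in general: a comma followed by a space always belongs to a field's text, and the first field is meant to stay unquoted.

```shell
#!/bin/sh
# requote_fields is a hypothetical wrapper around the six substitutions.
requote_fields() {
    sed '
        # 1. drop every double quote on the line
        s/"//g
        # 2. turn every comma into a quoted field separator
        s/,/","/g
        # 3. a separator followed by a space is assumed to be a comma inside
        #    a field, so turn it back into a plain comma
        s/"," /, /g
        # 4. kept from the original command; after step 2 no adjacent commas
        #    should remain, so this is expected to match nothing
        s/,,/,"",/g
        # 5. close the last field
        s/$/"/
        # 6. remove the first quote, which leaves the first field unquoted
        s/"//
    '
}

printf '%s\n' 'SVCOP,"12980","2019"0627","LUBE, OIL - DEXOS "1"","I","","N"' |
    requote_fields
# prints: SVCOP,"12980","20190627","LUBE, OIL - DEXOS 1","I","","N"
```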

