Find and Replace with Spaces using Sed Mac Terminal

I have a .CSV file with over 500,000 lines in which I need to:

  1. find all 'space double quote space' sequences and replace them with nothing
  2. find all 'space double quote' sequences and replace them with nothing
  3. find all double quotes and replace them with nothing

Example of a .CSV line:

"DISH Hartford & New Haven  (Hartford)", "206", "FBNHD", " 06028", " East Windsor Hill", "CT", "Hartford County"

**Required output**

DISH Hartford & New Haven  (Hartford),206,FBNHD,06028,East Windsor Hill,CT,Hartford County

I need to remove all double quotes ( " ) and the spaces in front of and behind the commas ( , ).

I've tried:

$ cd /Users/Leonna/Downloads/
$ cat bs-B2Bformat.csv | sed s/ " //g

This gives me the 'command incomplete' greater-than prompt ( > ), so I then tried:

$ cat bs-B2Bformat.csv | sed s/ " //g
sed: 1: "s/": unterminated substitute pattern
$ cat bs-B2Bformat.csv |sed s/ \" //g
sed: 1: "s/": unterminated substitute pattern
$

There are too many lines for me to edit in Excel (Excel won't load all the lines) or even in a text editor. How can I fix this?

Quoted from here:

For POSIX compliance, use the character class [[:space:]] instead of \s, since the latter is a GNU sed extension.

Based on that, I would suggest the following, which, as Jonathan Leffler pointed out, is portable across GNU and BSD implementations:

sed -E 's/[[:space:]]?"[[:space:]]?//g' <path/to/file>

The pattern matches a double quote together with at most one whitespace character on either side, and every match is replaced with nothing. The -E flag enables extended regular expressions on BSD implementations. On GNU sed it was undocumented for a long time, but as discussed here, it is accepted for compatibility with BSD sed.

Quoted from the manual for BSD sed:

-E Interpret regular expressions as extended (modern) regular expressions rather than basic regular expressions (BRE's).

Applying the above command to a file containing the following single line

"DISH Hartford & New Haven (Hartford)", "206", "FBNHD", " 06028", " East Windsor Hill", "CT", "Hartford County"

yields

DISH Hartford & New Haven (Hartford),206,FBNHD,06028,East Windsor Hill,CT,Hartford County
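For anyone who wants to check the pattern before running it over the whole file, the same substitution can be sketched in Python. This is only an illustration and not part of the answer above: Python's re module has no [[:space:]] class, so \s stands in for it here, and the name strip_quotes is made up for this example.

```python
import re

# Optional whitespace character, a double quote, optional whitespace character.
# Assumption: \s is a close enough stand-in for [[:space:]] on plain ASCII data.
QUOTE_PATTERN = re.compile(r'\s?"\s?')


def strip_quotes(line):
    """Remove each double quote plus at most one whitespace character on each side."""
    return QUOTE_PATTERN.sub('', line)


sample = ('"DISH Hartford & New Haven  (Hartford)", "206", "FBNHD", '
          '" 06028", " East Windsor Hill", "CT", "Hartford County"')
print(strip_quotes(sample))
```

One difference from sed is worth keeping in mind: sed never sees the trailing newline of a line, whereas \s in Python would match it after a closing quote, so lines read from a file should probably have the newline removed first (for example with line.rstrip('\n')).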

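This should do it:

sed -i 's/\(\s\|\)"\(\|\s\)//g' bs-B2Bformat.csv

Note that this form relies on GNU sed: \s and \| are GNU extensions, and the BSD sed shipped with macOS expects an argument after -i (for example -i '').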

This works for me. Is this what you want?

 sed -e 's|", "|,|g' -e 's|^"||g' -e 's|"$||g' file.csv

 echo '"DISH Hartford & New Haven (Hartford)", "206", "FBNHD", " 06028", " East Windsor Hill", "CT", "Hartford County"' | sed -e 's|", "|,|g' -e 's|^"||g' -e 's|"$||g'

 DISH Hartford & New Haven (Hartford),206,FBNHD, 06028, East Windsor Hill,CT,Hartford County

One way is to use Python and its csv module:

import csv
import sys

# Open the file given as the first argument. newline='' lets the csv module
# handle line endings itself.
with open(sys.argv[1], 'r', newline='') as f:

    # Create the csv reader and writer. Do not quote fields in the output;
    # QUOTE_NONE needs an escapechar for fields that contain the delimiter.
    reader = csv.reader(f, skipinitialspace=True)
    writer = csv.writer(sys.stdout, quoting=csv.QUOTE_NONE, escapechar='\\')

    # Read the file row by row, strip leading and trailing whitespace from
    # each field, and print the row.
    for row in reader:
        row = [field.strip() for field in row]
        writer.writerow(row)

Run it like:

python3 script.py csvfile

That yields:

DISH Hartford & New Haven  (Hartford),206,FBNHD,06028,East Windsor Hill,CT,Hartford County
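One thing the script above does not show is what happens when a field itself contains a comma. The sketch below runs the same reader and writer settings on an in-memory string so that this can be tried without a file. The function name strip_csv and the "Smith, John" row are hypothetical and used only for illustration.

```python
import csv
import io


def strip_csv(text):
    """Parse CSV text, trim each field, and write it back without quotes."""
    source = io.StringIO(text, newline='')
    out = io.StringIO(newline='')
    reader = csv.reader(source, skipinitialspace=True)
    writer = csv.writer(out, quoting=csv.QUOTE_NONE, escapechar='\\')
    for row in reader:
        writer.writerow([field.strip() for field in row])
    return out.getvalue()


# Hypothetical row whose first field contains the delimiter.
print(strip_csv('"Smith, John", " 42"\n'), end='')
```

Because quoting is switched off, the writer has to mark the embedded comma some other way, and it does so with the escape character, giving Smith\, John,42. A plain quote-stripping sed command would instead leave that comma bare and shift the columns, which may or may not matter for the data at hand.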

What all of the current answers seemed to miss:

$ cat bs-B2Bformat.csv | sed s/ " //g
sed: 1: "s/": unterminated substitute pattern
$ cat bs-B2Bformat.csv |sed s/ \" //g
sed: 1: "s/": unterminated substitute pattern
$

The problem in the above is the missing single quotes. It should have been:

$ cat bs-B2Bformat.csv | sed 's/ " //g'
                             ^        ^

Without the single quotes, bash splits the command at the spaces and sends three separate arguments (well, at least for the case of \" ), so sed was seeing its first argument as just s/ . With the unescaped " , bash instead treats the quote as the start of a quoted string and waits for the closing quote, which is where the > prompt comes from.

Edit: FYI, single quotes are not required; they just make this case easier. If you want to use double quotes, just escape the one you want to keep for matching:

$ cat bs-B2Bformat.csv | sed "s/ \" //g"
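The word splitting described above can be approximated with Python's shlex module, which tokenizes a string using POSIX-shell-like quoting rules. This is a rough sketch rather than a reproduction of what bash does internally, and the helper name shell_words is made up for this example.

```python
import shlex


def shell_words(command):
    """Split a command line roughly the way a POSIX shell would.

    Returns None when a quote is never closed, which is the situation in
    which an interactive shell shows its ">" continuation prompt.
    """
    try:
        return shlex.split(command)
    except ValueError:
        return None


for command in (
    'sed s/ " //g',            # unescaped quote, no surrounding quotes
    r'sed s/ \" //g',          # escaped quote, no surrounding quotes
    """sed 's/ " //g'""",      # whole script in single quotes
    r'sed "s/ \" //g"',        # whole script in double quotes, inner quote escaped
):
    print(repr(command), '->', shell_words(command))
```

Only the two quoted forms hand sed the whole script as a single argument. The escaped but unquoted form is split into s/, " and //g, which matches the "unterminated substitute pattern" error shown above.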
