
How to replace spaces within square brackets

I have been looking for a solution to this for days. I want to replace all spaces, but only when they occur inside square brackets, replacing each one with a double quote, comma, double quote (",").

The solution can use any tool as long as it works, but note that the file is huge.

An example:

"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.2[c] 3.11.2[d] 3.11.2[e] 3.11.3[a] 3.11.3"],

Should become:

"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.3"],

Preferred tools, since I am already working with them: Visual Studio or sed.

Thank you.

Using sed

$ sed ':a;s/\(\["[^ ]*\) /\1","/;ta' input_file
"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.3"],

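If the loop does not work for your use case but the pattern will always be the same, you can also try the single-pass command below. As its output shows, it only replaces a space that directly follows a token containing [, so the space after the plain 3.11.2 is left in place.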

$ sed 's/\(\[[^ ]*\) /\1","/g' input_file
"NIST 800-171A": ["3.11.2[a]","3.11.2 3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.3"],

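To see why the single-pass command leaves one space behind while the loop does not, the two sed commands can be mirrored in Python. This is only a sketch: the function names one_pass and looped are made up for illustration, and the patterns are my translation of the sed expressions above into Python's re syntax.

```python
import re

LINE = '"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.2[c] 3.11.2[d] 3.11.2[e] 3.11.3[a] 3.11.3"],'

def one_pass(line: str) -> str:
    # Counterpart of: sed 's/\(\[[^ ]*\) /\1","/g'
    # Every match must start at a "[", so a space that follows a token
    # without any bracket (the plain 3.11.2) is never matched.
    return re.sub(r'(\[[^ ]*) ', r'\1","', line)

def looped(line: str) -> str:
    # Counterpart of: sed ':a;s/\(\["[^ ]*\) /\1","/;ta'
    # Each pass replaces the first space after ["; repeat until none is left.
    while True:
        line, replaced = re.subn(r'(\["[^ ]*) ', r'\1","', line, count=1)
        if replaced == 0:
            return line

print(one_pass(LINE))
print(looped(LINE))
```

The loop works because each substitution extends the space-free run that starts at [", so the next pass reaches the next space.
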
To replace any whitespace after the first [ and before the last ], search for:

(^[^\[]+|\][^\]]+$)|\s+

and replace with $1 (whatever was captured in the first capture group). See this demo at regex101.

The idea is to use a variation of The Trick. The pattern either captures the first and last parts of the line (everything before the first [ and everything from the last ] onward) into group 1, or matches a run of whitespace. Because whitespace is matched but not captured, $1 is empty for those matches, so they are replaced with nothing.

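The same pattern can be tried in Python. Note one deliberate departure from the answer: instead of replacing with $1 alone, this sketch uses a replacement function that returns "," whenever group 1 did not participate in the match, which is an assumption about how to reach the output asked for in the question. The name quote_split is hypothetical, and the function expects a single line without its trailing newline.

```python
import re

# Group 1 protects the text before the first "[" and from the last "]" onward;
# the second alternative matches the whitespace runs in between.
TRICK = re.compile(r'(^[^\[]+|\][^\]]+$)|\s+')

def quote_split(line: str) -> str:
    def repl(match: re.Match) -> str:
        if match.group(1) is not None:
            return match.group(1)  # protected part: put it back unchanged
        return '","'               # uncaptured whitespace: swap in ","
    return TRICK.sub(repl, line)

line = '"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.2[c] 3.11.2[d] 3.11.2[e] 3.11.3[a] 3.11.3"],'
print(quote_split(line))
```

In an editor that only supports a fixed replacement string, the pattern as given removes the whitespace; inserting a separator needs a callback like this or a conditional replacement.
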
Python solution

Replace sample.txt with the name of your file; the script replaces the spaces within the [] with "," and writes the result back to the same file.

process.py

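import re

with open('sample.txt', 'r+') as f:
    contents = f.read()
    # Match the text between [" and "] on each line (raw string avoids
    # invalid-escape warnings; "." does not cross line boundaries).
    pattern = re.compile(r'(?<=\[")(.*)(?="\])')
    # Replace every whitespace character inside each match with ","
    contents = pattern.sub(lambda m: re.sub(r'\s', '","', m.group(1)), contents)
    f.seek(0)
    f.write(contents)
    # Drop any leftover bytes in case the new contents are shorter.
    f.truncate()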

sample.txt

"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.2[c] 3.11.2[d] 3.11.2[e] 3.11.3[a] 3.11.3"],
"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.2[c] 3.11.2[d] 3.11.2[e] 3.11.3[a] 3.11.1"],

"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.2[c] 3.11.2[d] 3.11.2[e] 3.11.3[a] 3.11.2"],

"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.2[c] 3.11.2[d] 3.11.2[e] 3.11.3[a] 3.11.4"],

"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.2[c] 3.11.2[d] 3.11.2[e] 3.11.3[a] 3.11.5"],

"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.2[c] 3.11.2[d] 3.11.2[e] 3.11.3[a] 3.11.6"],

Output

"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.3"],
"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.1"],

"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.2"],

"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.4"],

"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.5"],

"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.6"],

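Because the question mentions a huge file, it may be preferable not to read everything into memory at once. The following is a minimal line-by-line sketch of the same idea, assuming the result can go to a separate output file; the names convert_lines and convert_file and the file paths are hypothetical.

```python
import re
from typing import Iterable, Iterator

# Same pattern as in process.py: the text between [" and "] on one line.
INSIDE = re.compile(r'(?<=\[")(.*)(?="\])')

def convert_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line with whitespace between [" and "] replaced by ","."""
    for line in lines:
        yield INSIDE.sub(lambda m: re.sub(r'\s', '","', m.group(1)), line)

def convert_file(src: str, dst: str) -> None:
    # Iterating over the file object reads one line at a time.
    with open(src) as fin, open(dst, 'w') as fout:
        fout.writelines(convert_lines(fin))

sample = ['"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.3"],\n', '\n']
print(''.join(convert_lines(sample)), end='')
```

Calling convert_file('sample.txt', 'sample.out') would then process the file in a single streaming pass.
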
With GNU awk, using the third argument to match() and gensub(), we can get the output you show from the input you show:

$ awk 'match($0,/([^[]*)(.*)/,a){ $0=a[1] gensub(/ /,"\",\"","g",a[2]) } 1' file
"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.3"],

or with any awk:

$ awk 's=index($0,"["){ tgt=substr($0,s); gsub(/ /,"\",\"",tgt); $0=substr($0,1,s-1) tgt } 1' file
"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.3"],

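For comparison, the index()/substr() approach of the portable awk command needs no regex at all and translates almost directly to Python string methods. This is an illustrative sketch only; the function name quote_after_first_bracket is made up.

```python
def quote_after_first_bracket(line: str) -> str:
    # Like awk's index($0, "["): locate the first "[" in the line.
    pos = line.find('[')
    if pos == -1:
        return line  # no bracket: leave the line unchanged
    # Keep the head as-is and replace every space in the remainder.
    return line[:pos] + line[pos:].replace(' ', '","')

line = '"NIST 800-171A": ["3.11.2[a] 3.11.2 3.11.2[c] 3.11.3"],'
print(quote_after_first_bracket(line))
```

As with the awk version, every space after the first [ is replaced, including any that might follow the closing ], so it relies on lines shaped like the sample.
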
No regex capture group or gensub() is needed at all:

{m,g}awk '$!NF = substr($!(NF=NF),!_, __=index($_, "[")) substr(_,
            __ = substr($_,++__), gsub(" ","\",\"", __))__' OFS=']\",\"' FS='[]][ \t]+' 

"NIST 800-171A": ["3.11.2[a]","3.11.2","3.11.2[c]","3.11.2[d]","3.11.2[e]","3.11.3[a]","3.11.3"],
