sed regex find & replace (awk solutions welcome)

I'm working on a JSON file (for MongoDB) and need to convert a field to a Database Reference. I'm attempting to do it with sed (though I'm open to solutions using awk, etc.), but I'm a complete novice with the tool and am struggling.

Input:

...
"FECTransID" : 4030720141206780377,
"CID" : "N00031103",
"CmteID" : "C00465971",
"RecipCode" : "RW",
"Amount" : 500,
....

Output needed:

...
"FECTransID" : 4030720141206780377,
"CID" : "N00031103",
"CmteID" : {
    "ref" : "Cmtes",
    "$id" : "C00465971",
    "$db" : "OpenSecrets"
},
"RecipCode" : "RW",
"Amount" : 500,
....

My sed command attempt is:

sed -r 's/\"CmteID\" \: \(\"[\w\d]\{9\}\",\)/\"CmteID\" : { \
                \"ref\" : \"Cmtes\", \
                \"$id\" : \1 \
                \"$db\" : \"OpenSecrets\" \
            }/' <IN_FILE >OUT_FILE

but I get this error when I run it:

sed: -e expression #1, char 198: invalid reference \1 on `s' command's RHS

Any help would be appreciated. Thanks.

An awk approach:

awk '$1 == "\"CmteID\"" {   # $3 is the old value, trailing comma included
    $3 = "{\n\t\"ref\" : \"Cmtes\"," \
         "\n\t\"$id\" : " $3 \
         "\n\t\"$db\" : \"OpenSecrets\"\n},"
} 1' infile

Explanation

When the first field matches ($1 == "\"CmteID\""), the third field is replaced with the expected string. The only variable part is the CmteID value, which is spliced in by concatenating $3 right after the "$id" key.

Line breaks (using the backslash continuation character \) were added to make the code easier to read.
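To see why $1 and $3 are the right fields to test and reuse, here is a minimal sketch of how awk's default whitespace splitting treats the line in question. The variable names `line` and `fields` are only illustrative.

```shell
line='"CmteID" : "C00465971",'

# By default awk splits on runs of whitespace, so the key, the colon,
# and the value (with its trailing comma) become $1, $2 and $3.
fields=$(printf '%s\n' "$line" | awk '{ print NF; print $1; print $2; print $3 }')
printf '%s\n' "$fields"
```

Because $3 already carries the trailing comma, the replacement string does not need to add one after the "$id" value.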

Results

"FECTransID" : 4030720141206780377,
"CID" : "N00031103",
"CmteID" : {
    "ref" : "Cmtes",
    "$id" : "C00465971",
    "$db" : "OpenSecrets"
},
"RecipCode" : "RW",
"Amount" : 500,

awk to the rescue!

$ awk '$1=="\"CmteID\""{print $1 " : {";
                         print "\t\"ref\" : \"Cmtes\",";
                         print "\t\"$id\" : "$3;
                         print "\t\"$db\" : \"OpenSecrets\"";
                         print "},";
                         next}1' jsonfile

...
"FECTransID" : 4030720141206780377,
"CID" : "N00031103",
"CmteID" : {
        "ref" : "Cmtes",
        "$id" : "C00465971",
        "$db" : "OpenSecrets"
},
"RecipCode" : "RW",
"Amount" : 500,
....

with some cleanup

$ awk -v NT="\n\t" 'function q(x) {return "\"" x "\""}
       $1==q("CmteID") {$3 = "{" \
                     NT q("ref") " : " q("Cmtes") "," \
                     NT q("$id") " : " $3 \
                     NT q("$db") " : " q("OpenSecrets") \
                     "\n},"}1' jsonfile
...
"FECTransID" : 4030720141206780377,
"CID" : "N00031103",
"CmteID" : {
        "ref" : "Cmtes",
        "$id" : "C00465971",
        "$db" : "OpenSecrets"
},
"RecipCode" : "RW",
"Amount" : 500,
....

sed is for simple substitutions on individual lines, that is all. This problem is not like that, so this is not a job for sed.

$ cat tst.awk
BEGIN { FS=OFS=" : " }
$1 == "\"CmteID\"" {
    print $1, "{"
    print "   \"ref\"", "\"Cmtes\","
    print "   \"$id\"", $2
    print "   \"$db\"", "\"OpenSecrets\""
    $0 = "},"
}
{ print }

$ awk -f tst.awk file
...
"FECTransID" : 4030720141206780377,
"CID" : "N00031103",
"CmteID" : {
   "ref" : "Cmtes",
   "$id" : "C00465971",
   "$db" : "OpenSecrets"
},
"RecipCode" : "RW",
"Amount" : 500,
....

Many languages have built-in JSON parsers. PHP is one of them:

#!/usr/bin/php
<?php
$infile = $argv[1];
$outfile = $argv[2];
// Pass true so json_decode() returns associative arrays; the array syntax below fails on the default stdClass objects.
$data = json_decode(file_get_contents($infile), true);
$id = $data["CmteID"];
$data["CmteID"] = array("ref"=>"Cmtes", "\$id"=>$id, "\$db"=>"OpenSecrets");
file_put_contents($outfile, json_encode($data));

Untested, but it should work. Make it executable and call ./myscript.php IN_FILE OUT_FILE.

My main point is that JSON is structured data, not plain text, and using text replacement on it can lead to problems, just as with other structured formats such as XML!
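As a small illustration of that fragility, the field-based awk tests used elsewhere in this thread compare $1 with the exact string "CmteID", so they depend on the spacing around the colon. This is a minimal sketch, assuming the same key also appears in compact form; the names `prog`, `spaced` and `compact` are only illustrative.

```shell
# The same test used by the awk answers, reduced to printing a marker.
prog='$1 == "\"CmteID\"" { print "matched" }'

# Spaced form: awk sees three fields, so $1 is exactly "CmteID".
spaced=$(printf '%s\n' '"CmteID" : "C00465971",' | awk "$prog")

# Compact form (equally valid JSON): the whole line is a single field, so the test fails.
compact=$(printf '%s\n' '"CmteID":"C00465971",' | awk "$prog")

printf 'spaced: [%s]\ncompact: [%s]\n' "$spaced" "$compact"
```

A real JSON parser treats both spellings identically, which is the case for using one when the input formatting is not under your control.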

This might work for you (GNU sed):

sed -r 's/"CmteID" : (.*)/"CmteID" : { \
            "ref" : "Cmtes", \
            "$id" : \1 \
            "$db" : "OpenSecrets" \
        },/' fileIn >fileOut

This was a case of over-quoting. Because -r was in force, the parentheses grouping the $id value had been escaped unnecessarily. With -r, \( and \) match literal parentheses, so the pattern defined no group for \1 to refer to.
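The difference between the two spellings can be checked in isolation. The sketch below assumes a sed that accepts -E (GNU and BSD sed both do; GNU sed treats it the same as -r). The bracket expression `[A-Z0-9]` is an assumption about the ID format, swapped in for the question's `[\w\d]`, because inside a bracket expression the backslash is taken literally.

```shell
line='"CmteID" : "C00465971",'

# With -E, bare parentheses create the capture group that \1 refers to.
id=$(printf '%s\n' "$line" | sed -E 's/"CmteID" : ("[A-Z0-9]{9}"),/\1/')
printf '%s\n' "$id"

# With -E, escaped parentheses match literal ( and ) characters instead,
# so they capture nothing.
literal=$(printf '%s\n' '(x)' | sed -E 's/\(x\)/matched/')
printf '%s\n' "$literal"
```

The same reasoning applies to the braces: with -E the interval is written {9}, whereas \{9\} would look for literal brace characters.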
