How to remove all occurrences (0 or more) between two strings in SAS

Question

I am trying to parse a .json file into SAS. To deal with lists in the .json file, I would like to remove all commas between the brackets in [item1, item2, item3, .... itemn], but keep the commas that are not within [].

I think I should be able to do this with a prxchange regular expression. I can get it working for a two-item list, but I can't figure out how to alter it to work with lists of varying length.

newvariable=prxchange('s/(\\[\\w+),(\\w+\\])/$1 $2',-1,oldvariable);

Examples:

oldvariable = "{"hospital": "NOP", "drugs": ["penicillin", "ampicillin", "cephalosporin"]}" 
newvariable = "{"hospital": "NOP", "drugs": ["penicillin" "ampicillin" "cephalosporin"]}" 

oldvariable = "{"hospital": "KOP", "drugs": ["tetracycline"]}" 
newvariable = "{"hospital": "KOP", "drugs": ["tetracycline"]}"

Maybe there is a better way to approach this...

Answer 1

Sometimes the easiest way to handle a regex is to break it into steps. In this case, first extract the array, then replace the commas inside it with spaces:

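data _null_;
oldvariable = '{"hospital": "NOP", "drugs": ["penicillin", "ampicillin", "cephalosporin"]}';
/* Match the first [...] array in the string. */
arrayExpr=prxparse( '/\[[^]]+\]/' );
/* position and length describe the match; position is 0 when nothing matches. */
call prxsubstr( arrayExpr, oldvariable, position, length );
put position length;
/* Rebuild the string: text before the array, the array without commas, text after it. */
newvariable=cat(
    substr( oldvariable, 1, position - 1 ),
    prxchange( 's/, / /', -1, substr( oldvariable, position, length ) ),
    substr( oldvariable, position + length )
);
put newvariable;
run;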

Your original regex had some problems as well. Of the many regex-helper sites, this one is my favorite.
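For lists of any length, and for strings with more than one list, a single substitution with a lookahead may also be worth trying. The following is only a sketch, not part of the answer above: it assumes arrays are never nested and that no string value contains a bracket, and the length of 200 is an arbitrary choice for illustration.

```sas
data _null_;
    length oldvariable newvariable $ 200;
    oldvariable = '{"hospital": "NOP", "drugs": ["penicillin", "ampicillin", "cephalosporin"]}';
    /* Delete each comma that is followed by a "]" with no "[" or "]" in between. */
    /* Assumption: arrays are not nested and string values contain no brackets. */
    newvariable = prxchange( 's/,(?=[^\[\]]*\])//', -1, oldvariable );
    put newvariable;
run;
```

The comma after "NOP" is kept because the next bracket that follows it is an opening "[", so the lookahead fails there. A string without any array, or with a one-item array, passes through unchanged.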

Answer 2

You can take advantage of SAS's DSD option (which lets quotation marks protect embedded delimiters) if you do a smaller prxchange, similar to Leo's suggestion.

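data have;
infile datalines dlm=',' dsd;
/* Load the raw line into _infile_ and hold it for the second input statement. */
input @;
/* Replace a bracketed list of digits and commas with the same list in double quotes. */
_prx = prxparse('s~\[([0-9,]*?)\]~"$1"~io');
_prxm = prxmatch(_prx,_infile_);
if _prxm then call prxchange(_prx,-1,_infile_);
_test_=_infile_;
/* With DSD, commas inside the quoted list are no longer treated as delimiters. */
input a b $ c d $;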
datalines;
1,Hello,2,3
2,Goodbye,3,[4,5,6]
;;;;
run;

In your case I'm not sure whether double quotation marks would work, since they have a meaning in JSON, but you could use single quotes just as well.
