SAS equivalent for SQL validation of string data

I'm trying to validate some alphanumeric data of length 6. I have a working piece of SQL code that does this, but I'm struggling to write the equivalent in SAS as a calculated column in my query.

In SQL, a valid string in my data meets the following criteria:

CASE 
   WHEN <String> LIKE '[a-z][0-9][a-z][0-9][a-z][0-9]' 
      THEN 'Valid'
      ELSE 'Invalid' 
END

What functions can I use in SAS to achieve this? I'm using SAS EG as my tool.

Thanks!

Assuming you are asking about writing this as SAS code, this can be done any number of ways. The closest equivalent is to use Perl regular expressions. I don't think LIKE in SAS supports bracketed character classes (despite [ being a special character): the documentation only describes the % and _ wildcards, and I couldn't get the bracket syntax to work.

data have;
length charvar $6;
  input charvar $;
  datalines;
a1b2c3
1A2B3C
AAAAAA
111111
C3B2A1
;;;;
run;

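proc sql;
    /* ^ and $ tie the pattern to the whole value, \s* tolerates trailing blanks, /i ignores case */
    select charvar,
           case
               when prxmatch('/^[a-z][0-9][a-z][0-9][a-z][0-9]\s*$/i', charvar)
                   then 'Valid'
               else 'Invalid'
           end as validation
    from have;
quit;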

You could do the same thing in a SAS data step, or in a number of other ways that would work just as well.
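As a rough sketch of the data step route, the same prxmatch() call can drive an IF/ELSE. The output data set name `want` and the variable name `validation` are placeholders I picked for illustration, and the anchors (`^`, `\s*$`) are my addition so that the pattern has to cover the whole value:

```sas
/* Data step version of the check; assumes the HAVE data set created above. */
/* WANT and VALIDATION are illustrative names, not part of the question.     */
data want;
    set have;
    length validation $7;
    /* prxmatch returns the match position (> 0), or 0 when nothing matches */
    if prxmatch('/^[a-z][0-9][a-z][0-9][a-z][0-9]\s*$/i', charvar)
        then validation = 'Valid';
        else validation = 'Invalid';
run;
```

Because the pattern is a constant, SAS compiles it once for the step rather than once per row, so there is no need to call prxparse() separately here.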

You could also consider using FIND(string to be searched, string you're looking for).

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002267763.htm

The FIND function searches string for the first occurrence of the specified substring, and returns the position of that substring. If the substring is not found in string, FIND returns a value of 0.

So long as your FIND() returns a number greater than 0, you'll know that you have a match in there.
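One caveat worth seeing in action: FIND looks for a literal substring, so it can confirm that a known piece of text is present but cannot by itself express a letter-digit-letter pattern. A small sketch (the variable names are made up for the demo):

```sas
/* FIND returns the 1-based position of a literal substring, or 0 if it is absent. */
data _null_;
    pos_hit  = find('a1b2c3', 'b2');     /* 3: 'b2' starts at position 3 */
    pos_miss = find('a1b2c3', 'z9');     /* 0: substring not present */
    pos_lit  = find('a1b2c3', '[a-z]');  /* 0: brackets are literal text here, not a character class */
    put pos_hit= pos_miss= pos_lit=;
run;
```

For the validation in the question you would therefore still need a pattern-aware function such as prxmatch(), or a per-character check.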

I think Joe's answer using Perl regular expressions with prxmatch() is the best approach to this problem. However, to demonstrate SAS macros and string functions, here's an alternative.

In this approach, each character is checked in turn. substr(,&pos,1) isolates the character, and compress(,,'xk') deletes the character if it is not of the correct type (where 'x' is 'a' for letters or 'd' for digits, and 'k' means keep the listed characters rather than remove them). Applying the lengthn() function then returns 1 if the character is of the correct type and 0 otherwise. Note that length() will not work, because it returns 1 for an empty string. Finally, 'Valid' is assigned if all characters are of the correct type.

data have;
    length charvar $6;
    input charvar $;
    datalines;
a1b2c3
1A2B3C
AAAAAA
111111
C3B2A1
;
run;

* invar is the variable, pos is the position of the character being checked, type is d for digits or a for letters;
%macro check(invar, pos, type) ;
    (1 = lengthn(compress(substr(&invar, &pos, 1), , "&type.k")))
%mend  ;

data validation ;
    set have ;
    length validation $7 ;
    if %check(charvar, 1, a) & %check(charvar, 2, d) & %check(charvar, 3, a) & 
       %check(charvar, 4, d) & %check(charvar, 5, a) & %check(charvar, 6, d)
        then validation = 'Valid' ;         
        else validation = 'Invalid' ;       
run ;
dm 'vt validation';
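If it helps to see what the building blocks return on their own, here is a small sketch that runs them on single characters and writes the results to the log. The variable names are only for illustration:

```sas
/* What compress(..., 'ak'), lengthn() and length() return for one character. */
data _null_;
    letter_kept = compress('a', , 'ak');  /* 'a' is alphabetic, so it is kept */
    digit_gone  = compress('1', , 'ak');  /* '1' is not alphabetic, so the result is blank */
    n_kept = lengthn(letter_kept);        /* 1 */
    n_gone = lengthn(digit_gone);         /* 0 */
    n_len  = length(digit_gone);          /* 1: LENGTH never returns 0, even for a blank value */
    put n_kept= n_gone= n_len=;
run;
```

This is why the macro compares lengthn() to 1: with length() every character would appear to pass.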
