GREP-like function for extracting text in SAS

Question

I want to extract specific text from a column in a SAS file.

The file looks like this:

Patient    Location    infoTxt
001        B           Admission Code: 123456 X
                       Exit Code: 98765W
002        C           Admission Code: 4567 WY
                       Exit Code: 76543Z
003        D           Admission Code: 67890 L
                       Exit Code: 4321Z

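I want to extract only the information after the colon for the admission code and the exit code, and put each in its own column. A "code" can be any combination of letters, digits, and spaces. The new data would look like this: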

Patient    Location    AdmissionCode      ExitCode
001        B           123456 X           98765W
002        C           4567 WY             76543Z
003        D           67890 L             4321Z

I am not familiar with the functions in SAS, but the logic might look something like this:

data want;
  set have;
  do i = 1 to dim(infoTxt)

    AdmissionCode = substring(string1, regexpr(":", string) + 1);
    ExitCode = substring(string2, regexpr(":", string) + 1);

run;

In the code above, string1 represents the first line of text in infoTxt, and string2 represents the second line of text in infoTxt.

Answer 1

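SAS can use Perl regular expressions through the family of functions whose names start with PRX. If you are familiar with regular expressions, the tip sheet is a good summary.

PRXMATCH tests a regular expression pattern against a string, and PRXPOSN retrieves the text of a capture group from the match.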

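data have;
  input;
  text = _infile_;   /* keep the whole input line in one variable */
datalines;
Admission Code: 123456 X Exit Code: 98765W
Admission Code: 4567 WY Exit Code: 76543Z
Admission Code: 67890 L Exit Code: 4321Z
;
run;

data want;
  set have;

  /* Compile the pattern once and keep the handle across rows. */
  retain rx;
  if _n_ = 1 then
    rx = prxparse('/Admission Code:\s*(.*?)\s*Exit Code:\s*(.*\S)/');

  length AdmissionCode ExitCode $50;

  /* Group 1 is the admission code, group 2 is the exit code.
     The \s* parts keep blanks around each code out of the groups. */
  if prxmatch(rx, text) then do;
    AdmissionCode = prxposn(rx, 1, text);
    ExitCode = prxposn(rx, 2, text);
  end;

  drop rx;
run;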
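
The sample data above puts both codes on one line, whereas the question shows them on two separate rows. A minimal sketch of the same PRX approach for the two-row layout follows. The data set names have2 and want2, the variables rawPatient and rawLocation, the pipe-delimited sample input, and the variable lengths are all assumptions made for illustration.

```sas
/* Hypothetical input: one infoTxt line per row, as in the question.
   The second row of each patient has blank patient and location. */
data have2;
  length patient $3 location $1 infoTxt $40;
  infile datalines dlm='|' dsd truncover;
  input patient location infoTxt;
datalines;
001|B|Admission Code: 123456 X
||Exit Code: 98765W
002|C|Admission Code: 4567 WY
||Exit Code: 76543Z
003|D|Admission Code: 67890 L
||Exit Code: 4321Z
;
run;

data want2;
  /* Rename the incoming columns so retained copies can take their names. */
  set have2(rename=(patient=rawPatient location=rawLocation));
  length patient $3 location $1 AdmissionCode ExitCode $50;
  retain rx patient location AdmissionCode;

  /* Group 1 says which code this row holds, group 2 is the code itself. */
  if _n_ = 1 then
    rx = prxparse('/(Admission|Exit) Code:\s*(.*\S)/');

  if prxmatch(rx, infoTxt) then do;
    if prxposn(rx, 1, infoTxt) = 'Admission' then do;
      /* First row of a patient: remember the identifiers and the code. */
      patient = rawPatient;
      location = rawLocation;
      AdmissionCode = prxposn(rx, 2, infoTxt);
    end;
    else do;
      /* Second row: add the exit code and write one combined row. */
      ExitCode = prxposn(rx, 2, infoTxt);
      output;
    end;
  end;

  drop rx rawPatient rawLocation infoTxt;
run;
```

This sketch relies on every admission row being followed by its exit row. If an exit row were missing, the retained admission code would be paired with the next patient's exit code, so real data may need an extra check.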

Answer 2

I like regex with capture buffers as much as the next person, but you can also read this data using features of the INPUT statement.

data info;
   infile cards n=2 firstobs=2;
   input #1 patient:$3. location :$1. @'Admission Code: ' AdmissionCode &$16. #2 @'Exit Code: ' ExitCode &$16.;
   cards;
Patient    Location    infoTxt
001        B           Admission Code: 123456 X
                       Exit Code: 98765W
002        C           Admission Code: 4567 WY
                       Exit Code: 76543Z
003        D           Admission Code: 67890 L
                       Exit Code: 4321Z
;;;;
   run;
proc print;
   run;

Answer 3

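There may be a solution that does everything in one data step. This one uses two steps to handle the admission and exit codes being on different rows: first a data step, then a join to put them back together.

SAS does have regular expression syntax, but I used SAS character functions. substr takes three arguments: the string, the starting position, and the length. The length is optional, and omitting it tells substr to take everything from the starting position onward. retain is used to carry the patient and location down to the second row of each group.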

data admission exit;
    set grep;
    retain patient2 location2;
    if patient ne '' then do; 
        patient2=patient;
        location2=location;
        admissioncode=substr(infoTxt,find(infoTxt,":")+2);
        output admission;
        end;
    else do;
        exitcode=substr(infoTxt,find(infoTxt,":")+2);
        output exit;
        end;
run;
proc sql;
    create table dat as select a.patient2 as patient,a.location2 as location,a.admissioncode,b.exitcode
        from admission a
        left join exit b on a.patient2=b.patient2 and a.location2=b.location2
    ;
quit;
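
As noted at the start of this answer, a single data step may also be possible. A minimal sketch of that idea uses the same find and substr logic and writes one row when the exit row is reached. The sample contents of grep, the output name dat_onestep, the variables rawPatient and rawLocation, and the variable lengths are assumptions, not part of the original answer.

```sas
/* Hypothetical copy of the input data set used above (named grep). */
data grep;
  length patient $3 location $1 infoTxt $40;
  infile datalines dlm='|' dsd truncover;
  input patient location infoTxt;
datalines;
001|B|Admission Code: 123456 X
||Exit Code: 98765W
002|C|Admission Code: 4567 WY
||Exit Code: 76543Z
;
run;

data dat_onestep;
  set grep(rename=(patient=rawPatient location=rawLocation));
  length patient $3 location $1 admissioncode exitcode $50;
  /* Retained values survive from the admission row to the exit row. */
  retain patient location admissioncode;

  if rawPatient ne '' then do;
    /* Admission row: store identifiers and the text after the colon. */
    patient = rawPatient;
    location = rawLocation;
    admissioncode = left(substr(infoTxt, find(infoTxt, ':') + 1));
  end;
  else do;
    /* Exit row: complete the record and write it out. */
    exitcode = left(substr(infoTxt, find(infoTxt, ':') + 1));
    output;
  end;

  drop rawPatient rawLocation infoTxt;
run;
```

Using left() after substr removes the blank that follows the colon, so the code does not depend on exactly one space being there. Unlike the left join, this version drops a patient whose exit row is missing.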

Answer 4

If you always have the same pattern of colons and line breaks, I think you can do this with scan:

  admission_code = scan(infoTxt, 2, '3A0A0D'x);
  exit_code = scan(infoTxt, 4, '3A0A0D'x);

This uses the hex literal '3A0A0D'x to specify the colon, line feed, and carriage return as the delimiters for the scan function.
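
To see how the delimiters split the value, here is a small sketch. It assumes both lines are stored in a single infoTxt value separated by a carriage return and line feed. The data set name demo and the sample value are hypothetical. strip is added because each word returned by scan still starts with the blank that follows the colon.

```sas
data demo;
  length infoTxt $60 admission_code exit_code $50;

  /* Hypothetical value: two lines in one variable, joined by CR + LF. */
  infoTxt = 'Admission Code: 123456 X' || '0D0A'x || 'Exit Code: 98765W';

  /* Words split on ':', LF, CR:
     1 = 'Admission Code', 2 = ' 123456 X', 3 = 'Exit Code', 4 = ' 98765W' */
  admission_code = strip(scan(infoTxt, 2, '3A0A0D'x));
  exit_code      = strip(scan(infoTxt, 4, '3A0A0D'x));

  /* Write both values to the log for inspection. */
  put admission_code= exit_code=;
run;
```

By default scan treats consecutive delimiters as one, so the CR and LF pair does not produce an empty word between the two lines.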

Solution 1
6 2018-09-06 14:05:44

Solution 2
4 2018-09-06 16:09:23

Solution 3
1 2018-09-06 14:04:14

Solution 4
0 2018-09-06 16:54:38
