SAS: How do I find the nth instance of a character/group of characters within a string?

Question

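I'm trying to find a function that will index the nth instance of a character.

For example, if I have the string ABABABBABSSSDDEE and I want to find the 3rd instance of A, how do I do that? What if I want to find the 4th instance of AB?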

ABAB A BB AB SSSDDEE

data HAVE;
   length STRING $50; /* the default length of 8 would truncate the value */
   input STRING $;
   datalines;
ABABABBABSSSDDEE
;
RUN;

Answer 1

data _null_;
findThis = 'A'; *** substring to find;
findIn = 'ADABAACABAAE'; **** the string to search;
instanceOf=1; *** and the instance of the substring we want to find;
pos = 0; 
len = 0; 
startHere = 1; 
endAt = length(findIn);
n = 0; *** count occurrences of the pattern;
pattern =  '/' || findThis || '/'; 
rx = prxparse(pattern);
CALL PRXNEXT(rx, startHere, endAt, findIn, pos, len);
if pos le 0 then do;
    put 'Could not find ' findThis ' in ' findIn;
end;
else do while (pos gt 0);
    n+1;
    if n eq instanceOf then leave;
    CALL PRXNEXT(rx, startHere, endAt, findIn, pos, len);
end;
if n eq instanceOf then do;
    put 'found ' instanceOf 'th instance of ' findThis ' at position ' pos ' in ' findIn;
end;
else do;
    put 'No ' instanceOf 'th instance of ' findThis ' found';
end;
run;

Answer 2

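Here is a solution that uses the find() function and a do loop within a data step. I then take that code and place it into a proc fcmp procedure to create my own function called find_n(). This should greatly simplify any task that needs this and allows for code reuse.

Define the data: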

data have;  
  length string $50;
  input string $;
  datalines;
ABABABBABSSSDDEE
;
run;

Do-loop solution:

data want;
  set have;  
  search_term = 'AB';
  nth_time = 4;
  counter = 0;
  last_find = 0;

  start = 1;
  pos = find(string,search_term,'',start);
  do while (pos gt 0 and nth_time gt counter);
    last_find = pos;
    start = pos + 1;   /* resume one character past the last match */
    counter = counter + 1;
    pos = find(string,search_term,'',start);   /* start, not start+1, so no position is skipped */
  end;

  if nth_time eq counter then do;    
    put "The nth occurrence was found at position " last_find;
  end;
  else do;
    put "Could not find the nth occurrence";
  end;

run;

Define the proc fcmp function:

Note: returns 0 if the nth occurrence is not found.

options cmplib=work.temp;

proc fcmp outlib=work.temp.temp;

  function find_n(string $, search_term $, nth_time) ;    

    counter = 0;
    last_find = 0;

    start = 1;
    pos = find(string,search_term,'',start);
    do while (pos gt 0 and nth_time gt counter);
      last_find = pos;
      start = pos + 1;   /* resume one character past the last match */
      counter = counter + 1;
      pos = find(string,search_term,'',start);   /* start, not start+1, so no position is skipped */
    end;

    result = ifn(nth_time eq counter, last_find, 0);

    return (result);
  endsub;

run;

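Example proc fcmp usage:

Note that this calls the function twice. The first call shows the solution to the original request. The second call shows what happens when a match cannot be found.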

data want;
  set have;  
  nth_position = find_n(string, "AB", 4);
  put nth_position =;

  nth_position = find_n(string, "AB", 5);
  put nth_position =;
run;

Answer 3

I realise I'm late to the party here, but in the interest of adding to the collection of answers, here's what I've come up with.

DATA test;
   input   = "ABABABBABSSSDDEE";

   A_3  = find(prxchange("s/A/#/",   2, input), "A");
   AB_4 = find(prxchange("s/AB/##/", 3, input), "AB");
RUN;

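Breaking it down, prxchange() just does a pattern-matching replacement, but the great thing about it is that you can tell it how many times to replace that pattern. So prxchange("s/A/#/", 2, input) replaces the first two A's in input with #. Once the first two A's have been replaced, you can wrap the result in a find() call to locate the "first A", which is actually the third A of the original string.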

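One thing to note about this approach is that the replacement string should ideally be the same length as the string you are replacing. For instance, note the difference between

find(prxchange("s/AB/##/", 3, input), "AB") /* gives 8 (correct) */

and

find(prxchange("s/AB/#/", 3, input), "AB")  /* gives 5 (incorrect) */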

This is because we replaced a string of length 2 with a string of length 1 three times. In other words:

(length("#") - length("AB")) * 3 = -3

So 8 + (-3) = 5
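To make the length effect visible, here is a minimal sketch that prints the intermediate strings next to the positions find() reports. The variable names (input_str, same_len, short_len, pos_same, pos_short) are made up for this illustration and are not part of the answer above.

```sas
data _null_;
   input_str = "ABABABBABSSSDDEE";

   /* Same-length replacement: later characters keep their positions */
   same_len = prxchange("s/AB/##/", 3, input_str);
   pos_same = find(same_len, "AB");

   /* Shorter replacement: each of the 3 replacements shifts the rest left by 1 */
   short_len = prxchange("s/AB/#/", 3, input_str);
   pos_short = find(short_len, "AB");

   put same_len=  pos_same=;    /* same_len=######BABSSSDDEE pos_same=8 */
   put short_len= pos_short=;   /* short_len=###BABSSSDDEE pos_short=5 */
run;
```

The placeholder character must also not occur in the search term, otherwise find() could match inside the replaced region.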

Hope that helps someone!

Answer 4

Here is a simplified implementation that finds the Nth instance of a group of characters in a SAS string using the SAS find() function:

     data a;
        s='AB bhdf +BA s Ab fs ABC Nfm AB ';
        x='AB';
        n=3;

        /* from left to right */
        p = 0;
        do i=1 to n until(p=0); 
           p = find(s, x, p+1);
        end;
        put p=;

        /* from right to left */
        p = length(s) + 1;
        do i=1 to n until(p=0); 
           p = find(s, x, -p+1);
        end;
        put p=;
     run;

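As you can see, it allows both left-to-right and right-to-left searches.

You can combine the two into a SAS user-defined function (a negative n means searching from right to left, just as in the find function):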

     proc fcmp outlib=sasuser.functions.findnth;
        function findnth(str $, sub $, n);
           p = ifn(n>=0,0,length(str)+1);
           do i=1 to abs(n) until(p=0);
              p = find(str,sub,sign(n)*p+1);
           end;
           return (p);
        endsub;
     run;
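As a usage sketch, the function can be compiled and then called from a data step once the CMPLIB= option points at the data set that holds it. Storing it in work.functions rather than sasuser.functions is an assumption made here so the example cleans up with the session, and the result variable names are illustrative.

```sas
/* Compile the function into a temporary library (assumed location) */
proc fcmp outlib=work.functions.findnth;
   function findnth(str $, sub $, n);
      p = ifn(n>=0, 0, length(str)+1);
      do i=1 to abs(n) until(p=0);
         p = find(str, sub, sign(n)*p+1);
      end;
      return (p);
   endsub;
run;

/* Tell the data step where to look for user-defined functions */
options cmplib=work.functions;

data _null_;
   s = 'AB bhdf +BA s Ab fs ABC Nfm AB ';
   from_left  = findnth(s, 'AB',  2);   /* 2nd instance counted from the left  */
   from_right = findnth(s, 'AB', -2);   /* 2nd instance counted from the right */
   not_found  = findnth(s, 'AB',  4);   /* only three instances exist, so 0    */
   put from_left= from_right= not_found=;
run;
```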

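Please note that the solutions above using the FIND() and FINDNTH() functions assume that the searched substring can overlap with its prior instance. For example, if we search for the substring 'AAA' within the string 'ABAAAA', the first instance of 'AAA' will be found at position 3 and the second instance at position 4. That is, the first and second instances overlap. For this reason, when we find an instance we increase the position p by 1 (p+1) to start the next iteration (instance) of the search. However, if such overlapping is not a valid case in your searches and you want to continue searching after the end of the previous substring instance, then we should increase p not by 1 but by the length of the substring x. That will also speed up the search (the longer the substring x, the greater the gain), because we skip more characters as we move through the string s. In that case, in our search code we should replace p+1 with p+w, where w=length(x).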
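The difference can be sketched on the 'ABAAAA' example, for the left-to-right direction only. One detail is my own reading rather than something the answer spells out: when stepping by w, the starting value of p has to become 1-w instead of 0 so that the first search still begins at position 1.

```sas
data _null_;
   s = 'ABAAAA';
   x = 'AAA';
   n = 2;
   w = length(x);   /* width of the substring being searched for */

   /* Overlapping instances allowed: step 1 past the previous match */
   p = 0;
   do i=1 to n until(p=0);
      p = find(s, x, p+1);
   end;
   put 'overlapping: ' p=;       /* p=4 */

   /* Non-overlapping instances: step past the whole previous match */
   p = 1 - w;                    /* assumed start value, so p+w = 1 on the first pass */
   do i=1 to n until(p=0);
      p = find(s, x, p+w);
   end;
   put 'non-overlapping: ' p=;   /* p=0, there is no second non-overlapping instance */
run;
```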

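This problem is discussed in detail in my recent SAS blog post "Finding n-th instance of a substring within a string". I also found that using the find() function is considerably faster than using the regular expression functions in SAS.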
