Lookup to match from one dataset to another dataset

Question

I need help multiplying values from dataset 2 by values in dataset 1.

Data #1:

Policy#	Risk#	Premium
KOK1	002	150
KOK2	003	130

Data #2:

Source	Policy#	Risk#	Item1
Build Age	KOK1	002	3
Build Year	KOK1	002	5
Discount 1	KOK1	002	10%
Discount 2	KOK1	002	5%
Build Age	KOK2	003	4
Build Year	KOK2	003	6
Discount 1	KOK2	003	15%
Discount 2	KOK2	003	5%

How do I set up a formula that matches Policy# and Risk# between Data #1 and Data #2 and, where both match, multiplies "Premium" in Data #1 by Discount 1 and Discount 2 in Data #2?

Answer 1

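There are two ways you can do this. The first is a simple merge, where you merge the two datasets by policy and risk number and then perform the calculation. For example: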

data want;
    merge data2(in=d2)
          data1(in=d1);
    by policy risk_nbr;

    /* If policy and risk_nbr match from data2 and data1, then calculate
       a premium */
    if(d2 AND d1 AND find(source, 'Discount') ) then value = Premium*Item1;
   
run;

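This is similar to a full join on policy, risk_nbr in SQL, except that the multiplication only happens when both key values match. Note that both datasets must be sorted by policy and risk_nbr for this to work.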
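If the inputs are not already in key order, a sort step before the merge takes care of that requirement. This is only a minimal sketch, assuming the datasets are named data1 and data2 and the key variables are called policy and risk_nbr as in the merge above:

```sas
/* Sort both inputs by the merge keys so the BY statement
   in the DATA step merge does not raise an out-of-order error.
   Dataset and variable names are assumed from the example above. */
proc sort data=data1;
    by policy risk_nbr;
run;

proc sort data=data2;
    by policy risk_nbr;
run;
```

Both sorts overwrite the input datasets in place; add an out= option if you would rather keep the originals untouched.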

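The second way is through a hash table lookup, which is one of my favorite ways to do these small lookup tables. They're really fast.

Think of a hash table as an independent table that floats out in memory. We talk to it using special methods that look up a value in the hash table by a key in our dataset and pull that value down so we can use it. This is what it looks like.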

data want;
    set data2;

    /* Only load the hash table once */
    if(_N_ = 1) then do;
        dcl hash h(dataset: 'data1');          *Add the dataset holding premium to a hash table called 'h';
            h.defineKey('policy', 'risk_nbr'); *Define our lookup key;
            h.defineData('premium');           *The value we want to pull;
        h.defineDone();                        *Load the dataset into `h`;
      
        /* Initialize the numeric variable 'premium' with a missing value 
           since it does not exist yet. This prevents data step warnings. */
        call missing(premium); 
    end;
    
    /* Look up the value of policy and risk in the set dataset and compare it 
       with the hash table's value of policy and risk.
       If there is a match, rc = 0
    */
    rc = h.Find();

    if(rc = 0 AND find(source, 'Discount') ) then value = Premium*Item1;

    drop rc;
run;

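Hash tables are extremely powerful and very fast, especially when joining a small table to a large table. You don't need to do any pre-sorting either.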
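To try either approach, you need the two tables from the question as SAS datasets. The sketch below is one way to build them, and it rests on a few assumptions that are not in the original post: the variable names (policy, risk_nbr, premium, source, item1) are taken from the code above, policy and risk_nbr are stored as character, and item1 is numeric with the discounts entered as fractions (0.10 for 10%).

```sas
/* Hypothetical sample data matching the tables in the question.
   Assumption: item1 is numeric and discounts are stored as fractions. */
data data1;
    length policy $4 risk_nbr $3;
    input policy $ risk_nbr $ premium;
    datalines;
KOK1 002 150
KOK2 003 130
;
run;

data data2;
    length source $12 policy $4 risk_nbr $3;
    infile datalines dlm=',';   /* comma-delimited so source can contain a space */
    input source $ policy $ risk_nbr $ item1;
    datalines;
Build Age,KOK1,002,3
Build Year,KOK1,002,5
Discount 1,KOK1,002,0.10
Discount 2,KOK1,002,0.05
Build Age,KOK2,003,4
Build Year,KOK2,003,6
Discount 1,KOK2,003,0.15
Discount 2,KOK2,003,0.05
;
run;
```

If your real item1 column is character text such as '10%', it would have to be converted to a number before the multiplication, so check how it is stored first.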

If you want to learn more about hash tables, check out my paper, "I cut my processing time by 90% using hash tables - you can too!"
