
Merging two datasets where variables in one correspond to observations in the other

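I ran into a challenge today and would appreciate some help. I want to merge two datasets. Dataset1 contains a member roster with 3 variables: batch_id, member_num (the member's number within the batch), and occupation. Dataset2 contains the members' license status.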

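The challenge is that in Dataset2 the member number is encoded in the variable names: the variable member_num_x in Dataset2 corresponds to the observation with member_num equal to x in Dataset1. I need to merge the two datasets so that I end up with one dataset containing batch_id, member_num, occupation, and license status for each member.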


Dataset1
| batch_id   | member_num | occupation |
| --------   | --------   | --------   |
| A01        |   1        | Driver     |
| A01        |   2        | Driver     |
| A01        |   3        | Driver     |
| A01        |   4        | Driver     |
| A02        |   1        | Navigator  |
| A02        |   2        | Navigator  |


Dataset2
| batch_id |member_num_1|member_num_2|member_num_3|member_num_4|
| -------- | --------   | --------   | --------   | --------   |         
| A01      | Yes        |   NA       |   Yes      |   No       | 
| A02      | No         |   NA       |            |            |


Desired Output 

| batch_id   | member_num | occupation | License_status |
| --------   | --------   | --------   | --------       |
| A01        |   1        | Driver     | Yes            |
| A01        |   2        | Driver     | NA             |
| A01        |   3        | Driver     | Yes            |
| A01        |   4        | Driver     | No             |
| A02        |   1        | Navigator  | No             |
| A02        |   2        | Navigator  | NA             |

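I tried using the merge command in Stata, but it has no option for this particular kind of merge. The options it offers rely on key variables that uniquely identify observations (much like a join on a primary key).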

You need to reshape d2 to long format, then merge or link on batch_id and member_num.

Method 1 (using frames)

clear
use d1
frame create d2
frame d2: use d2
frame d2: reshape long member_num_, i(batch_id) j(member_num)
frlink 1:1 batch_id member_num, frame(d2)
frget License_status = member_num_, from(d2)

Method 2 (using merge)

clear
use d2
reshape long member_num_, i(batch_id) j(member_num)
rename member_num_ License_status
tempfile d2long
save `d2long',replace
use d1,clear
merge 1:1 batch_id member_num using `d2long',nogenerate keep(1 3)

Output (from Method 1; the d2 column is the link variable created by frlink):

       batch_id   member~m   occupat~n   d2   Licens~s  
  1.        A01          1      Driver    1        Yes  
  2.        A01          2      Driver    2         NA  
  3.        A01          3      Driver    3        Yes  
  4.        A01          4      Driver    4         No  
  5.        A02          1   Navigator    5         No  
  6.        A02          2   Navigator    6         NA

Input:

d1.dta:

       batch_id   member~m   occupat~n  
  1.        A01          1      Driver  
  2.        A01          2      Driver  
  3.        A01          3      Driver  
  4.        A01          4      Driver  
  5.        A02          1   Navigator  
  6.        A02          2   Navigator  

d2.dta:

       batch_id   member~1   member~2   member~3   member~4  
  1.        A01        Yes         NA        Yes         No  
  2.        A02         No         NA                       
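
To try either method without the original files, the two listings above can be recreated with `input`. This is only a sketch: the storage types (`str3`, `byte`, `str9`) are assumptions chosen to fit the values shown, and the files are saved as d1.dta and d2.dta in the current working directory to match the names used in the answer.

```stata
* Recreate d1.dta from the listing above (storage types are assumed)
clear
input str3 batch_id byte member_num str9 occupation
"A01" 1 "Driver"
"A01" 2 "Driver"
"A01" 3 "Driver"
"A01" 4 "Driver"
"A02" 1 "Navigator"
"A02" 2 "Navigator"
end
save d1, replace

* Recreate d2.dta: one row per batch, one string variable per member slot.
* Empty strings stand in for the blank cells of batch A02.
clear
input str3 batch_id str3 member_num_1 str3 member_num_2 str3 member_num_3 str3 member_num_4
"A01" "Yes" "NA" "Yes" "No"
"A02" "No"  "NA" ""    ""
end
save d2, replace
```

With both files saved, the commands under Method 1 and Method 2 can be run as written.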

Final answer, with minor edits:

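clear

* Reshape Dataset2 to long: one row per batch_id and member_num
use "file_Dataset2.dta"
reshape long member_num_, i(batch_id) j(member_num)
rename member_num_ License_status
tempfile d2long
save `d2long', replace

* Merge the long license data onto the roster, keeping all roster rows
use "file_Dataset1.dta", clear

merge m:1 batch_id member_num using `d2long', nogenerate keep(1 3)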
