R: long to wide format with a categorical variable and dates

Question

I want to reshape a dataset from long to wide format. It contains a categorical variable (medication), and each medication has a start and an end date. The result should have one row per ID and, for each medication, a column with the entry 1/0 indicating whether or not the patient received that medication. Each medication column should also have its start and end dates as extra columns.

Here is my example data:

test <- data.frame(
  PatID      = c(1L, 1L, 2L, 2L, 3L, 4L, 4L),
  medication = c("Jak", "Others", "HU", "Inf", "Others", "HU", "Others"),
  startDate  = c("2016-12-14", "2017-02-04", "2016-03-26", "2016-06-13", "2012-27-03", "2012-04-21", "2010-02-03"),
  endDate    = c("2018-11-14", "2018-02-25", "2017-06-13", "2017-11-12", "2018-27-03", "2016-04-30", "2016-08-16")
)

The output should look like this:

ID  Jak  Jak_startDate  Jak_endDate  HU  HU_startDate  HU_endDate  Inf  Inf_startDate  Inf_endDate  Others  Others_startDate  Others_endDate
1   1    2016-12-14     2018-11-14   0   NA            NA          0    NA             NA           1       2017-02-04        2018-02-25
2   0    NA             NA           1   2016-03-26    2017-06-13  1    2016-06-13     2017-11-12   0       NA                NA
3   0    NA             NA           0   NA            NA          0    NA             NA           1       2012-27-03        2018-27-03
4   0    NA             NA           1   2012-04-21    2016-04-30  0    NA             NA           1       2010-02-03        2016-08-16

Answer 1

Using the tidyverse, here is what I did:

library(dplyr)
library(tidyr)
test %>%
  gather(key, value, -PatID, -medication) %>%              # stack the two date columns
  arrange(PatID, value) %>%
  mutate(new_key = paste(medication, key, sep = "_")) %>%  # e.g. "Jak_startDate"
  select(PatID, new_key, value) %>%
  spread(new_key, value) %>%
  left_join(
    test %>%
      select(PatID, medication) %>%
      mutate(ind = 1) %>%
      spread(medication, ind, fill = 0),  # 1/0 indicator per medication
    by = "PatID"
  )

Here, I first make the data longer, then arrange it by PatID and value. Next I create a new key column, new_key, and select only three variables: PatID, new_key, and value. Then I spread this into wide format, which gives the date columns, but we still need the columns Jak, HU, etc., which are indicator variables. To get them, inside the left_join I take the test data and spread it as well, producing the columns you requested.
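If you have a newer tidyr (1.0.0 or later), the same idea can probably be written with pivot_wider instead of gather/spread. This is only a sketch, not what I originally ran: the names dates_wide, ind_wide and result are made up for illustration, and I copy the test data from the question so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Example data copied from the question (dates kept as plain strings).
test <- data.frame(
  PatID      = c(1L, 1L, 2L, 2L, 3L, 4L, 4L),
  medication = c("Jak", "Others", "HU", "Inf", "Others", "HU", "Others"),
  startDate  = c("2016-12-14", "2017-02-04", "2016-03-26", "2016-06-13",
                 "2012-27-03", "2012-04-21", "2010-02-03"),
  endDate    = c("2018-11-14", "2018-02-25", "2017-06-13", "2017-11-12",
                 "2018-27-03", "2016-04-30", "2016-08-16"),
  stringsAsFactors = FALSE
)

# Date columns: one row per patient, names such as "Jak_startDate".
dates_wide <- test %>%
  pivot_wider(
    id_cols     = PatID,
    names_from  = medication,
    values_from = c(startDate, endDate),
    names_glue  = "{medication}_{.value}"
  )

# Indicator columns: 1 if the patient received the medication, else 0.
ind_wide <- test %>%
  distinct(PatID, medication) %>%
  mutate(ind = 1L) %>%
  pivot_wider(
    names_from  = medication,
    values_from = ind,
    values_fill = list(ind = 0L)
  )

# Combine both pieces into one row per patient.
result <- left_join(ind_wide, dates_wide, by = "PatID")
```

The column order differs from the layout in the question (all indicators first, then the date columns), so you may want to reorder with select() afterwards.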

Answer 1: accepted 2019-05-20 14:15:43
