R long to wide format: categorical variables and dates
I want to convert a dataset from long to wide format. It has a categorical variable (medication) with a start and an end date. The result should have one row per ID and, for each medication, a column with the entry 1/0 indicating whether or not the patient got the medication. Each medication column should also have its respective start and end date as extra columns.
I wanted to
test <- data.frame(
  PatID = c(1L, 1L, 2L, 2L, 3L, 4L, 4L),
  medication = c("Jak", "Others", "HU", "Inf", "Others", "HU", "Others"),
  startDate = c("2016-12-14", "2017-02-04", "2016-03-26", "2016-06-13",
                "2012-27-03", "2012-04-21", "2010-02-03"),
  endDate = c("2018-11-14", "2018-02-25", "2017-06-13", "2017-11-12",
              "2018-27-03", "2016-04-30", "2016-08-16")
)
The output should be the following:
ID  Jak  Jak_startDate  Jak_endDate  HU  HU_startDate  HU_endDate  Inf  Inf_startDate  Inf_endDate  Others  Others_startDate  Others_endDate
1   1    2016-12-14     2018-11-14   0   NA            NA          0    NA             NA           1       2017-02-04        2018-02-25
2   0    NA             NA           1   2016-03-26    2017-06-13  1    2016-06-13     2017-11-12   0       NA                NA
3   0    NA             NA           0   NA            NA          0    NA             NA           1       2012-27-03        2018-27-03
4   0    NA             NA           1   2012-04-21    2016-04-30  0    NA             NA           1       2010-02-03        2016-08-16
Using tidyverse, here is what I did:
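library(tidyverse)

test %>%
  # stack startDate and endDate into key/value pairs
  gather(key, value, -PatID, -medication) %>%
  arrange(PatID, value) %>%
  # build column names such as "Jak_startDate"
  mutate(new_key = paste(medication, key, sep = "_")) %>%
  select(PatID, new_key, value) %>%
  spread(new_key, value) %>%
  # add one 1/0 indicator column per medication
  left_join(
    test %>%
      select(PatID, medication) %>%
      mutate(ind = 1) %>%
      spread(medication, ind, fill = 0),
    by = "PatID"
  )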
Here, I make the data longer with gather, then arrange it by PatID and value. Then I create a new key column, new_key, and select only three variables: PatID, new_key, and value. Then I spread all of this into wide data, but we still need the columns Jak, HU, etc., which seem to be indicator variables. For this, within the left_join, I take the test data and spread that as well to get the columns you requested.
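For what it's worth, the same reshaping can probably be done in one step with pivot_wider, which supersedes gather/spread in tidyr 1.0.0 and later. The following is only a sketch under that assumption, not the approach from the answer above: the helper column taken and the result name wide are made up for illustration, and the dates are left as character strings exactly as in the question.

```r
library(dplyr)
library(tidyr)

# Sample data from the question; dates are kept as character strings.
test <- data.frame(
  PatID = c(1L, 1L, 2L, 2L, 3L, 4L, 4L),
  medication = c("Jak", "Others", "HU", "Inf", "Others", "HU", "Others"),
  startDate = c("2016-12-14", "2017-02-04", "2016-03-26", "2016-06-13",
                "2012-27-03", "2012-04-21", "2010-02-03"),
  endDate = c("2018-11-14", "2018-02-25", "2017-06-13", "2017-11-12",
              "2018-27-03", "2016-04-30", "2016-08-16"),
  stringsAsFactors = FALSE
)

wide <- test %>%
  # hypothetical helper column: 1 for every medication a patient received
  mutate(taken = 1L) %>%
  pivot_wider(
    id_cols = PatID,
    names_from = medication,
    values_from = c(taken, startDate, endDate),
    # produces names such as "Jak_taken", "Jak_startDate", "Jak_endDate"
    names_glue = "{medication}_{.value}",
    # fill only the indicator with 0; missing dates stay NA
    values_fill = list(taken = 0L)
  ) %>%
  # drop the "_taken" suffix so the indicator columns are just "Jak", "HU", ...
  rename_with(~ sub("_taken$", "", .x))

wide
```

Two things to watch out for: the columns come out grouped by value (all indicators, then all start dates, then all end dates) rather than by medication, so a select or relocate is needed if the exact order from the question matters. And since Inf is a reserved word in R, that column has to be accessed as wide[["Inf"]] or with backticks rather than wide$Inf.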