How best to use R to reshape dataframe from long to wide and combine values
I have a dataframe of about 2000 rows and 3 columns. In essence, I want to reshape this dataframe from long to wide format. This is an example of my current data:
ID | Procedure | Date |
---|---|---|
D55 | Sedation | 01/01/2001 |
D55 | Excision | 01/01/2001 |
D55 | Biopsy | 01/01/2001 |
A66 | Sedation | 02/02/2001 |
A66 | Excision | 02/02/2001 |
T44 | Sedation | 03/03/2001 |
T44 | Biopsy | 03/03/2001 |
T44 | Sedation | 04/04/2001 |
T44 | Excision | 04/04/2001 |
G88 | Sedation | 05/05/2001 |
G88 | Biopsy | 05/05/2001 |
G88 | Sedation | 06/06/2001 |
G88 | Excision | 06/06/2001 |
G88 | Sedation | 07/07/2001 |
G88 | Re-excision | 07/07/2001 |
I want each ID to occupy a single row, so I'd like to create something like this:
ID | Date 1 | Procedure(s) | Date 2 | Procedure(s) | Date 3 | Procedure(s) |
---|---|---|---|---|---|---|
D55 | 01/01/2001 | Sedation, Excision, Biopsy | | | | |
A66 | 02/02/2001 | Sedation, Excision | | | | |
T44 | 03/03/2001 | Sedation, Biopsy | 04/04/2001 | Sedation, Excision | | |
G88 | 05/05/2001 | Sedation, Biopsy | 06/06/2001 | Sedation, Excision | 07/07/2001 | Sedation, Re-excision |
Most IDs have a single date with several different procedures documented. A handful came in for further procedures on subsequent dates. I can't see any that came in on more than 3 different dates, but a way to count the dates documented per ID would be useful.
I've tried using cast and dcast so far, but I'm not really getting anywhere. I'm very new to R, so any help would be greatly appreciated! Thanks for reading.
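For the last request, counting the dates documented per ID, a minimal sketch with dplyr follows. It assumes the data frame is called `df` with the columns `ID`, `Procedure` and `Date` shown in the example; the names `date_counts` and `n_dates` are made up for illustration.

```r
library(dplyr)

# Sample data copied from the question (assumed to be stored as `df`)
df <- tibble::tribble(
  ~ID,   ~Procedure,    ~Date,
  "D55", "Sedation",    "01/01/2001",
  "D55", "Excision",    "01/01/2001",
  "D55", "Biopsy",      "01/01/2001",
  "A66", "Sedation",    "02/02/2001",
  "A66", "Excision",    "02/02/2001",
  "T44", "Sedation",    "03/03/2001",
  "T44", "Biopsy",      "03/03/2001",
  "T44", "Sedation",    "04/04/2001",
  "T44", "Excision",    "04/04/2001",
  "G88", "Sedation",    "05/05/2001",
  "G88", "Biopsy",      "05/05/2001",
  "G88", "Sedation",    "06/06/2001",
  "G88", "Excision",    "06/06/2001",
  "G88", "Sedation",    "07/07/2001",
  "G88", "Re-excision", "07/07/2001"
)

# Count the distinct dates recorded for each ID
date_counts <- df %>%
  group_by(ID) %>%
  summarize(n_dates = n_distinct(Date), .groups = "drop")

# Largest number of dates for any single ID
max(date_counts$n_dates)

# IDs that were seen on more than one date
filter(date_counts, n_dates > 1)
```

`n_distinct()` counts each date once per ID even though it appears on several rows, which is what matters here because every procedure has its own row.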
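    library(tidyverse)

    df %>%
      # One row per ID and date, with that date's procedures joined into one string
      group_by(ID, Date) %>%
      summarize(Procedure = paste(Procedure, collapse = ", "), .groups = "drop_last") %>%
      # Still grouped by ID here, so row_number() numbers each ID's dates 1, 2, 3, ...
      mutate(col = row_number()) %>%
      ungroup() %>%
      # Spread the numbered dates and procedures into Date_1, Procedure_1, ...
      pivot_wider(names_from = col, values_from = c(Date, Procedure))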
The result has all the Date columns first, followed by all the Procedure columns, so getting each date next to its procedures requires some reordering afterwards. That could be done as in this answer: https://stackoverflow.com/a/60400134/6851825
    # A tibble: 4 x 7
      ID    Date_1 Date_2 Date_3 Procedure_1                Procedure_2        Procedure_3
      <chr> <chr>  <chr>  <chr>  <chr>                      <chr>              <chr>
    1 A66   2/2/01 NA     NA     Sedation, Excision         NA                 NA
    2 D55   1/1/01 NA     NA     Sedation, Excision, Biopsy NA                 NA
    3 G88   5/5/01 6/6/01 7/7/01 Sedation, Biopsy           Sedation, Excision Sedation, Re-excision
    4 T44   3/3/01 4/4/01 NA     Sedation, Biopsy           Sedation, Excision NA
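With tidyr 1.2.0 or later, the separate reordering step can probably be skipped: `pivot_wider()` has a `names_vary` argument, and `names_vary = "slowest"` should place each date next to its procedures. The sketch below is an assumption-laden variant of the pipeline above, not a tested replacement for it. It rebuilds a shortened copy of the question's data as `df` so that it runs on its own, and the name `wide` is made up for illustration.

```r
library(dplyr)
library(tidyr)

# Shortened sample of the question's data (assumed to be stored as `df`)
df <- tibble::tribble(
  ~ID,   ~Procedure, ~Date,
  "D55", "Sedation", "01/01/2001",
  "D55", "Excision", "01/01/2001",
  "T44", "Sedation", "03/03/2001",
  "T44", "Biopsy",   "03/03/2001",
  "T44", "Sedation", "04/04/2001",
  "T44", "Excision", "04/04/2001"
)

wide <- df %>%
  group_by(ID, Date) %>%
  summarize(Procedure = paste(Procedure, collapse = ", "), .groups = "drop_last") %>%
  # Grouped by ID at this point, so each ID's dates are numbered from 1
  mutate(col = row_number()) %>%
  ungroup() %>%
  pivot_wider(
    names_from = col,
    values_from = c(Date, Procedure),
    names_vary = "slowest"  # interleave Date_n and Procedure_n; needs tidyr >= 1.2.0
  )

# Inspect the resulting column order
names(wide)
```

One caveat applies to both versions: `Date` is stored as text, so the dates within each ID are numbered in alphabetical rather than chronological order. Converting the column with `as.Date()` beforehand would fix that, but the right format string depends on whether the dates are day/month or month/day, which the sample data does not reveal.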