
How best to use R to reshape dataframe from long to wide and combine values

I have a dataframe of about 2000 rows and 3 columns. In essence, I want to reshape this dataframe from long to wide. This is an example of my current data:

ID   Procedure    Date
D55  Sedation     01/01/2001
D55  Excision     01/01/2001
D55  Biopsy       01/01/2001
A66  Sedation     02/02/2001
A66  Excision     02/02/2001
T44  Sedation     03/03/2001
T44  Biopsy       03/03/2001
T44  Sedation     04/04/2001
T44  Excision     04/04/2001
G88  Sedation     05/05/2001
G88  Biopsy       05/05/2001
G88  Sedation     06/06/2001
G88  Excision     06/06/2001
G88  Sedation     07/07/2001
G88  Re-excision  07/07/2001

I want each ID to take up a single row, so I'd like to create something like this:

ID   Date 1      Procedure(s)                Date 2      Procedure(s)        Date 3      Procedure(s)
D55  01/01/2001  Sedation, Excision, Biopsy
A66  02/02/2001  Sedation, Excision
T44  03/03/2001  Sedation, Biopsy            04/04/2001  Sedation, Excision
G88  05/05/2001  Sedation, Biopsy            06/06/2001  Sedation, Excision  07/07/2001  Sedation, Re-excision

For most IDs, every row has the same date but a different procedure documented. A handful came in for further procedures on subsequent dates. I can't see any that came in on more than 3 different dates, but a way to count the dates documented per ID would be useful.

I've tried using cast and dcast so far, but I'm not really getting anywhere. I'm very new to R, so any help would be greatly appreciated! Thanks for reading.
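
The question also asks for a way to count the dates documented per ID. A minimal sketch, assuming the data sit in a data frame called df with the ID, Procedure and Date columns shown above (the sample rows are retyped from the question; date_counts and n_dates are names chosen here for illustration):

```r
library(dplyr)

# Sample rows retyped from the question's table (an assumption about how df looks)
df <- data.frame(
  ID = c(rep("D55", 3), rep("A66", 2), rep("T44", 4), rep("G88", 6)),
  Procedure = c("Sedation", "Excision", "Biopsy",
                "Sedation", "Excision",
                "Sedation", "Biopsy", "Sedation", "Excision",
                "Sedation", "Biopsy", "Sedation", "Excision",
                "Sedation", "Re-excision"),
  Date = c(rep("01/01/2001", 3), rep("02/02/2001", 2),
           rep("03/03/2001", 2), rep("04/04/2001", 2),
           rep("05/05/2001", 2), rep("06/06/2001", 2),
           rep("07/07/2001", 2)),
  stringsAsFactors = FALSE
)

# Number of distinct dates recorded for each ID
date_counts <- df %>%
  group_by(ID) %>%
  summarise(n_dates = n_distinct(Date), .groups = "drop")

# Largest number of dates any single ID has
max(date_counts$n_dates)

# IDs that came in on more than one date
filter(date_counts, n_dates > 1)
```

On the full data, max(date_counts$n_dates) would confirm whether three date/procedure column pairs are enough for the wide layout.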

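library(tidyverse)

df %>%
  # One row per ID and date, with that date's procedures joined into one string.
  # Date is sorted as text here; convert it to Date class first if order matters.
  group_by(ID, Date) %>%
  summarize(Procedure = paste0(Procedure, collapse = ", "), .groups = "drop_last") %>%
  # Still grouped by ID at this point, so this numbers each ID's dates 1, 2, 3, ...
  mutate(col = row_number()) %>%
  ungroup() %>%
  # Spread into Date_1, Date_2, ..., Procedure_1, Procedure_2, ...
  pivot_wider(names_from = col, values_from = c(Date, Procedure))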

The result puts all the Date columns first and all the Procedure columns after them, so the columns need some reordering afterwards to pair each date with its procedures. That could be done as in this answer: https://stackoverflow.com/a/60400134/6851825

# A tibble: 4 x 7
  ID    Date_1 Date_2 Date_3 Procedure_1                Procedure_2        Procedure_3          
  <chr> <chr>  <chr>  <chr>  <chr>                      <chr>              <chr>                
1 A66   2/2/01 NA     NA     Sedation, Excision         NA                 NA                   
2 D55   1/1/01 NA     NA     Sedation, Excision, Biopsy NA                 NA                   
3 G88   5/5/01 6/6/01 7/7/01 Sedation, Biopsy           Sedation, Excision Sedation, Re-excision
4 T44   3/3/01 4/4/01 NA     Sedation, Biopsy           Sedation, Excision NA                   
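
The reordering step mentioned above can probably be avoided on recent versions of tidyr: pivot_wider() accepts names_vary = "slowest" (tidyr 1.2.0 or later), which keeps each date next to its procedures. A sketch under that assumption, with the sample rows retyped from the question and wide as an illustrative name:

```r
library(dplyr)
library(tidyr)  # names_vary requires tidyr 1.2.0 or later

# Sample rows retyped from the question's table (an assumption about how df looks)
df <- data.frame(
  ID = c(rep("D55", 3), rep("A66", 2), rep("T44", 4), rep("G88", 6)),
  Procedure = c("Sedation", "Excision", "Biopsy",
                "Sedation", "Excision",
                "Sedation", "Biopsy", "Sedation", "Excision",
                "Sedation", "Biopsy", "Sedation", "Excision",
                "Sedation", "Re-excision"),
  Date = c(rep("01/01/2001", 3), rep("02/02/2001", 2),
           rep("03/03/2001", 2), rep("04/04/2001", 2),
           rep("05/05/2001", 2), rep("06/06/2001", 2),
           rep("07/07/2001", 2)),
  stringsAsFactors = FALSE
)

wide <- df %>%
  # Collapse each ID/date combination into one comma-separated string
  group_by(ID, Date) %>%
  summarise(Procedure = paste(Procedure, collapse = ", "), .groups = "drop_last") %>%
  # Number the dates within each ID
  mutate(col = row_number()) %>%
  ungroup() %>%
  # "slowest" interleaves the output: Date_1, Procedure_1, Date_2, Procedure_2, ...
  pivot_wider(
    names_from = col,
    values_from = c(Date, Procedure),
    names_vary = "slowest"
  )

names(wide)
```

On older tidyr versions the names_vary argument is not available, and the manual reordering from the linked answer is still needed.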
