R - Specific merging of rows in a dataframe within unique groups
I have a huge data frame in R that looks like this:
df <- data.frame("ITEM" = c(1,1,1,2,2,3,3,3,3,4,4),
"ID" = c("A","B","C","D","E","F","G","A","B","C","H"),
"Score" = c(7,8,7,3,5,4,6,9,10,5,8),
"Date" = c("1/1/2018","1/3/2018","1/6/2018","1/7/2017","1/10/2017","1/1/2003","1/3/2004","1/5/2008","1/7/2010","1/8/2010","1/3/2011"))
ITEM ID Score Date
1 1 A 7 1/1/2018
2 1 B 8 1/3/2018
3 1 C 7 1/6/2018
4 2 D 3 1/7/2017
5 2 E 5 1/10/2017
6 3 F 4 1/1/2003
7 3 G 6 1/3/2004
8 3 A 9 1/5/2008
9 3 B 10 1/7/2010
10 4 C 5 1/8/2010
11 4 H 8 1/3/2011
The data is already sorted in ascending order within each unique item. I would like to transform the data into the following:
ITEM ID Score Date ID_2 Score_2 Date_2
1 1 A 7 1/1/2018 B 8 1/3/2018
2 1 B 8 1/3/2018 C 7 1/6/2018
4 2 D 3 1/7/2017 E 5 1/10/2017
6 3 F 4 1/1/2003 G 6 1/3/2004
7 3 G 6 1/3/2004 A 9 1/5/2008
8 3 A 9 1/5/2008 B 10 1/7/2010
10 4 C 5 1/8/2010 H 8 1/3/2011
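Each item has an owner, is transferred to another person, and is given a score. For example, item 1 is held by A, who scored 7, then moves to B, who scored 8, and then to C, who scored 7.
I want to get it in the format above, combining each row with the row above it (but only within the same item group). I tried using dcast, from what I know, to reshape the data, but then you get ID_3, ID_4 columns for some items, and I only want columns for ID_2, Score_2 and Date_2.
Any ideas? Thanks.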
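Based on the expected output, we can split by 'ITEM', cbind each group's rows with the same rows shifted by one (every row but the last alongside every row but the first), and then convert the list of data.frames into a single data.frame with rbind: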
out <- do.call(rbind, lapply(split(df, df$ITEM),
function(x) cbind(x[-nrow(x), ], x[-1, -1])))
row.names(out) <- NULL
out
# ITEM ID Score Date ID Score Date
#1 1 A 7 1/1/2018 B 8 1/3/2018
#2 1 B 8 1/3/2018 C 7 1/6/2018
#3 2 D 3 1/7/2017 E 5 1/10/2017
#4 3 F 4 1/1/2003 G 6 1/3/2004
#5 3 G 6 1/3/2004 A 9 1/5/2008
#6 3 A 9 1/5/2008 B 10 1/7/2010
#7 4 C 5 1/8/2010 H 8 1/3/2011
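Note that the result above carries duplicate column names (ID, Score and Date each appear twice). Below is a minimal sketch of the same split/cbind/rbind idea that renames the second set to ID_2, Score_2 and Date_2, as the question asks. The helper name pair_rows is hypothetical, and the data frame is rebuilt from the question's printed table so the block runs on its own:

```r
# Example data, copied from the 11-row table in the question
df <- data.frame(
  ITEM  = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4),
  ID    = c("A", "B", "C", "D", "E", "F", "G", "A", "B", "C", "H"),
  Score = c(7, 8, 7, 3, 5, 4, 6, 9, 10, 5, 8),
  Date  = c("1/1/2018", "1/3/2018", "1/6/2018", "1/7/2017", "1/10/2017",
            "1/1/2003", "1/3/2004", "1/5/2008", "1/7/2010", "1/8/2010",
            "1/3/2011"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pair every row of one ITEM group with the next row
pair_rows <- function(x) {
  # All rows but the first, without ITEM, with a "_2" suffix on the names
  nxt <- x[-1, c("ID", "Score", "Date"), drop = FALSE]
  names(nxt) <- paste0(names(nxt), "_2")
  # All rows but the last, side by side with their successors
  cbind(x[-nrow(x), , drop = FALSE], nxt)
}

out <- do.call(rbind, lapply(split(df, df$ITEM), pair_rows))
row.names(out) <- NULL
out
```

A group with a single row contributes zero rows, since it has no transfer to pair up.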
Or using tidyverse:
library(tidyverse)
df %>%
group_by(ITEM) %>%
nest %>%
mutate(data = map(data, ~ bind_cols(.x[-nrow(.x), ], .x[-1, ]))) %>%
unnest
# A tibble: 7 x 7
# ITEM ID Score Date ID1 Score1 Date1
# <int> <chr> <int> <chr> <chr> <int> <chr>
#1 1 A 7 1/1/2018 B 8 1/3/2018
#2 1 B 8 1/3/2018 C 7 1/6/2018
#3 2 D 3 1/7/2017 E 5 1/10/2017
#4 3 F 4 1/1/2003 G 6 1/3/2004
#5 3 G 6 1/3/2004 A 9 1/5/2008
#6 3 A 9 1/5/2008 B 10 1/7/2010
#7 4 C 5 1/8/2010 H 8 1/3/2011
df <- structure(list(ITEM = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 4L,
4L), ID = c("A", "B", "C", "D", "E", "F", "G", "A", "B", "C",
"H"), Score = c(7L, 8L, 7L, 3L, 5L, 4L, 6L, 9L, 10L, 5L, 8L),
Date = c("1/1/2018", "1/3/2018", "1/6/2018", "1/7/2017",
"1/10/2017", "1/1/2003", "1/3/2004", "1/5/2008", "1/7/2010",
"1/8/2010", "1/3/2011")), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11"))
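As a variant of the tidyverse answer above, here is a sketch that avoids nesting by using dplyr's lead() inside each group. This is a different technique from the nest/bind_cols code in the answer, not a restatement of it. It assumes dplyr is installed; the result name out2 is hypothetical, and the data is rebuilt so the block is self-contained:

```r
library(dplyr)

# Same 11-row example data as in the question
df <- data.frame(
  ITEM  = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4),
  ID    = c("A", "B", "C", "D", "E", "F", "G", "A", "B", "C", "H"),
  Score = c(7, 8, 7, 3, 5, 4, 6, 9, 10, 5, 8),
  Date  = c("1/1/2018", "1/3/2018", "1/6/2018", "1/7/2017", "1/10/2017",
            "1/1/2003", "1/3/2004", "1/5/2008", "1/7/2010", "1/8/2010",
            "1/3/2011"),
  stringsAsFactors = FALSE
)

out2 <- df %>%
  group_by(ITEM) %>%
  # lead() pulls the next row's value within the current ITEM group
  mutate(ID_2 = lead(ID), Score_2 = lead(Score), Date_2 = lead(Date)) %>%
  # The last row of each group has no successor, so drop it
  filter(row_number() < n()) %>%
  ungroup()

out2
```

Filtering on row position rather than on is.na(ID_2) keeps the logic correct even if ID itself contains missing values.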