How can I compare multiple rows in R?
I would like to compare multiple values by USER.
Taking USER "A" as the reference: if its values (A, B, C, D, and E) are the same as those of USER "B", then "B" should be written to a newly created variable, EQUAL.
Here is my data
Desired value
I am very new to R. I tried to look at the compare function but got a little overwhelmed. I would very much appreciate any help.
Here's an abridged version of the data you provided:
library(tidyverse)

df <- data.frame(
  id = c(1001, 1002, 1003, 1001, 1002, 1003),
  user = c('a', 'a', 'a', 'b', 'b', 'b'),
  point_a = c(1, 1, NA, 1, 1, NA),
  point_b = c(NA, NA, 2, NA, NA, NA),
  point_c = c(3, 2, 3, 3, 2, 3),
  point_d = c(2, 1, NA, 2, 1, NA),
  point_e = c(4, NA, 1, 4, NA, NA)
)
df
    id user point_a point_b point_c point_d point_e
1 1001    a       1      NA       3       2       4
2 1002    a       1      NA       2       1      NA
3 1003    a      NA       2       3      NA       1
4 1001    b       1      NA       3       2       4
5 1002    b       1      NA       2       1      NA
6 1003    b      NA      NA       3      NA      NA
If you inner_join the data to itself on the columns you want to match, and then filter for rows where user.x is less than user.y (i.e. user.x comes first in alphabetical order, which gets rid of duplicate pairs and of rows matching themselves), you should be left with the matches you're looking for:
df %>%
  inner_join(df, by = c('point_a', 'point_b', 'point_c', 'point_d', 'point_e')) %>%
  filter(user.x < user.y) %>%
  rename(user = user.x,
         equal = user.y)
  id.x user point_a point_b point_c point_d point_e id.y equal
1 1001    a       1      NA       3       2       4 1001     b
2 1002    a       1      NA       2       1      NA 1002     b
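The join above returns only the rows that matched, and it compares rows regardless of id. If the goal is the layout described in the question (every row of the reference user kept, with the matching user in a new column), one possible variation is a left_join that also uses id as a key. This is only a sketch: it assumes the same df as above and relies on dplyr's default of treating NA keys as equal to each other (na_matches = "na"). The names point_cols, ref and others are made up for illustration.

```r
library(dplyr)

df <- data.frame(
  id      = c(1001, 1002, 1003, 1001, 1002, 1003),
  user    = c("a", "a", "a", "b", "b", "b"),
  point_a = c(1, 1, NA, 1, 1, NA),
  point_b = c(NA, NA, 2, NA, NA, NA),
  point_c = c(3, 2, 3, 3, 2, 3),
  point_d = c(2, 1, NA, 2, 1, NA),
  point_e = c(4, NA, 1, 4, NA, NA)
)

# Hypothetical helper: the value columns to compare.
point_cols <- c("point_a", "point_b", "point_c", "point_d", "point_e")

# Reference rows: everything recorded by user "a".
ref <- filter(df, user == "a")

# Candidate rows: every other user, with `user` renamed to `equal`
# so it becomes the new column after the join.
others <- df %>%
  filter(user != "a") %>%
  rename(equal = user)

# left_join keeps every reference row; a row with no identical
# counterpart (same id, same values, NA matching NA) gets equal = NA.
result <- left_join(ref, others, by = c("id", point_cols))
result
```

Rows without a match get NA in equal. If several other users match the same reference row, that row appears once per match, so the result may need collapsing afterwards.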
We may split the data by user, pass the pieces to mapply, and compute the rowSums of the TRUEs obtained by comparing user "A" with each other user via `==`. In the resulting matrix, which.max gives for each row the column with the most matching values, and we use that index to pick from the vector of users (without "A"). Finally, the result just needs to be subsetted to user "A".
transform(dat, EQUAL=
  split(dat, dat$user) |>
    (\(.) mapply(\(x, y) rowSums(x == y, na.rm=TRUE),
                 unname(.['A']),
                 .[c('B', 'C')]))() |>
    (\(.) sort(unique(dat$user))[-1][apply(., 1, which.max)])()
) |>
  (\(.) .[.$user == 'A', ])()
#     id user point_a point_b point_c point_d point_e EQUAL
# 1 1001    A       1      NA       3       2       4     B
# 2 1002    A       1      NA       2       1      NA     B
# 3 1003    A      NA       2       3      NA       1     C
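The same split / compare / which.max idea can be unpacked into named steps, which may be easier to follow for someone new to R. This is a sketch, not a replacement for the one-liner: it assumes every user has the same ids in the same order, it compares only the point_ columns (the code above also compares id and user), and the names point_cols, by_user, ref, others, agree and best are made up for illustration.

```r
dat <- data.frame(
  id      = rep(c(1001L, 1002L, 1003L), times = 3),
  user    = rep(c("A", "B", "C"), each = 3),
  point_a = c(1, 1, NA, 1, 1, NA, 4, 1, NA),
  point_b = c(NA, NA, 2, NA, NA, NA, 3, NA, 2),
  point_c = c(3, 2, 3, 3, 2, 3, 3, 2, 3),
  point_d = c(2, 1, NA, 2, 1, NA, 2, 1, NA),
  point_e = c(4, NA, 1, 4, NA, NA, 4, NA, 1)
)

point_cols <- grep("^point_", names(dat), value = TRUE)

# One data frame of point columns per user.
by_user <- split(dat[point_cols], dat$user)
ref     <- by_user[["A"]]
others  <- by_user[setdiff(names(by_user), "A")]

# For each other user, count per row the cells equal to user A's.
# Comparisons involving NA give NA, and na.rm = TRUE drops them,
# so NA cells never count as agreement.
agree <- sapply(others, function(x) rowSums(x == ref, na.rm = TRUE))

# For each row, take the user with the most agreeing cells
# (which.max returns the first maximum, so ties go to the earlier user).
best <- colnames(agree)[apply(agree, 1, which.max)]

result <- cbind(dat[dat$user == "A", ], EQUAL = best)
result
```

Note that this picks the closest user rather than testing for exact equality: for id 1002, users B and C agree with A equally often, and B is reported only because it comes first. If only exact matches should count, a join on the value columns is probably the safer route.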
Note: R version 4.1.2 (2021-11-01). The native pipe |> and the \(x) lambda shorthand used above require R 4.1.0 or later.
Data:
dat <- structure(list(id = c(1001L, 1002L, 1003L, 1001L, 1002L, 1003L,
1001L, 1002L, 1003L), user = c("A", "A", "A", "B", "B", "B",
"C", "C", "C"), point_a = c(1, 1, NA, 1, 1, NA, 4, 1, NA), point_b = c(NA,
NA, 2, NA, NA, NA, 3, NA, 2), point_c = c(3, 2, 3, 3, 2, 3, 3,
2, 3), point_d = c(2, 1, NA, 2, 1, NA, 2, 1, NA), point_e = c(4,
NA, 1, 4, NA, NA, 4, NA, 1)), class = "data.frame", row.names = c(NA,
-9L))