How to compare multiple rows in R?

Question

I would like to compare multiple values by USER.

Taking USER "A" as the reference, if the values (A, B, C, D, and E) are the same as those of USER "B", then B should be written in a newly created variable EQUAL.

Here is my data:

[image not available]

Desired value:

[image not available]

I am very new to R. I tried to look at the compare function but got a little overwhelmed. I would very much appreciate any help.

Answer 1

Here's an abridged version of the data you provided:

library(tidyverse)

df <- data.frame(
  id = c(1001, 1002, 1003, 1001, 1002, 1003),
  user = c('a', 'a', 'a', 'b', 'b', 'b'),
  point_a = c(1, 1, NA, 1, 1, NA),
  point_b = c(NA, NA, 2, NA, NA, NA),
  point_c = c(3, 2, 3, 3, 2, 3),
  point_d = c(2, 1, NA, 2, 1, NA),
  point_e = c(4, NA, 1, 4, NA, NA)
)

df

    id user point_a point_b point_c point_d point_e
1 1001    a       1      NA       3       2       4
2 1002    a       1      NA       2       1      NA
3 1003    a      NA       2       3      NA       1
4 1001    b       1      NA       3       2       4
5 1002    b       1      NA       2       1      NA
6 1003    b      NA      NA       3      NA      NA

If you inner_join the data to itself on the columns you want to match, and then filter for rows where user.x is less than user.y (i.e. user.x comes first in alphabetical order, which gets rid of duplicate pairs and of rows matching themselves), you should be left with the matches you're looking for:

df %>%
  inner_join(df, by = c('point_a', 'point_b', 'point_c', 'point_d', 'point_e')) %>%
  filter(user.x < user.y) %>%
  rename(user = user.x,
         equal = user.y)

  id.x user point_a point_b point_c point_d point_e id.y equal
1 1001    a       1      NA       3       2       4 1001     b
2 1002    a       1      NA       2       1      NA 1002     b
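For readers who prefer not to load the tidyverse, the same self-join idea can be sketched in base R with merge(). This is only an illustration under the same abridged data, not part of the original answer; the names point_cols and pairs are made up for the example. With several key columns, merge() treats NA as matching NA, which is also the default behaviour of inner_join, so rows with missing points can still pair up.

```r
# Same abridged data as above; no packages needed.
df <- data.frame(
  id = c(1001, 1002, 1003, 1001, 1002, 1003),
  user = c("a", "a", "a", "b", "b", "b"),
  point_a = c(1, 1, NA, 1, 1, NA),
  point_b = c(NA, NA, 2, NA, NA, NA),
  point_c = c(3, 2, 3, 3, 2, 3),
  point_d = c(2, 1, NA, 2, 1, NA),
  point_e = c(4, NA, 1, 4, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper name: the columns that must all agree.
point_cols <- c("point_a", "point_b", "point_c", "point_d", "point_e")

# Self-join on the point columns; the remaining columns get .x/.y suffixes.
pairs <- merge(df, df, by = point_cols)

# Keep each pair once and drop rows matched to themselves.
pairs <- pairs[pairs$user.x < pairs$user.y, ]

# Rename to mirror the dplyr result.
names(pairs)[names(pairs) == "user.x"] <- "user"
names(pairs)[names(pairs) == "user.y"] <- "equal"

pairs
```

Note that merge() puts the key columns first, so the column order differs from the dplyr output even though the matched pairs are the same.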

Answer 2

We may split the data by user, pass the pieces to mapply, and compute the rowSums of the TRUEs after comparing with `==`. From the resulting matrix, which.max gives the column with the most matches in each row, which we use to index the users (excluding "A"). Finally, the result just needs to be subset to user "A".

transform(dat, EQUAL=
            split(dat, dat$user) |>
            (\(.) mapply(\(x, y) rowSums(x == y, na.rm=TRUE), 
                         unname(.['A']), 
                         .[c('B', 'C')]))() |>
            (\(.) sort(unique(dat$user))[-1][apply(., 1, which.max)])()
) |>
  (\(.) .[.$user == 'A', ])()
#     id user point_a point_b point_c point_d point_e EQUAL
# 1 1001    A       1      NA       3       2       4     B
# 2 1002    A       1      NA       2       1      NA     B
# 3 1003    A      NA       2       3      NA       1     C
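To make the intermediate matrix visible, here is a step-by-step sketch of the same idea that also runs on R versions without `|>` and `\(x)`. It is a simplified variant, not the answer's exact code: it compares only the point_ columns (the answer compares every column, including id and user), and the names parts, point_cols, counts and best are made up for the example.

```r
# Same data as in the "Data" section of this answer.
dat <- data.frame(
  id = rep(c(1001L, 1002L, 1003L), 3),
  user = rep(c("A", "B", "C"), each = 3),
  point_a = c(1, 1, NA, 1, 1, NA, 4, 1, NA),
  point_b = c(NA, NA, 2, NA, NA, NA, 3, NA, 2),
  point_c = c(3, 2, 3, 3, 2, 3, 3, 2, 3),
  point_d = c(2, 1, NA, 2, 1, NA, 2, 1, NA),
  point_e = c(4, NA, 1, 4, NA, NA, 4, NA, 1),
  stringsAsFactors = FALSE
)

# One data frame per user.
parts <- split(dat, dat$user)
point_cols <- grep("^point_", names(dat), value = TRUE)

# For each other user, count per row how many point columns equal user A's.
# Comparisons involving NA yield NA and are dropped by na.rm = TRUE.
counts <- sapply(parts[c("B", "C")], function(other) {
  rowSums(parts$A[point_cols] == other[point_cols], na.rm = TRUE)
})
counts

# Pick, per row, the user with the most matching columns.
best <- colnames(counts)[apply(counts, 1, which.max)]
best
```

Two things are worth keeping in mind with this approach. which.max returns the first maximum, so when two users tie (as B and C do for id 1002 here), the earlier user wins. It also always returns the closest user, even when no user matches on every column, so an exact-match requirement would need an extra check on the counts.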

Note: R version 4.1.2 (2021-11-01)

Data:

dat <- structure(list(id = c(1001L, 1002L, 1003L, 1001L, 1002L, 1003L, 
1001L, 1002L, 1003L), user = c("A", "A", "A", "B", "B", "B", 
"C", "C", "C"), point_a = c(1, 1, NA, 1, 1, NA, 4, 1, NA), point_b = c(NA, 
NA, 2, NA, NA, NA, 3, NA, 2), point_c = c(3, 2, 3, 3, 2, 3, 3, 
2, 3), point_d = c(2, 1, NA, 2, 1, NA, 2, 1, NA), point_e = c(4, 
NA, 1, 4, NA, NA, 4, NA, 1)), class = "data.frame", row.names = c(NA, 
-9L))

Answer 1
0 2020-08-02 17:27:06

Answer 2
0 2021-12-26 16:09:14
