
How can I compare multiple rows in R?

I would like to compare multiple values by USER.

Taking user "A" as the reference: if the values in columns A, B, C, D, and E are the same as those of user "B", then "B" should be written to the newly created variable `EQUAL`.
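To pin the rule down, here is a minimal base-R sketch that checks whether two rows hold the same values. It assumes that "the same" means every column agrees and that an NA only matches an NA in the same column; the question does not say how NA should be treated, so that part is an assumption. `same_values` is a hypothetical helper name, and the three vectors are illustrative rows.

```r
# Hypothetical helper: TRUE when two rows agree in every position.
# Assumption: NA counts as equal only to NA in the same position.
same_values <- function(x, y) {
  x <- unlist(x, use.names = FALSE)
  y <- unlist(y, use.names = FALSE)
  both_na   <- is.na(x) & is.na(y)
  either_na <- is.na(x) | is.na(y)
  # A position with a single NA never matches; the rest are compared with ==
  all(both_na | (!either_na & x == y))
}

# Illustrative rows for users A, B and C
user_a <- c(point_a = 1, point_b = NA, point_c = 3, point_d = 2, point_e = 4)
user_b <- c(point_a = 1, point_b = NA, point_c = 3, point_d = 2, point_e = 4)
user_c <- c(point_a = 4, point_b = 3,  point_c = 3, point_d = 2, point_e = 4)

same_values(user_a, user_b)  # TRUE: identical, including the NA position
same_values(user_a, user_c)  # FALSE: point_a and point_b differ
```

Because `unlist()` is applied first, the helper also accepts one-row data frames, e.g. `same_values(df[1, cols], df[4, cols])`.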

Here is my data:

[image of the data table, not preserved]

Desired output:

[image of the desired output table, not preserved]

I am very new to R. I tried looking at the compare function but got a little overwhelmed. I would very much appreciate any help.

Here's an abridged version of the data you provided:

library(tidyverse)

df <- data.frame(
  id = c(1001, 1002, 1003, 1001, 1002, 1003),
  user = c('a', 'a', 'a', 'b', 'b', 'b'),
  point_a = c(1, 1, NA, 1, 1, NA),
  point_b = c(NA, NA, 2, NA, NA, NA),
  point_c = c(3, 2, 3, 3, 2, 3),
  point_d = c(2, 1, NA, 2, 1, NA),
  point_e = c(4, NA, 1, 4, NA, NA)
)

df

    id user point_a point_b point_c point_d point_e
1 1001    a       1      NA       3       2       4
2 1002    a       1      NA       2       1      NA
3 1003    a      NA       2       3      NA       1
4 1001    b       1      NA       3       2       4
5 1002    b       1      NA       2       1      NA
6 1003    b      NA      NA       3      NA      NA

If you `inner_join` on the columns you want to match and then `filter` for rows where `user.x` is less than `user.y` (i.e., comes first in alphabetical order, which gets rid of duplicate pairs and of rows matched to themselves), you should be left with the matches you're looking for:

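df %>%
  # self-join on the value columns; id and user become id.x/id.y and user.x/user.y
  # (NA matches NA here because inner_join() defaults to na_matches = "na";
  # dplyr 1.1.0+ may warn about a many-to-many relationship, which is expected)
  inner_join(df, by = c('point_a', 'point_b', 'point_c', 'point_d', 'point_e')) %>%
  # keep each pair once and drop rows matched to themselves
  filter(user.x < user.y) %>%
  rename(user = user.x,
         equal = user.y)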

  id.x user point_a point_b point_c point_d point_e id.y equal
1 1001    a       1      NA       3       2       4 1001     b
2 1002    a       1      NA       2       1      NA 1002     b
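Without the tidyverse, base R's `merge()` can express the same self-join. This is a sketch rather than part of the answer above; it relies on `merge()` treating NA keys as equal when joining on several columns, which mirrors `inner_join()`'s default. `point_cols` and `matched` are illustrative names.

```r
df <- data.frame(
  id = c(1001, 1002, 1003, 1001, 1002, 1003),
  user = c('a', 'a', 'a', 'b', 'b', 'b'),
  point_a = c(1, 1, NA, 1, 1, NA),
  point_b = c(NA, NA, 2, NA, NA, NA),
  point_c = c(3, 2, 3, 3, 2, 3),
  point_d = c(2, 1, NA, 2, 1, NA),
  point_e = c(4, NA, 1, 4, NA, NA)
)

point_cols <- c('point_a', 'point_b', 'point_c', 'point_d', 'point_e')

# Self-join on the value columns; the shared id and user columns
# receive the default ".x" / ".y" suffixes
matched <- merge(df, df, by = point_cols)

# Keep each pair once and drop rows matched to themselves
matched <- matched[matched$user.x < matched$user.y, ]

# Rename to line up with the tidyverse output
names(matched)[names(matched) == 'user.x'] <- 'user'
names(matched)[names(matched) == 'user.y'] <- 'equal'
matched
```

Unlike `inner_join()`, `merge()` sorts the result by the key columns by default, so the row order may differ from the dplyr version.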

We can `split` the data by user, pass the pieces to `mapply`, and compute the `rowSums` of `TRUE`s after comparing with `==`. From the resulting matrix we take `which.max` of each row, which lets us subset the users (excluding "A"). In other words, each row of "A" is assigned the user whose corresponding row shares the most non-NA values with it. Finally, the result just needs to be subset to user "A".

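transform(dat, EQUAL=
            # one data frame per user: list(A = ..., B = ..., C = ...)
            split(dat, dat$user) |>
            # per row, count how many columns of B's and of C's rows equal A's (NA comparisons dropped)
            (\(.) mapply(\(x, y) rowSums(x == y, na.rm=TRUE), 
                         unname(.['A']), 
                         .[c('B', 'C')]))() |>
            # name the user with the most matches in each row (the first one on ties)
            (\(.) sort(unique(dat$user))[-1][apply(., 1, which.max)])()
) |>
  # EQUAL is recycled over all 9 rows, so keep only user A's rows
  (\(.) .[.$user == 'A', ])()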
#     id user point_a point_b point_c point_d point_e EQUAL
# 1 1001    A       1      NA       3       2       4     B
# 2 1002    A       1      NA       2       1      NA     B
# 3 1003    A      NA       2       3      NA       1     C

Note: R version 4.1.2 (2021-11-01). The native pipe `|>` and the `\(x)` lambda shorthand require R 4.1.0 or later.


Data:

dat <- structure(list(id = c(1001L, 1002L, 1003L, 1001L, 1002L, 1003L, 
1001L, 1002L, 1003L), user = c("A", "A", "A", "B", "B", "B", 
"C", "C", "C"), point_a = c(1, 1, NA, 1, 1, NA, 4, 1, NA), point_b = c(NA, 
NA, 2, NA, NA, NA, 3, NA, 2), point_c = c(3, 2, 3, 3, 2, 3, 3, 
2, 3), point_d = c(2, 1, NA, 2, 1, NA, 2, 1, NA), point_e = c(4, 
NA, 1, 4, NA, NA, 4, NA, 1)), class = "data.frame", row.names = c(NA, 
-9L))
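For readers on R older than 4.1, or who find the nested lambdas hard to follow, here is the same computation unrolled into named steps with ordinary `function()` syntax. It is a sketch that restricts the comparison to the `point_*` columns (an assumption on my part). The original compares whole rows including `id` and `user`, which adds the same constant to every count and so does not change which user wins. `by_user`, `others` and `counts` are illustrative names.

```r
dat <- data.frame(
  id = rep(c(1001L, 1002L, 1003L), times = 3),
  user = rep(c("A", "B", "C"), each = 3),
  point_a = c(1, 1, NA, 1, 1, NA, 4, 1, NA),
  point_b = c(NA, NA, 2, NA, NA, NA, 3, NA, 2),
  point_c = c(3, 2, 3, 3, 2, 3, 3, 2, 3),
  point_d = c(2, 1, NA, 2, 1, NA, 2, 1, NA),
  point_e = c(4, NA, 1, 4, NA, NA, 4, NA, 1)
)

point_cols <- grep("^point_", names(dat), value = TRUE)

# One data frame of point values per user: list(A = ..., B = ..., C = ...)
by_user <- split(dat[point_cols], dat$user)

# For each other user, count per row how many values equal user A's;
# na.rm = TRUE drops NA comparisons, so NA never counts as a match
others <- setdiff(names(by_user), "A")
counts <- sapply(others, function(u) {
  rowSums(by_user[["A"]] == by_user[[u]], na.rm = TRUE)
})

# Row-wise winner: which.max returns the first maximum on ties
dat_a <- dat[dat$user == "A", ]
dat_a$EQUAL <- others[apply(counts, 1, which.max)]
dat_a
```

For id 1002, B and C both match user A on three values, and `which.max` resolves the tie in favour of B. This approach therefore picks the most similar user rather than requiring every value to be identical.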
