How to remove duplicates from a data frame in R

Question

I have a data frame of correlation coefficients like the following. It contains the coefficients for both a*b and b*a, which are identical. How do I remove these duplicates? Can anyone please help?

**Var1, Var2, r**
ApoA1.ng.ml.1, Apo.B.ng.ml, 0.9998438
Apo.B.ng.ml, ApoA1.ng.ml.1, 0.9998438
SLM.T0., TBW.T0., 0.9992563
TBW.T0., SLM.T0., 0.9992563
Insulin.mercdiaConc..U.L, Insulin..pg.ml, 0.9313702
Insulin..pg.ml, Insulin.mercdiaConc..U.L, 0.9313702

Answer 1

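We could try using the sqldf package here. In SQLite, two-argument MIN() and MAX() are scalar functions, so MIN(Var1, Var2) and MAX(Var1, Var2) put each pair in a fixed order and both orderings fall into the same group: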

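library(sqldf)
# assumes the question's data is in a data frame called df
# two-argument MIN()/MAX() are scalar; MAX(r) is the per-group aggregate
sql <- "SELECT MIN(Var1, Var2), MAX(Var1, Var2), MAX(r) AS r
        FROM df
        GROUP BY MIN(Var1, Var2), MAX(Var1, Var2)"

df_out <- sqldf(sql)
names(df_out) <- c("Var1", "Var2", "r")  # replace the expression-style names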
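The same canonical-ordering idea also works without sqldf. A minimal base-R sketch, assuming the question's table is typed in by hand as df and using aggregate() in place of the SQL GROUP BY; the helper column names lo and hi are illustrative only:

```r
# Base-R sketch of the MIN/MAX grouping; no sqldf required.
# Assumption: the question's table is typed in by hand as df.
df <- data.frame(
  Var1 = c("ApoA1.ng.ml.1", "Apo.B.ng.ml", "SLM.T0.", "TBW.T0.",
           "Insulin.mercdiaConc..U.L", "Insulin..pg.ml"),
  Var2 = c("Apo.B.ng.ml", "ApoA1.ng.ml.1", "TBW.T0.", "SLM.T0.",
           "Insulin..pg.ml", "Insulin.mercdiaConc..U.L"),
  r    = c(0.9998438, 0.9998438, 0.9992563, 0.9992563,
           0.9313702, 0.9313702),
  stringsAsFactors = FALSE
)

# Canonical order for each pair: (a, b) and (b, a) get the same lo/hi key
df$lo <- pmin(df$Var1, df$Var2)
df$hi <- pmax(df$Var1, df$Var2)

# One row per unordered pair, keeping the largest r (like MAX(r) in the SQL)
df_out <- aggregate(r ~ lo + hi, data = df, FUN = max)
names(df_out) <- c("Var1", "Var2", "r")
df_out
```

Since mirrored rows carry the same r, max only serves to pick one value per group; any summary that returns a single element would do.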


Answer 2

If the other techniques don't quite work, you can compute temporary min/max strings for each pair and call duplicated() on those:

x <- read.csv(stringsAsFactors=FALSE, text="
Var1,Var2,r
ApoA1.ng.ml.1,Apo.B.ng.ml,0.9998438
Apo.B.ng.ml,ApoA1.ng.ml.1,0.9998438
SLM.T0.,TBW.T0.,0.9992563
TBW.T0.,SLM.T0.,0.9992563
Insulin.mercdiaConc..U.L,Insulin..pg.ml,0.9313702
Insulin..pg.ml,Insulin.mercdiaConc..U.L,0.9313702")

# cbind() the two keys so duplicated() compares whole (min, max) rows
x[!duplicated(cbind(pmin(x$Var1, x$Var2), pmax(x$Var1, x$Var2))),]
#                       Var1           Var2         r
# 1            ApoA1.ng.ml.1    Apo.B.ng.ml 0.9998438
# 3                  SLM.T0.        TBW.T0. 0.9992563
# 5 Insulin.mercdiaConc..U.L Insulin..pg.ml 0.9313702
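Both keys have to go into a single duplicated() call: its second positional argument is incomparables, not a second key, so duplicated(a, b) does not compare (a, b) pairs at all. A small sketch with made-up pairs (a, b), (b, a) and (a, c) shows what goes wrong when only the min key is used:

```r
# Made-up example: (a, b) and (b, a) are mirrors, (a, c) is a distinct pair
y <- data.frame(Var1 = c("a", "b", "a"),
                Var2 = c("b", "a", "c"),
                r    = c(0.9, 0.9, 0.5),
                stringsAsFactors = FALSE)

lo <- pmin(y$Var1, y$Var2)  # "a" "a" "a"
hi <- pmax(y$Var1, y$Var2)  # "b" "b" "c"

# Keying on lo alone wrongly treats (a, c) as a repeat of (a, b)
only_lo <- y[!duplicated(lo), ]

# cbind() builds a 2-column matrix; duplicated() then compares whole rows
both <- y[!duplicated(cbind(lo, hi)), ]

nrow(only_lo)  # 1
nrow(both)     # 2
```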

(You can also assign them temporarily to columns in the frame, like so:

x$m1 <- pmin(x$Var1, x$Var2)
x$m2 <- pmax(x$Var1, x$Var2)
x[!duplicated(x[c("m1","m2")]),]

though you then have to remove the temporary columns yourself.)
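A sketch of the temporary-column variant with the cleanup step included, reusing the question's data; the name dedup is just illustrative:

```r
x <- read.csv(stringsAsFactors = FALSE, text = "
Var1,Var2,r
ApoA1.ng.ml.1,Apo.B.ng.ml,0.9998438
Apo.B.ng.ml,ApoA1.ng.ml.1,0.9998438
SLM.T0.,TBW.T0.,0.9992563
TBW.T0.,SLM.T0.,0.9992563
Insulin.mercdiaConc..U.L,Insulin..pg.ml,0.9313702
Insulin..pg.ml,Insulin.mercdiaConc..U.L,0.9313702")

# Temporary canonical keys for each pair
x$m1 <- pmin(x$Var1, x$Var2)
x$m2 <- pmax(x$Var1, x$Var2)

# duplicated() on a data frame compares whole rows of m1 and m2
dedup <- x[!duplicated(x[c("m1", "m2")]), ]

# Drop the helper columns again
dedup$m1 <- NULL
dedup$m2 <- NULL
dedup
```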

Answer 1: score 2, accepted, posted 2018-12-04 05:59:32
Answer 2: score 2, posted 2018-12-04 06:03:09