Data frame of repeated values

Question

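I am currently using R. I have a data frame with three columns, named year1, year2 and year3. Each column holds a set of numeric data.

I would like to produce a data frame that contains the values repeated in two different columns. That is, if num.4 appears in both year1 and year2, the new data frame contains num.4; in the same way, if num.5 appears in both year2 and year3, the new data frame contains num.5.

I tried the following code: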

newdf1 <- origdf[origdf$year1 == origdf$year2 | origdf$year1 == origdf$year3, c(1)]

newdf2 <- origdf[origdf$year2 == origdf$year3, c(2)]

Then I merged the two data frames, but the result did not include all the data and contained many NA values.

Then I tried the following code:

newdf <- origdf[origdf$year1 == origdf$year2 | origdf$year1 == origdf$year3 & origdf$year2 == origdf$year3, c(1, 2)]

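That did not work either. It gave me a data frame with many NA values and some correct values, but not all of the repeated numbers were included.

How can I efficiently build a data frame that contains the values repeated across the three columns of the original data frame, without the values repeated everywhere (I do not want to include a number that appears in all three columns of the original data frame)?

The expected result would be: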

>newdf

1 num.4
2 num.5

Answer 1

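If I understand correctly, you are looking for the intersections between the columns of the data frame, excluding the elements that are common to all three columns. The intersect() function may then be a solution. The code could look like this: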

n_years <- 3
# generate all possible combinations of two indices of considered years
indices_comb <- combn(x = seq_len(n_years), m = 2)
# apply intersect() along all possible combinations;
# lapply() always returns a list, whereas sapply() may simplify to a matrix
all_intersects <- lapply(seq_len(ncol(indices_comb)), function(i) {
  intersect(origdf[[indices_comb[1, i]]], origdf[[indices_comb[2, i]]])
})
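To make the pairwise step concrete, here is a minimal sketch on made-up data. The data frame `demo_df` and the names `demo_comb` and `demo_pairs` are hypothetical and only serve the illustration; the values are stored as character labels to mirror the `num.4` / `num.5` notation in the question.

```r
# Hypothetical data, made up for illustration only
demo_df <- data.frame(
  year1 = c("num.1", "num.4", "num.7"),
  year2 = c("num.4", "num.5", "num.7"),
  year3 = c("num.5", "num.7", "num.9"),
  stringsAsFactors = FALSE
)

# Each column of the combn() result is one pair of column indices
demo_comb <- combn(x = 1:3, m = 2)
#      [,1] [,2] [,3]
# [1,]    1    1    2
# [2,]    2    3    3

# One intersect() call per pair of columns
demo_pairs <- lapply(seq_len(ncol(demo_comb)), function(i) {
  intersect(demo_df[[demo_comb[1, i]]], demo_df[[demo_comb[2, i]]])
})
# demo_pairs[[1]] is c("num.4", "num.7")  (year1 and year2)
# demo_pairs[[2]] is "num.7"              (year1 and year3)
# demo_pairs[[3]] is c("num.5", "num.7")  (year2 and year3)
```

In this made-up example num.7 shows up in every pairwise intersection, because it occurs in all three columns.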

Finally, exclude the elements that are common to all original columns (year1, year2, year3):

# find elements which are common to all pairwise intersections
in_all <- Reduce(intersect, all_intersects)
# combine all pairwise intersections into one vector
in_pairw <- unlist(all_intersects)
# exclude the elements which are common to all intersections
newdf <- data.frame(res = setdiff(in_pairw, in_all))

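The solution above scales easily to any number of original columns (years). Note, however, that only unique values are returned. That is, if num.4 appears twice in both year1 and year2, only one num.4 is returned.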
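The same idea can also be sketched by counting, for each value, how many columns it occurs in, and keeping the values found in at least two columns but not in all of them. This is an alternative formulation, not the code above; `dup_df`, `per_column`, `n_cols_with_value` and `dup_result` are hypothetical names, and the data is made up. Note that table() drops NA values by default, so NAs never reach the result here.

```r
# Hypothetical data, made up for illustration only;
# num.4 is deliberately duplicated within year1 and year2
dup_df <- data.frame(
  year1 = c("num.1", "num.4", "num.4", "num.7"),
  year2 = c("num.4", "num.5", "num.7", "num.4"),
  year3 = c("num.5", "num.7", "num.9", "num.9"),
  stringsAsFactors = FALSE
)

# Deduplicate within each column, then count in how many columns each value occurs
per_column <- lapply(dup_df, unique)
n_cols_with_value <- table(unlist(per_column))

# Keep values seen in at least two columns but not in all of them
keep <- n_cols_with_value >= 2 & n_cols_with_value < ncol(dup_df)
dup_result <- data.frame(res = names(n_cols_with_value)[keep],
                         stringsAsFactors = FALSE)
# dup_result$res is c("num.4", "num.5")
```

Although num.4 occurs twice in year1 and twice in year2, it is returned once, and num.7 is dropped because it occurs in all three columns.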

Answer 1 (0, 2018-02-26 11:32:51)
