Matching vector values to records of a data frame in R

Question

I have a vector of values r as follows:

 r<-c(1,3,4,6,7)

and a data frame df with 21 records and two columns:

 id<-c(1,2,3,4,5,6,7,8,9,10,11,12,13,1,4,15,16,17,18,19,20)
 freq<-c(1,3,2,4,5,6,6,7,8,3,3,1,6,9,9,1,1,4,3,7,7)
 df<-data.frame(id,freq)

Using the r vector, I need to extract a sample of records from df (as a new data frame) such that the freq values of the selected records equal the values in r. If several records share the same freq value, one of them should be picked at random. For instance, one possible outcome is:

   id   freq
   12      1
   10      3
    4      4
    7      6
    8      7

I would be thankful if anyone could help me with this.
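It can help to state the requirement as a check that any candidate result must pass. The sketch below uses a hypothetical helper, is_valid_sample (not from the question or any package), and assumes a valid result contains exactly the freq values of r (one row per value) and that every (id, freq) pair in it is an actual row of df.

```r
# Data from the question
r    <- c(1, 3, 4, 6, 7)
id   <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,1,4,15,16,17,18,19,20)
freq <- c(1,3,2,4,5,6,6,7,8,3,3,1,6,9,9,1,1,4,3,7,7)
df   <- data.frame(id, freq)

# Hypothetical helper: TRUE when res holds exactly the freq values in r
# (as a multiset) and each (id, freq) pair in res is a real row of df
is_valid_sample <- function(res, df, r) {
  same_values <- identical(sort(res$freq), sort(r))
  pairs_exist <- all(paste(res$id, res$freq) %in% paste(df$id, df$freq))
  same_values && pairs_exist
}

# The example outcome given in the question
outcome <- data.frame(id = c(12, 10, 4, 7, 8), freq = c(1, 3, 4, 6, 7))
is_valid_sample(outcome, df, r)   # [1] TRUE
```

Comparing sorted freq values (rather than sets) also covers the case where r contains duplicates.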

Answer 1

You could try data.table:

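library(data.table)
# one random id per freq value in r; id[sample(.N, 1L)] indexes into the
# group, so a group with a single id is not mistaken for sample(1:id)
setDT(df)[freq %in% r, .(id = id[sample(.N, 1L)]), by = freq]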

Or, using base R:

aggregate(id ~ freq, data = df, subset = freq %in% r,
          FUN = function(x) x[sample(length(x), 1L)])
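Both snippets pick one element at random from each group's ids. Calling sample(ids, 1) directly is risky because of a documented special case in base R: when its first argument is a single number n >= 1, sample() draws from 1:n. With this data every freq group has at least two ids, so the issue would not show up here, but it would for any value matched by only one record. A small sketch of the pitfall and a safe pattern (sample_one is a hypothetical helper name):

```r
# sample() has a special case: a single number n >= 1 is treated as the
# population 1:n, so sample(17, 1) can return any integer from 1 to 17
set.seed(42)
sample(17, 1)

# Hypothetical helper: draw a position, then index, so a length-one
# vector returns its only element instead of a value from 1:x
sample_one <- function(x) x[sample(length(x), 1L)]

sample_one(17)         # always 17
sample_one(c(4, 17))   # 4 or 17 (the two ids with freq 4 in df)
```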

Update

If r contains duplicate values and you want to sample from df as many distinct rows for each value as that value occurs in r:

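r <- c(1, 3, 3, 4, 6, 7)
# split(r, r) groups repeated values, e.g. the two 3s become c(3, 3)
res <- do.call(rbind, lapply(split(r, r), function(x) {
  x1 <- df[df$freq %in% x, ]
  # draw as many distinct rows as the value occurs in r
  x1[sample(seq_len(nrow(x1)), length(x), replace = FALSE), ]
}))
row.names(res) <- NULL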
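The update builds one data frame per distinct value and binds them together. A possible vectorised alternative, sketched here as an illustration rather than as part of the original answer, is to shuffle df once and then keep, for each freq value v, the first k rows of the shuffled data, where k is how often v occurs in r. The names shuffled, occurrence, wanted and res2 are illustrative.

```r
# Data from the question, with the value 3 requested twice
id   <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,1,4,15,16,17,18,19,20)
freq <- c(1,3,2,4,5,6,6,7,8,3,3,1,6,9,9,1,1,4,3,7,7)
df   <- data.frame(id, freq)
r    <- c(1, 3, 3, 4, 6, 7)

set.seed(1)
# Shuffle rows once so that "the first k rows per freq" is a random draw
shuffled <- df[sample(nrow(df)), ]
# Running count within each freq value in the shuffled order: 1, 2, 3, ...
occurrence <- ave(shuffled$freq, shuffled$freq, FUN = seq_along)
# How many rows each row's freq value should contribute (0 if not in r)
wanted <- vapply(shuffled$freq, function(v) sum(r == v), integer(1))
res2 <- shuffled[occurrence <= wanted, ]
res2 <- res2[order(res2$freq), ]
row.names(res2) <- NULL
res2
```

Unlike the sample()-based version, this returns fewer rows instead of raising an error when a value occurs in r more often than it does in df.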

Answer 2

You can use filter and sample_n from "dplyr":

library(dplyr)
set.seed(1)
df %>% 
  filter(freq %in% r) %>% 
  group_by(freq) %>% 
  sample_n(1)
# Source: local data frame [5 x 2]
# Groups: freq
# 
#   id freq
# 1 12    1
# 2 10    3
# 3 17    4
# 4 13    6
# 5  8    7
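In dplyr 1.0.0 and later, sample_n() is superseded by slice_sample(). A sketch of the same pipeline with slice_sample(n = 1), assuming a dplyr version that provides it; the rows drawn may differ from the output shown above even with the same seed.

```r
library(dplyr)

# Data from the question
id   <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,1,4,15,16,17,18,19,20)
freq <- c(1,3,2,4,5,6,6,7,8,3,3,1,6,9,9,1,1,4,3,7,7)
df   <- data.frame(id, freq)
r    <- c(1, 3, 4, 6, 7)

set.seed(1)
res <- df %>%
  filter(freq %in% r) %>%
  group_by(freq) %>%
  slice_sample(n = 1) %>%   # one random row from each freq group
  ungroup()
res
```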

Answer 3

Have you tried using the match() function or %in%? This might not be the fastest or cleanest solution, but it uses only base R functions:

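rUnique <- unique(r)
df2 <- df[df$freq %in% rUnique, ]
x <- data.frame(id = NA, freq = rUnique)

for (i in seq_along(rUnique)) {
    # candidate ids whose freq equals the i-th target value
    ids <- df2$id[df2$freq == rUnique[i]]
    # index into ids so a single candidate is not treated as sample(1:n)
    x$id[i] <- ids[sample(length(ids), 1)]
}
print(x)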
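The answer mentions match(), although the loop itself does not use it. One way match() could do the whole job, sketched here as an alternative: shuffle the rows of df, then match(unique(r), shuffled$freq) returns, for each target value, the position of its first occurrence in the shuffled order. In a uniformly random row order, that first occurrence is a uniformly random row among those with that freq. A value of r that does not occur in df would give a row of NAs.

```r
# Data from the question
id   <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,1,4,15,16,17,18,19,20)
freq <- c(1,3,2,4,5,6,6,7,8,3,3,1,6,9,9,1,1,4,3,7,7)
df   <- data.frame(id, freq)
r    <- c(1, 3, 4, 6, 7)

set.seed(1)
# Random row order, so the "first match" for each value is a random one
shuffled <- df[sample(nrow(df)), ]
# match() gives the position of the first row whose freq equals each value
picked <- shuffled[match(unique(r), shuffled$freq), ]
row.names(picked) <- NULL
picked
```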

Answer summary (3 answers):
Answer 1: 6 votes, accepted, posted 2015-05-01 14:33:47 (includes an update)
Answer 2: 4 votes, posted 2015-05-01 14:25:14
Answer 3: 1 vote, posted 2015-05-01 14:39:43
