Matching vector values by records in a data frame in R
I have a vector of values r as follows:
r<-c(1,3,4,6,7)
and a data frame df with 21 records and two columns:
id<-c(1,2,3,4,5,6,7,8,9,10,11,12,13,1,4,15,16,17,18,19,20)
freq<-c(1,3,2,4,5,6,6,7,8,3,3,1,6,9,9,1,1,4,3,7,7)
df<-data.frame(id,freq)
Using the r vector, I need to extract a sample of records from df (as a new data frame) such that the freq values of the sampled records equal the values in r. If several records share the same freq value, one of them should be picked at random. For instance, one possible outcome is:
id freq
12 1
10 3
4 4
7 6
8 7
I would be thankful if anyone could help me with this.
You could try data.table:
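library(data.table)
# keep rows whose freq is in r, then draw one id per freq group
# (setDT converts df to a data.table by reference)
setDT(df)[freq %in% r, sample(id, 1L), by = freq]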
Or using base R:
aggregate(id ~ freq, df, subset = freq %in% r, FUN = sample, 1L)
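One caveat with both snippets above: sample(x, 1) treats a single number x as the range 1:x, so a freq value that matches exactly one row could return an id that is not in that group. That does not happen with the example data, where every value in r matches several rows, but it can with other inputs. The sketch below is one way to guard against it in base R; pick_one is a hypothetical helper name, not part of any package.

```r
# Data from the question
id   <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,1,4,15,16,17,18,19,20)
freq <- c(1,3,2,4,5,6,6,7,8,3,3,1,6,9,9,1,1,4,3,7,7)
df   <- data.frame(id, freq)
r    <- c(1, 3, 4, 6, 7)

# Hypothetical helper: pick one element by position, so a length-one
# numeric vector is returned unchanged instead of being read as 1:x
pick_one <- function(x) x[sample.int(length(x), 1L)]

# Subset first, then draw one id per freq value
res <- aggregate(id ~ freq, data = df[df$freq %in% r, ], FUN = pick_one)
res <- res[, c("id", "freq")]  # same column order as df
print(res)
```

The same helper could be used in place of sample(id, 1L) in the data.table call.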
If 'r' contains duplicate values and you want to draw, for each value, as many rows from the data set ('df') as that value occurs in 'r':
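r <- c(1, 3, 3, 4, 6, 7)
# split(r, r) groups equal values, so length(x) is how often each value occurs
res <- do.call(rbind, lapply(split(r, r), function(x) {
  x1 <- df[df$freq %in% x, ]
  # draw length(x) distinct rows for this value
  x1[sample(seq_len(nrow(x1)), length(x), replace = FALSE), ]
}))
row.names(res) <- NULL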
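A quick sanity check for this approach is to compare how often each freq value appears in the result with how often it appears in 'r'. The sketch below wraps the sampling in a function for that purpose; sample_by_counts is a hypothetical name and the seed is arbitrary. Note that sample() stops with an error if 'r' asks for more rows of a value than 'df' contains.

```r
# Data from the question
id   <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,1,4,15,16,17,18,19,20)
freq <- c(1,3,2,4,5,6,6,7,8,3,3,1,6,9,9,1,1,4,3,7,7)
df   <- data.frame(id, freq)
r    <- c(1, 3, 3, 4, 6, 7)

# Hypothetical wrapper around the split/lapply approach
sample_by_counts <- function(df, r) {
  out <- do.call(rbind, lapply(split(r, r), function(x) {
    x1 <- df[df$freq %in% x, ]
    x1[sample(seq_len(nrow(x1)), length(x)), ]  # without replacement
  }))
  row.names(out) <- NULL
  out
}

set.seed(42)  # arbitrary seed, only for reproducibility
res <- sample_by_counts(df, r)

# Counts per freq value in the result and in r should agree
counts_match <- identical(as.vector(table(res$freq)), as.vector(table(r)))
# No row of df should be drawn twice
no_repeats <- anyDuplicated(res) == 0
print(c(counts_match = counts_match, no_repeats = no_repeats))
```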
You can use filter and sample_n from "dplyr":
library(dplyr)
set.seed(1)
df %>%
  filter(freq %in% r) %>%
  group_by(freq) %>%
  sample_n(1)
# Source: local data frame [5 x 2]
# Groups: freq
#
# id freq
# 1 12 1
# 2 10 3
# 3 17 4
# 4 13 6
# 5 8 7
Have you tried using the match() function or %in%? This might not be a fast/clean solution, but it uses only base R functions:
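rUnique <- unique(r)
df2 <- df[df$freq %in% rUnique, ]
x <- data.frame(id = NA, freq = rUnique)
for (i in seq_along(rUnique)) {
  # ids whose freq equals the current value
  ids <- df2$id[df2$freq == rUnique[i]]
  # index by position so a single id is not expanded to 1:id by sample()
  x$id[i] <- ids[sample.int(length(ids), 1)]
}
print(x)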
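For what it is worth, match() can also do the whole job without a loop: shuffle the rows once, then take the first row that matches each value. Because match() returns the first hit, and the row order is random, each matching row is equally likely to be chosen. This is only a sketch under the assumption that every value in r occurs in df$freq; a missing value would give a row of NAs.

```r
# Data from the question
id   <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,1,4,15,16,17,18,19,20)
freq <- c(1,3,2,4,5,6,6,7,8,3,3,1,6,9,9,1,1,4,3,7,7)
df   <- data.frame(id, freq)
r    <- c(1, 3, 4, 6, 7)

# Put the rows of df in random order
shuffled <- df[sample(nrow(df)), ]

# match() gives the position of the first row with each requested freq
res <- shuffled[match(unique(r), shuffled$freq), ]
row.names(res) <- NULL
print(res)
```

The rows come back in the order of unique(r), and repeated values in r are collapsed to a single row each.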