Random sampling over XY coordinates in R (or in Matlab?)
My data frame has the following four columns: type ("A" or "B"), xvar, longitude, and latitude. It looks like:
type xvar longitude latitude
[1,] A 20 -87.81 40.11
[2,] A 12 -87.82 40.12
[3,] A 50 -87.85 40.22
....
[21,] B 24 -87.79 40.04
[22,] B 30 -87.88 40.10
[23,] B 12 -87.67 40.32
[24,] B 66 -87.66 40.44
....
I have 20 rows for type="A", and 25,000 rows for type="B". My task is to randomly assign the xvar values of the 20 "A" data points onto the XY space of type "B" without replacement. For example, xvar=20 from the first observation of type="A" could be randomly placed at [22,], that is (-87.88, 40.10). Because I am doing this without replacement, in theory I can repeat it 25,000/20 = 1,250 times. I want 1,000 replications.
And I have a function (say, myfunc(xvar, longitude, latitude)) that returns one statistical value from one random sample. I first create an empty matrix (say, myresult) of 1,000x1.
myresult <- array(0,dim=c(1000,1))
Then, for each random sample, I apply my function (myfunc) to calculate the statistic.
for (i in 1:1000) {
  # draw one sample that has three variables: xvar, longitude, latitude
  # apply my function to this selected sample
  # store the calculated statistic in myresult[i, ]
}
I wonder how to do this in R (and maybe in Matlab?). Thanks!
============================================================= ================================================== ===========
Update: @user, borrowing your idea, the following is what I want:
dd1 <- df[df$type == "B" ,]
dd2 <- df[df$type == "A" ,]
v <- dd2[sample(nrow(dd2), nrow(dd2)), ]
randomXvarOfA <- as.matrix(v[,c("xvar")])
cols <- c("longitude","latitude")
B_shuffled_XY <- dd1[,cols][sample(nrow(dd1), nrow(dd2)), ]
dimnames(randomXvarOfA)=list(NULL,c("xvar"))
sampledData <- cbind(randomXvarOfA,B_shuffled_XY)
sampledData
xvar longitude latitude
4 20 -87.79 40.04
7 12 -87.66 40.44
5 50 -87.88 40.10
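The update above draws a single sample. As a rough sketch, the same idea could be wrapped in a function so it can be called once per replication. The name draw_one is hypothetical, and the small data frame only stands in for the real 20 + 25,000 rows. Note that separate calls are independent of each other, so the same B location can appear in more than one replication.

```r
# Toy data standing in for the real data frame (assumption: same column names).
df <- data.frame(
  type      = c("A", "A", "A", "B", "B", "B", "B"),
  xvar      = c(20, 12, 50, 24, 30, 12, 66),
  longitude = c(-87.81, -87.82, -87.85, -87.79, -87.88, -87.67, -87.66),
  latitude  = c(40.11, 40.12, 40.22, 40.04, 40.10, 40.32, 40.44)
)

# draw_one() is a hypothetical helper, not part of the original post.
draw_one <- function(d) {
  a <- d[d$type == "A", ]
  b <- d[d$type == "B", ]
  # pick as many distinct B locations as there are A points
  idx <- sample(nrow(b), nrow(a))
  data.frame(
    xvar      = a$xvar,
    longitude = b$longitude[idx],
    latitude  = b$latitude[idx]
  )
}

set.seed(1)  # only so the sketch is reproducible
sampledData <- draw_one(df)
```

Each call returns one data frame with the A values of xvar attached to randomly chosen B coordinates, which can then be passed to myfunc.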
I think the function you're looking for is the 'sample' function. It would work something like this (using your looping approach):
# B rows are 21:25020 (20 "A" rows followed by 25,000 "B" rows)
drawn_Sample <- sample(21:25020, 20000, replace = FALSE)
myresult <- numeric(1000)
for (i in 1:1000) {
  # positions in drawn_Sample that belong to replication i
  index_Values <- (1 + (i-1)*20):(20 + (i-1)*20)
  myresult[i] <- myfunc(my_Data$xvar[1:20], my_Data$longitude[drawn_Sample[index_Values]], my_Data$latitude[drawn_Sample[index_Values]])
}
In this case, I am randomly assigning rows 1:20 (the ones with value "A") to groups of twenty randomly chosen rows from the "B" block (row 21 onward) and then applying the function to each group.
This feels a bit needlessly complicated, and I think we could condense it all down if we knew a little more about your function ('myfunc'). I'm assuming it's vectorized.
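One way the loop might be condensed, as a sketch only: reshape the drawn indices into a matrix with one column per replication and apply the statistic column by column. The data sizes are scaled down, and the myfunc defined here is a made-up stand-in (a weighted mean latitude), since the real function is not shown in the question.

```r
set.seed(42)  # only so the sketch is reproducible
n_A <- 20; n_B <- 2500; reps <- 100   # scaled-down sizes (assumption)

# Simulated data with the same column names as in the question.
my_Data <- data.frame(
  type      = rep(c("A", "B"), c(n_A, n_B)),
  xvar      = round(runif(n_A + n_B, 1, 100)),
  longitude = runif(n_A + n_B, -88, -87),
  latitude  = runif(n_A + n_B, 40, 41)
)

# Hypothetical statistic; substitute the real myfunc here.
myfunc <- function(xvar, longitude, latitude) weighted.mean(latitude, xvar)

B_rows <- which(my_Data$type == "B")
# One column per replication; columns are disjoint because the
# indices are drawn without replacement.
idx <- matrix(sample(B_rows, reps * n_A), nrow = n_A)

myresult <- vapply(seq_len(reps), function(i) {
  myfunc(my_Data$xvar[1:n_A],
         my_Data$longitude[idx[, i]],
         my_Data$latitude[idx[, i]])
}, numeric(1))
```

Using which() on the type column avoids hard-coding the row range, and vapply() returns a plain numeric vector of length reps.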
Update: At the OP's request, I am adding how to modify this answer to suit data frames that are not so conveniently sorted.
repetitions <- 1000 # Change this as necessary
A_data <- my_Data[my_Data$type=="A",]
B_data <- my_Data[my_Data$type=="B",]
A_rows <- nrow(A_data)
B_rows <- nrow(B_data)
drawn_Sample <- sample(1:B_rows, repetitions * A_rows, replace = FALSE)
myresult <- numeric(repetitions)
for (i in 1:repetitions) {
  # positions in drawn_Sample that belong to replication i
  index_Values <- (1 + (i-1)*A_rows):(A_rows + (i-1)*A_rows)
  myresult[i] <- myfunc(A_data$xvar, B_data$longitude[drawn_Sample[index_Values]], B_data$latitude[drawn_Sample[index_Values]])
}
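Because the draw is without replacement, the product of the number of repetitions and the number of "A" rows cannot exceed the number of "B" rows; otherwise sample() stops with an error. A small guard along these lines could make that limit explicit (the helper name max_repetitions is made up for this sketch):

```r
# Largest number of disjoint groups of size A_rows that fit into B_rows.
# max_repetitions() is a hypothetical helper, not from the original answer.
max_repetitions <- function(A_rows, B_rows) B_rows %/% A_rows

max_repetitions(20, 25000)   # 1250

repetitions <- 1000
stopifnot(repetitions <= max_repetitions(20, 25000))

# Asking for more indices than exist fails when replace = FALSE;
# tryCatch() turns the error into a marker value for illustration.
too_many <- tryCatch(
  sample(1:100, 120, replace = FALSE),
  error = function(e) "error"
)
```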
Read in your data:
df<- read.table( text="
type xvar longitude latitude
A 20 -87.81 40.11
A 12 -87.82 40.12
A 50 -87.85 40.22
B 24 -87.79 40.04
B 30 -87.88 40.10
B 12 -87.67 40.32
B 66 -87.66 40.44", header = TRUE)
I was writing this without splitting and it looked so messy. So I decided just to split your data.frame.
dd1 <- df[df$type == "B" ,] # get all rows of just type B
dd2 <- df[df$type == "A" ,] # get all rows of just type A
v <- dd2[sample(nrow(dd2), 2), ] # sample two rows at random that are type A
# if you want to sample 20 rows change the 2 to a 20
cols <- c("longitude", "latitude")
# overwrite the long/lat of 2 random B rows with the long/lat of the sampled A rows
dd1[,cols][sample(nrow(dd1), 2), ] <- v[,cols]
# put the As and Bs back together
rbind(dd2,dd1)
# type xvar longitude latitude
# 1 A 20 -87.81 40.11
# 2 A 12 -87.82 40.12
# 3 A 50 -87.85 40.22
# 4 B 24 -87.79 40.04
# 5 B 30 -87.85 40.22
# 6 B 12 -87.81 40.11
# 7 B 66 -87.66 40.44
As you can see, rows 5 and 6 of B have new randomly selected lat and long values from the A types. I did not change the xvar values though. I don't know if you want this. If you did want to change the xvars too, then you would change cols to cols <- c("xvar","longitude", "latitude").
Inside a function it would look like:
changestuff <- function(x){
  dd1 <- x[x$type == "B" ,] # get just B
  dd2 <- x[x$type == "A" ,] # get just A
  v <- dd2[sample(nrow(dd2), 2), ] # two random A rows
  cols <- c("longitude", "latitude")
  dd1[,cols][sample(nrow(dd1), 2), ] <- v[,cols] # copy their long/lat into two random B rows
  rbind(dd2,dd1)
}
changestuff(df)
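The number of rows moved is hard-coded as 2 above. A sketch of a variant that takes that number as an argument might look like the following; the name changestuff_n and the parameter n are additions for illustration, not part of the original answer.

```r
df <- read.table(text = "
type xvar longitude latitude
A 20 -87.81 40.11
A 12 -87.82 40.12
A 50 -87.85 40.22
B 24 -87.79 40.04
B 30 -87.88 40.10
B 12 -87.67 40.32
B 66 -87.66 40.44", header = TRUE)

# Hypothetical generalisation of changestuff(): n rows instead of 2.
changestuff_n <- function(x, n = 2) {
  dd1 <- x[x$type == "B", ]   # just B
  dd2 <- x[x$type == "A", ]   # just A
  # sampling without replacement needs n to fit in both groups
  stopifnot(n <= nrow(dd1), n <= nrow(dd2))
  cols <- c("longitude", "latitude")
  a_pick <- sample(nrow(dd2), n)   # which A rows donate coordinates
  b_pick <- sample(nrow(dd1), n)   # which B rows receive them
  dd1[b_pick, cols] <- dd2[a_pick, cols]
  rbind(dd2, dd1)
}

set.seed(7)  # only so the sketch is reproducible
out <- changestuff_n(df, n = 3)
```

With the real data, n = 20 would move the coordinates of all twenty A points.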