
Subset the rows of a data frame in R

I have a data frame with 30 rows and 4 columns (namely x, y, z, u). It is given below.

mydata = data.frame(x = rnorm(30,4), y = rnorm(30,2,1), z = rnorm(30,3,1), u = rnorm(30,5))

Further, I have a sequence of values, which represent row numbers in my data frame.

myseq = c(seq(1, 30, by = 5))
myseq
[1]  1  6 11 16 21 26

Now, I want to compute a prob value for each segment of rows. For the first segment:

filt = subset(mydata[1:6,], mydata[1:6,]$x < mydata[1:6,]$y & mydata[1:6,]$z < mydata[1:6,]$u)
filt
prob = length(filt$x)/30
prob

Then I need to compute the above prob for 1:6, ..., 27:30 and so on. Here I have only 6 prob values, so I can do them one by one. If I had 100 values, it would be tedious. Is there any way to compute all the prob values at once?

Thank you in advance.

BTW: in subset(DF[1:99,], ...), write DF[1:99,] only in the first argument; there is no need to repeat it inside the condition, as in

subset(DF[1:99,], cumsuml < inchivaluel & cumsumr < inchivaluer)
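As a quick check of that point, here is a minimal sketch showing that subset() looks up the column names of its condition inside the data frame passed as the first argument, so both spellings should select the same rows. The toy data frame `d` and its values are made up for illustration and are not from the question.

```r
# Hypothetical toy data, not taken from the question
d <- data.frame(x = c(1, 5, 2, 8), y = c(3, 4, 6, 7))

# Verbose form: the data frame is repeated inside the condition
long_form <- subset(d[1:3, ], d[1:3, ]$x < d[1:3, ]$y)

# Short form: bare column names are evaluated within the first argument
short_form <- subset(d[1:3, ], x < y)

# Compare the two results
identical(long_form, short_form)
```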

Think about how to do this in a list.

  1. The first step is to break your data at the va starting points. I'll start with a list of the indices to break it into:

     inds <- mapply(seq, va, c(va[-1] - 1, nrow(DF)), SIMPLIFY=FALSE)

    This is now a list of sequences, starting with 1:99, then 100:198, etc. See str(inds) to verify.

  2. Now we can subset a portion of the data based on each element's vector of indices:

     filts <- lapply(inds, function(ind) subset(DF[ind,], cumsuml < inchivaluel & cumsumr < inchivaluer))

  3. We now have a list of filtered data frames, so let's summarize it:

     results <- sapply(filts, function(filt) length(filt$cumsuml)/length(alpha))
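
The three steps above can be sketched with the question's own names (mydata, myseq, and the columns x, y, z, u). This is only an illustration under stated assumptions: each segment is taken to run from one myseq start to the row just before the next start, the denominator 30 is copied from the question's prob line, and set.seed() is added here purely so the run is reproducible.

```r
set.seed(1)  # assumption: fixed seed, not part of the question
mydata <- data.frame(x = rnorm(30, 4), y = rnorm(30, 2, 1),
                     z = rnorm(30, 3, 1), u = rnorm(30, 5))
myseq <- seq(1, 30, by = 5)

# Step 1: one index vector per segment (assumed to end just before the next start)
inds <- mapply(seq, myseq, c(myseq[-1] - 1, nrow(mydata)), SIMPLIFY = FALSE)

# Step 2: keep only the rows of each segment that satisfy the condition
filts <- lapply(inds, function(ind) subset(mydata[ind, ], x < y & z < u))

# Step 3: one prob per segment, divided by 30 rows as in the question
probs <- sapply(filts, nrow) / nrow(mydata)
probs
```

With 100 starting points instead of 6, only myseq changes; the three steps stay the same.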

Bottom line: it helps to think about how to break this problem into lists; see the examples at http://stackoverflow.com/a/24376207/3358272 .

BTW: instead of first making a list of indices, we could break up the data itself in that first step, as in

DF2 <- mapply(function(a,b) DF[a:b,], va, c(va[-1] - 1, nrow(DF)), SIMPLIFY=FALSE)
filts <- lapply(DF2, function(x) subset(x, cumsuml < inchivaluel & cumsumr < inchivaluer))
results <- sapply(filts, function(filt) length(filt$cumsuml)/length(alpha))
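
Breaking up the data directly can also be written with base R's split(). The sketch below is an alternative, not the method used in this answer: it uses findInterval() to label each row with its segment, then counts matching rows per piece. It reuses the question's names (mydata, myseq) and assumes the same segment boundaries and the same denominator of 30; set.seed() is an addition for reproducibility.

```r
set.seed(1)  # assumption: fixed seed, not part of the question
mydata <- data.frame(x = rnorm(30, 4), y = rnorm(30, 2, 1),
                     z = rnorm(30, 3, 1), u = rnorm(30, 5))
myseq <- seq(1, 30, by = 5)

# Label every row with the index of the segment it falls in (1, 1, ..., 6)
segment <- findInterval(seq_len(nrow(mydata)), myseq)

# One data frame per segment
pieces <- split(mydata, segment)

# Count the rows meeting the condition in each piece, divided by 30 rows
probs <- sapply(pieces, function(p) sum(p$x < p$y & p$z < p$u) / nrow(mydata))
probs
```

The result is a named vector with one entry per segment, so no index list has to be built by hand.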
