R: Comparing values in vector to column in data frame

Apologies if this has been asked before, but I've searched for a while and can't find anything to answer my question. I'm somewhat comfortable using R but never really learned the fundamentals. Here's what I'm trying to do.

I've got a vector (call it "responseTimes") that looks something like this:

150  50 250  200  100  150  250  

(It's actually much longer, but I'm truncating it here.)

I've also got a data frame where one column, timeBin, is essentially counting up by 50 from 0 (so 0 50 100 150 200 250 etc).

What I'm trying to do is to count how many values in responseTimes are less than or equal to the timeBin value in each row of the data frame. I want to store these counts in a new column of my data frame. My output should look something like this:

timeBin    counts
0          0
50         1
100        2
150        4
200        5
250        7

I know I can use the sum function to compare vector elements to some constant (e.g., sum(responseTimes > 100) would give me 5 for the data I've shown here), but I don't know how to do this to compare to a changing value (that is, to compare to each row in the timeBin column).

I'd prefer not to use a loop, as I'm told those can be particularly slow in R and I have quite a large data set that I'm working with. Any suggestions would be much appreciated! Thanks in advance.

You can use sapply this way:

> timeBin <- seq(0, 250, by=50)
> responseTimes <- c(150,  50, 250,  200,  100,  150,  250 )
> 
> # using sapply (after all `sapply` is a loop)
> ans <- sapply(timeBin, function(x)  sum(responseTimes<=x))
> data.frame(timeBin, counts=ans)  # your desired output.
  timeBin counts
1       0      0
2      50      1
3     100      2
4     150      4
5     200      5
6     250      7
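If you want to skip the explicit per-bin function call, one option is base R's findInterval(). This is only a sketch of an alternative, not part of the answer above. It assumes the same timeBin and responseTimes as in the example and relies on findInterval() requiring its second argument to be sorted in non-decreasing order.

```r
timeBin <- seq(0, 250, by = 50)
responseTimes <- c(150, 50, 250, 200, 100, 150, 250)

# findInterval(x, vec) returns, for each x, how many elements of the
# sorted vector vec are <= x, which is exactly the count asked for here.
counts <- findInterval(timeBin, sort(responseTimes))

# Combine the bins and their counts into a data frame.
df <- data.frame(timeBin, counts)
print(df)
```

On a long responseTimes vector this sorts once and then looks up each bin, so it may scale better than comparing every bin against every response time, though that is worth checking on your own data.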

That might help:

responseTimes <- c(150, 50, 250, 200, 100, 150, 250)
bins1 <- seq(0, 250, by = 50)


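sahil1 <- function(input = responseTimes, binsx = bins1) {
    # cut() assigns each value to an interval (0,50], (50,100], ...;
    # table() then counts the values that fall in each interval
    tablem <- table(cut(input, binsx))
    tablem <- cumsum(tablem) # running total: values <= each interval's upper bound
    return(as.data.frame(tablem)) # one row per interval, labelled by the interval
}

sahil1() # call with the defaults; the result has no row for timeBin 0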
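The cut()/cumsum() approach above produces one row per interval, so it has no row for timeBin 0. A possible tweak, sketched here as an assumption rather than as the answerer's method, is to prepend -Inf to the breaks so that the first interval is (-Inf, 0]. The names breaks, counts and result are just illustrative.

```r
responseTimes <- c(150, 50, 250, 200, 100, 150, 250)
timeBin <- seq(0, 250, by = 50)

# Prepending -Inf adds an interval (-Inf, 0], which yields the row for timeBin 0.
breaks <- c(-Inf, timeBin)

# Count values per interval, then accumulate to get "<= upper bound" counts.
counts <- cumsum(table(cut(responseTimes, breaks)))

# as.vector() drops the interval labels so the column holds plain numbers.
result <- data.frame(timeBin, counts = as.vector(counts))
print(result)
```

Values larger than the last break become NA in cut() and are dropped by table(), which does not affect the cumulative counts for the bins that are present.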
