
How to create a new dataframe with values based on existing data frame and range from numeric vectors in R.

I have a 96 x 48 dataframe df. The first column is an identifying field (char), columns 2 - 48 are numeric values. I also have two numeric vectors with 96 elements each, consisting of upper and lower bounds that correspond to each row.

I would like to create a new dataframe with an identical column 1, but for columns 2-48 I would like to check whether each value lies between the values in the two vectors for that row. The new data frame should then contain 1 if it does and 0 if it does not (a boolean of sorts).

Example:

df:

    1  2  3  4 .. 48
    a  7  11 15   58
    b  6  9  13   46
    c  8  14 20   73

vectors:

    upper: 24, 35, 22, 63
    lower: 10, 11, 12, 11

return:

    1  2  3  4 .. 48
    a  0  1  1    0   (between upper[1] and lower[1])
    b  0  0  1    0   (between upper[2] and lower[2])
    c  0  1  1    0   ...

I'd like to do this without a loop, since I'm pretty sure there's a way to do it, but I can't seem to find it.

One method using dplyr:

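# Data
df <- data.frame(id=letters[1:3], col2=c(7,6,8), col3=c(11,9,14), col4=c(15,13,20), col48=c(58,46,73))

# chain of operations
library(dplyr)
cols <- paste0("col", c(2:4, 48))   # columns to test against the bounds
df %>%
  # attach the per-row bounds as temporary columns
  mutate(upper = c(24, 35, 22), lower = c(10, 11, 12)) %>%
  # funs() is deprecated in current dplyr; across() with a lambda does the same job
  mutate(across(all_of(cols), ~ as.integer(.x >= lower & .x <= upper))) %>%
  # drop the temporary bound columns
  select(-lower, -upper)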

Output:

  id col2 col3 col4 col48
1  a    0    1    1     0
2  b    0    0    1     0
3  c    0    1    1     0

Since you said that the other variables are numeric, we can do:

# bounds have one entry per row, so compare the data frame directly (no transpose)
ifelse(upper.bounds-df[-1]>0&lower.bounds-df[-1]<0,1,0)
     c2 c3 c4 c48
[1,]  0  1  1   0
[2,]  0  0  1   0
[3,]  0  1  1   0

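There is no need for lapply or a for loop. The data used above:

df=read.table(text=" c1  c2  c3  c4 c48
    a  7  11 15   58
    b  6  9  13   46
    c  8  14 20   73
    ",h=T)
# per-row bounds, same values as in the other answers
lower.bounds <- c(10, 11, 12)
upper.bounds <- c(24, 35, 22)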

You can avoid an explicit for loop by using an implicit loop via lapply, which iterates over all columns. I think looping is not a performance problem when you loop over the columns, only when you loop over the rows: R stores the elements of each column as a vector in contiguous memory, so column-wise access is fast, whereas the elements of a row are spread across the separate column vectors, so visiting rows one by one carries a performance penalty:

df <- data.frame(c1 = c(7, 6, 8), c2 = c(11, 9, 14), c3 = c(15, 13, 20), c48 = c(58, 46, 73))
df

lower.bounds <- c(10, 11, 12) # , 11)
upper.bounds <- c(24, 35, 22) # , 63)

res <- lapply(df, function(col) {ifelse(col >= lower.bounds & col <= upper.bounds, 1, 0)})
as.data.frame(res)
# c1 c2 c3 c48
# 1  0  1  1   0
# 2  0  0  1   0
# 3  0  1  1   0
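
To make the column-versus-row point concrete, here is a rough sketch that runs both loop styles on made-up data of the size described in the question (96 rows, 47 numeric columns). The random data, the seed, and the helper names `by_col` and `by_row` are assumptions for illustration only; timings depend on the machine, so none are shown.

```r
set.seed(1)                                   # arbitrary seed, illustration only
n_row <- 96; n_col <- 47
df <- as.data.frame(matrix(runif(n_row * n_col, 0, 100), nrow = n_row))
lower.bounds <- runif(n_row, 0, 40)           # made-up bounds, one per row
upper.bounds <- lower.bounds + 30

# Loop over columns: each step is one vectorised comparison of 96 values.
by_col <- function() {
  as.data.frame(lapply(df, function(col)
    as.integer(col >= lower.bounds & col <= upper.bounds)))
}

# Loop over rows: each step has to pull one value out of every column.
by_row <- function() {
  res <- df
  for (i in seq_len(nrow(df))) {
    row_vals <- unlist(df[i, ])
    res[i, ] <- as.integer(row_vals >= lower.bounds[i] & row_vals <= upper.bounds[i])
  }
  res
}

# Both approaches produce the same 0/1 values; only the cost differs.
all(as.matrix(by_col()) == as.matrix(by_row()))

system.time(for (k in 1:20) by_col())
system.time(for (k in 1:20) by_row())
```

On data this small either loop finishes quickly, so the difference mainly matters as the number of rows grows.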

Another option is to just use apply over the columns. I think it is pretty simple and clean.

df <- data.frame(V2=c(7,6,8), V3=c(11,9,14), V4=c(15,13,20), V48=c(58,46,73))

upper <- c(24, 35, 22)
lower <- c(10, 11, 12)

data.frame(apply(df,2,function(x)((upper>=x)*(x>=lower))))
  V2 V3 V4 V48
  1  0  1  1   0
  2  0  0  1   0
  3  0  1  1   0

EDIT: After MKR's comment, I became curious and had to test performance. If you have a suggestion for measuring it in a different way, please comment.

df <- data.frame(V2=c(7,6,8), V3=c(11,9,14), V4=c(15,13,20), V48=c(58,46,73))

upper <- c(24, 35, 22)
lower <- c(10, 11, 12)

 start.time <- Sys.time()
 data.frame(apply(df,2,function(x)((upper>=x)*(x>=lower))))
  #V2 V3 V4 V48
  #1  0  1  1   0
  #2  0  0  1   0
  #3  0  1  1   0
 Sys.time()-start.time
  #Time difference of 0.0146079 secs

 start.time <- Sys.time()
 data.frame(apply(df,2,function(x)(as.numeric((upper>=x)&(x>=lower)))))
  #V2 V3 V4 V48
  #1  0  1  1   0
  #2  0  0  1   0
  #3  0  1  1   0
 Sys.time()-start.time
  #Time difference of 0.0124476 secs

 start.time <- Sys.time()
 data.frame(ifelse(upper > df[] & lower < df[], 1, 0))
  #V2 V3 V4 V48
  #1  0  1  1   0
  #2  0  0  1   0
  #3  0  1  1   0
 Sys.time()-start.time
  #Time difference of 0.008914948 secs
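
A single `Sys.time()` difference on a 3 x 4 data frame mostly measures noise such as printing and first-call overhead. One way to get steadier numbers, sketched here with base R only, is to repeat each expression many times inside `system.time()`. The repetition count and the wrapper names `f_mult`, `f_and` and `f_ifelse` are arbitrary choices for illustration. Note that the third variant uses strict inequalities, exactly as in the snippet above, so it would differ from the first two for a value equal to a bound.

```r
df <- data.frame(V2 = c(7, 6, 8), V3 = c(11, 9, 14),
                 V4 = c(15, 13, 20), V48 = c(58, 46, 73))
upper <- c(24, 35, 22)
lower <- c(10, 11, 12)

# The three variants from the timings above, wrapped in functions.
f_mult   <- function() data.frame(apply(df, 2, function(x) (upper >= x) * (x >= lower)))
f_and    <- function() data.frame(apply(df, 2, function(x) as.numeric((upper >= x) & (x >= lower))))
f_ifelse <- function() data.frame(ifelse(upper > df[] & lower < df[], 1, 0))

n_rep <- 1000   # arbitrary number of repetitions

# Elapsed time for n_rep calls of each variant; results vary by machine.
system.time(for (i in seq_len(n_rep)) f_mult())
system.time(for (i in seq_len(n_rep)) f_and())
system.time(for (i in seq_len(n_rep)) f_ifelse())
```

With so few rows the numbers still say little about the 96 x 48 case; rerunning the same wrappers on data of the real size would be more informative.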

Another possible simpler solution could be:

    df <- data.frame(c1 = c(7, 6, 8), 
                     c2 = c(11, 9, 14), 
                     c3 = c(15, 13, 20), 
                     c48 = c(58, 46, 73))

    lower.bounds <- c(10, 11, 12)
    upper.bounds <- c(24, 35, 22)

    ifelse(upper.bounds > df[] & lower.bounds < df[], 1, 0)
  # Result:
  #       c1 c2 c3 c48
  #  [1,]  0  1  1   0
  #  [2,]  0  0  1   0
  #  [3,]  0  1  1   0
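
The one-liner above depends on how R recycles a vector against a data frame or matrix: a vector with one entry per row is reused down each column, so row i is always compared with bound i. A minimal sketch that checks this alignment on the same sample data (the names `vals`, `inside` and `by_row` are illustrative, not part of the answer):

```r
df <- data.frame(c1 = c(7, 6, 8),
                 c2 = c(11, 9, 14),
                 c3 = c(15, 13, 20),
                 c48 = c(58, 46, 73))
lower.bounds <- c(10, 11, 12)
upper.bounds <- c(24, 35, 22)

vals <- as.matrix(df)   # numeric matrix, one row per data frame row

# A vector of length nrow is recycled down each column, so vals[i, j]
# is compared with lower.bounds[i] and upper.bounds[i].
inside <- vals > lower.bounds & vals < upper.bounds

# The same check written out one row at a time, to confirm the alignment.
by_row <- t(sapply(seq_len(nrow(vals)), function(i)
  vals[i, ] > lower.bounds[i] & vals[i, ] < upper.bounds[i]))

all(inside == by_row)
```

This only lines up because the bounds have exactly one entry per row; a vector of any other length would be recycled in a different pattern and silently compare the wrong pairs.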

Or:

    as.data.frame(ifelse(upper.bounds > df[] & lower.bounds < df[], 1, 0))
  # Result:
  # 
  #    c1 c2 c3 c48
  #  1  0  1  1   0
  #  2  0  0  1   0
  #  3  0  1  1   0
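
The matrix-style results above do not carry the identifier column that the question asks to keep. One possible way to put it back, sketched on the sample data (the `id` column name and the variables `num`, `flags` and `result` are illustrative assumptions):

```r
df <- data.frame(id = c("a", "b", "c"),
                 c2 = c(7, 6, 8), c3 = c(11, 9, 14),
                 c4 = c(15, 13, 20), c48 = c(58, 46, 73))
lower.bounds <- c(10, 11, 12)
upper.bounds <- c(24, 35, 22)

num <- df[-1]                                        # numeric columns only
flags <- num >= lower.bounds & num <= upper.bounds   # logical matrix, bounds inclusive
result <- cbind(df[1], as.data.frame(flags * 1L))    # TRUE/FALSE -> 1/0, id column first
result
```

The comparison here is inclusive (`>=` and `<=`); switching to `>` and `<` gives the strict version used in the `ifelse` examples. The two agree on this sample because no value equals a bound.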

