
Conditional replacement of column values in a dataframe using R

Let's make a dummy dataset:

ll = data.frame(rbind(c(2,3,5), c(3,4,6), c(9,4,9)))
colnames(ll)<-c("b", "c", "a")
> ll
  b c a
1 2 3 5
2 3 4 6
3 9 4 9

P = data.frame(cbind(c(3,5), c(4,6), c(8,7)))
colnames(P)<-c("a", "b", "c")
> P
  a b c
1 3 4 8
2 5 6 7

I want to create a new data frame in which each value in a column of ll is set to 0 when it is less than the corresponding value of a, b, or c in the first row of P. In other words, I'd like to see:

> new_ll
  b c a
1 0 0 5
2 0 0 6
3 9 0 9

So I tried it this way:

nn=c("a", "b", "c")
new_ll = sapply(nn, function(i) 
  ll[,paste0(i)][ll[,paste0(i)] < P[,paste0(i)][1]] <- 0)

But it doesn't work for some reason! I must be making a silly mistake in my script. Any ideas?

> new_ll
a b c 
0 0 0 

You can find the values in ll that are smaller than the corresponding values in the first row of P with apply:

t(apply(ll, 1, function(x) x<P[1,][colnames(ll)]))
      [,1] [,2]  [,3]
[1,]  TRUE TRUE FALSE
[2,]  TRUE TRUE FALSE
[3,] FALSE TRUE FALSE

Here, the first row of P is reordered to match the column order of ll, and then the elements are compared.

Credit to Ananda Mahto for recognizing that apply is not required:

ll < c(P[1, names(ll)])
         b    c     a
[1,]  TRUE TRUE FALSE
[2,]  TRUE TRUE FALSE
[3,] FALSE TRUE FALSE

The TRUE values show where you want to substitute 0:

ll[ ll < c(P[1, names(ll)]) ] <- 0
ll
  b c a
1 0 0 5
2 0 0 6
3 9 0 9
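
Since the question asks for a new data frame, here is a minimal sketch of the same idea that leaves ll untouched. It rebuilds the example data so it can be run on its own. The name mask is illustrative only, and as.list() is used here in place of c() to make it explicit that the thresholds are supplied as a list with one element per column.

```r
# Rebuild the example data from the question
ll <- data.frame(b = c(2, 3, 9), c = c(3, 4, 4), a = c(5, 6, 9))
P  <- data.frame(a = c(3, 5), b = c(4, 6), c = c(8, 7))

# A list with one threshold per column of ll (in ll's column order),
# so the comparison is carried out column by column
mask <- ll < as.list(P[1, names(ll)])

# Work on a copy so that ll itself is not modified
new_ll <- ll
new_ll[mask] <- 0
new_ll
```

Note that passing the thresholds as a plain numeric vector (for example via unlist()) would not be equivalent: a data frame compared with an atomic vector recycles the vector down each column rather than matching one value to each column.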

To fix your code, you want something like this:

do.call(cbind, lapply(names(ll), function(i) {
    ll[,i][ll[,i] < P[,i][1]] <- 0
    return(ll[i])}))
  b c a
1 0 0 5
2 0 0 6
3 9 0 9

What's changed? First, sapply is changed to lapply, and the function now explicitly returns the modified column (as a one-column data frame) on each iteration; in the original, the last expression was the assignment itself, so each call returned only the assigned value, 0. Second, the names are supplied in the order needed for the expected result. Third, the results are put together with cbind to get the final data frame. As a bonus, the redundant calls to paste0 have been removed.
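
To see why the original attempt produced only zeros, here is a small sketch that assumes the same example data. In R, the value of an assignment expression is the value that was assigned, so a function whose last expression is `x[...] <- 0` returns 0 rather than the modified vector. The function names bad and good are illustrative only and do not appear in the original post.

```r
# Rebuild the example data from the question
ll <- data.frame(b = c(2, 3, 9), c = c(3, 4, 4), a = c(5, 6, 9))
P  <- data.frame(a = c(3, 5), b = c(4, 6), c = c(8, 7))

# Last expression is an assignment: the function returns the assigned value, 0
bad <- function(i) {
  col <- ll[[i]]
  col[col < P[[i]][1]] <- 0
}

# Returning the modified column explicitly gives the intended result
good <- function(i) {
  col <- ll[[i]]
  col[col < P[[i]][1]] <- 0
  col
}

bad_res  <- sapply(names(ll), bad)    # named vector of zeros, one per column
good_res <- sapply(names(ll), good)   # numeric matrix with columns b, c, a

new_ll <- as.data.frame(good_res)
new_ll
```

Here sapply is kept, because each call to good returns a plain numeric vector of the same length and sapply can simplify these to a matrix.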

You could also try mapply, which applies a function to the corresponding elements of its arguments. Here, ll and P are both data frames, so the function is applied column by column, with recycling. I matched the column names of P to those of ll (similar to @Matthew Lundberg's answer) and checked which elements in each column of ll are less than the corresponding value in P (the single row of P is recycled), which returns a logical index. The elements that satisfy the condition are then set to 0.

indx <- mapply(`<`, ll, P[1,][names(ll)])
new_ll <- ll
new_ll[indx] <- 0
new_ll
 #  b c a
 #1 0 0 5
 #2 0 0 6
 #3 9 0 9

If you know that ll and P are numeric, you can also do it as follows:

llm <- as.matrix(ll)
pv <- as.numeric(P[1, colnames(llm)])  
llm[sweep(llm, 2, pv, `<`)] <- 0
data.frame(llm)
#   b c a
# 1 0 0 5
# 2 0 0 6
# 3 9 0 9
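
To make the intermediate step visible, the following sketch (assuming the same example data) shows the logical matrix that sweep builds before the assignment. It uses the strict `<`, matching the question's "less than", and unlist() instead of as.numeric() to extract the thresholds; the name mask is illustrative only.

```r
# Rebuild the example data from the question
ll <- data.frame(b = c(2, 3, 9), c = c(3, 4, 4), a = c(5, 6, 9))
P  <- data.frame(a = c(3, 5), b = c(4, 6), c = c(8, 7))

llm <- as.matrix(ll)

# Thresholds from the first row of P, in the column order of llm
pv <- unlist(P[1, colnames(llm)])

# MARGIN = 2 pairs each column of llm with the matching element of pv
mask <- sweep(llm, 2, pv, `<`)
mask

llm[mask] <- 0
new_ll <- data.frame(llm)
new_ll
```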
