Replace values in each column based on conditions according to groups (by rows) data.frame
I have a data.frame with 400 rows and 15000 columns. For the rows belonging to each group, defined by df$Group, I want to check every column: if more than 50% of the group's values in that column are 0, replace all of the group's values in that column with 0; otherwise keep the existing values.
For example, for group a in the first column, df[1:6,1]: if sum(df[1:6,1] == 0)/length(df[1:6,1]) > 0.5, then all values in df[1:6,1] will be replaced with 0. Otherwise the existing values will remain.
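To make the rule concrete, here is a minimal sketch that applies the check to two single-group columns taken from the sample data (r3 for groups a and c). The helper name zero_fraction is made up for this illustration, and the sketch assumes the values contain no NA.

```r
# Hypothetical helper: share of values in a vector that are exactly 0
zero_fraction <- function(v) sum(v == 0) / length(v)

a_r3 <- c(0, 67, 30, 50, 0, 0)  # column r3, group a
c_r3 <- c(280, 0, 0, 0, 49)     # column r3, group c

# Group a: 3 of 6 values are 0, which is not more than 50%, so keep the values
a_out <- if (zero_fraction(a_r3) > 0.5) rep(0, length(a_r3)) else a_r3

# Group c: 3 of 5 values are 0, which is more than 50%, so replace everything
c_out <- if (zero_fraction(c_r3) > 0.5) rep(0, length(c_r3)) else c_r3
```

Note that a group with exactly half zeros is kept, because the comparison is strict.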
Sample input:
df <- read.table(text= "DATA r1 r2 r3 Group
a1 6835 256 0 a
a2 5395 0 67 a
a3 7746 0 30 a
a4 7496 556 50 a
a5 5780 255 0 a
a6 6060 603 0 a
b1 0 0 0 b
b2 0 258 0 b
b3 0 0 0 b
b4 0 0 0 b
b5 5099 505 0 b
b6 0 680 0 b
c1 8443 4900 280 c
c2 8980 4949 0 c
c3 7828 0 0 c
c4 6509 3257 0 c
c5 6563 0 49 c
", header=TRUE, na.strings=NA, row.names=1)
library(data.table)
dt <- as.data.table(df) # or keep it as a data.frame
Expected output:
>df
DATA r1 r2 r3 Group
a1 6835 256 0 a
a2 5395 0 67 a
a3 7746 0 30 a
a4 7496 556 50 a
a5 5780 255 0 a
a6 6060 603 0 a
b1 0 0 0 b
b2 0 258 0 b
b3 0 0 0 b
b4 0 0 0 b
b5 0 505 0 b
b6 0 680 0 b
c1 8443 4900 0 c
c2 8980 4949 0 c
c3 7828 0 0 c
c4 6509 3257 0 c
c5 6563 0 0 c
Update: This bug, #4957, is now fixed in v1.8.11. From NEWS:
Fixing #5007 also fixes #4957, where .N was not visible during lapply(.SD, function(x) ...) in j. Thanks to juba for noticing it here on SO: Replace values in each column based on conditions according to groups (by rows) data.frame
Here is a way with data.table:
dt[, lapply(.SD, function(v) {
  len <- length(v)
  # within each group, zero out the column if more than half of its values are 0
  if ((sum(v == 0) / len) > 0.5) rep(0L, len) else v
}), by = "Group", .SDcols = c("r1", "r2", "r3")]
Which gives:
Group r1 r2 r3
1: a 6835 256 0
2: a 5395 0 67
3: a 7746 0 30
4: a 7496 556 50
5: a 5780 255 0
6: a 6060 603 0
7: b 0 0 0
8: b 0 258 0
9: b 0 0 0
10: b 0 0 0
11: b 0 505 0
12: b 0 680 0
13: c 8443 4900 0
14: c 8980 4949 0
15: c 7828 0 0
16: c 6509 3257 0
17: c 6563 0 0
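With 15000 columns it may be preferable to update the table by reference rather than build a new one. The following is only a sketch of that variant, not part of the original answer: it uses a small stand-in table instead of the full sample, the helper name zero_if_mostly_zero is made up, and it assumes the value columns contain no NA.

```r
library(data.table)

# Small stand-in for the sample data: two groups, two value columns
dt <- data.table(
  Group = c("b", "b", "b", "c", "c", "c"),
  r1    = c(0L, 0L, 5099L, 8443L, 8980L, 7828L),
  r3    = c(0L, 258L, 505L, 280L, 0L, 0L)
)
cols <- c("r1", "r3")

# Hypothetical helper: return zeros of the same type as v when more than
# half of v is 0, otherwise return v unchanged
zero_if_mostly_zero <- function(v) {
  if (mean(v == 0) > 0.5) vector(typeof(v), length(v)) else v
}

# := modifies dt in place, group by group, for the columns named in cols
dt[, (cols) := lapply(.SD, zero_if_mostly_zero), by = Group, .SDcols = cols]
```

Using vector(typeof(v), length(v)) keeps integer columns integer and double columns double, so every group returns the same type for a given column.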
Quick and dirty:
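ff <- function(x) {
  if (is.numeric(x)) {
    # fraction of zeros in this column, computed per group
    b <- by(x == 0, df$Group, mean)
    # zero out every row whose group is more than half zeros
    x[df$Group %in% names(b)[b > 0.5]] <- 0
  }
  x
}
# apply to every column; non-numeric columns such as Group pass through unchanged
data.frame(lapply(df, ff))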
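Because data.frame(lapply(df, ff)) rebuilds the frame from a plain list, the original row names (a1, a2, ...) are not carried over. If they matter, one possible base R variant is to assign back into the numeric columns and let ave() do the per-group work. This is a sketch on a small stand-in data.frame, assuming integer value columns without NA; num_cols is a name introduced here for illustration.

```r
# Small stand-in for the sample data, with row names as in the question
df <- data.frame(
  r1    = c(0L, 0L, 5099L, 8443L, 8980L, 7828L),
  r3    = c(0L, 258L, 505L, 280L, 0L, 0L),
  Group = c("b", "b", "b", "c", "c", "c"),
  row.names = c("b1", "b2", "b3", "c1", "c2", "c3")
)

# Only the numeric columns are candidates for replacement
num_cols <- names(df)[sapply(df, is.numeric)]

# ave() applies FUN to each group's slice of x and puts the results back
# in the original row order; v * 0L yields zeros of the same length
df[num_cols] <- lapply(df[num_cols], function(x) {
  ave(x, df$Group, FUN = function(v) if (mean(v == 0) > 0.5) v * 0L else v)
})
```

Assigning into df[num_cols] leaves the Group column and the row names untouched.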