

R - if-else applied to a list

I am new to R, so some of my concepts may not be fully correct. I have a set of files that I read into a list (only the first 3 lines of each are shown here):

       V1  V2         V3
1   10001  33 -0.0499469
2   30001  65  0.0991478
3   50001  54  0.1564400

       V1  V2        V3
1   10001  62 0.0855260
2   30001  74 0.1536640
3   50001  71 0.1020960

       V1  V2          V3
1   10001  49 -0.04661360
2   30001  65  0.16961500
3   50001  61  0.07089600

I want to apply an ifelse condition in order to substitute values in columns and then return exactly the same list. However, when I do this:

myfiles<-lapply(myfiles,function(x) ifelse(x$V2>50, x$V3, NA))
 [1]         NA  0.0991478  0.1564400

 [1] 0.0855260 0.1536640 0.1020960

 [1]          NA  0.16961500  0.07089600

it does in fact do what I want, but it returns only the column the function was applied to. I want it to return the same list as before, with 3 columns (but with the substitutions).

I guess there should be an easy way to do this with some variant of "apply", but I was not able to find it or solve it.

Thanks

Perhaps this helps

 lapply(myfiles,within, V3 <- ifelse(V2 >50, V3, NA))

 #    V1 V2        V3
 #1 10001 33        NA
 #2 30001 65 0.0991478
 #3 50001 54 0.1564400

 #    V1 V2       V3
 #1 10001 62 0.085526
 #2 30001 74 0.153664
 #3 50001 71 0.102096

#     V1 V2       V3
#1 10001 49       NA
#2 30001 65 0.169615
#3 50001 61 0.070896
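For comparison, the attempt in the question returned only vectors because the anonymous function's return value was the ifelse() result itself, not the data frame. A minimal sketch, assuming `myfiles` is rebuilt in memory from the three-row excerpts shown in the question (the names `fixed` and `via_within` are only illustrative):

```r
# Rebuild the three-row excerpts from the question (assumed representative)
myfiles <- list(
  data.frame(V1 = c(10001, 30001, 50001), V2 = c(33, 65, 54),
             V3 = c(-0.0499469, 0.0991478, 0.1564400)),
  data.frame(V1 = c(10001, 30001, 50001), V2 = c(62, 74, 71),
             V3 = c(0.0855260, 0.1536640, 0.1020960)),
  data.frame(V1 = c(10001, 30001, 50001), V2 = c(49, 65, 61),
             V3 = c(-0.04661360, 0.16961500, 0.07089600))
)

# Assign the ifelse() result back into the column, then return the
# whole data frame so that lapply() keeps all three columns
fixed <- lapply(myfiles, function(x) {
  x$V3 <- ifelse(x$V2 > 50, x$V3, NA)
  x
})

# The within() form from the answer does the same assignment implicitly
via_within <- lapply(myfiles, within, V3 <- ifelse(V2 > 50, V3, NA))
```

Both forms keep V1 and V2 untouched; which one to prefer is mostly a matter of taste.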

Update

Another option would be to read the files using fread from data.table, which would be fast

library(data.table)
files <- list.files(pattern='tab')
lapply(files, function(x) fread(x)[V2<=50,V3:=NA] )
#     V1 V2        V3
#1: 10001 33        NA
#2: 30001 65 0.0991478
#3: 50001 54 0.1564400

#     V1 V2       V3
#1: 10001 62 0.085526
#2: 30001 74 0.153664
#3: 50001 71 0.102096

#     V1 V2       V3
#1: 10001 49       NA
#2: 30001 65 0.169615
#3: 50001 61 0.070896

Or as @Richie Cotton mentioned, you could also bind the datasets together using rbindlist and then do the operation in one step.

 # file_path_sans_ext() comes from the tools package
 dt1 <- rbindlist(lapply(files, function(x) 
       fread(x)[,id:= basename(tools::file_path_sans_ext(x))] ))[V2<=50, V3:=NA]

 #     V1 V2        V3   id
 #1: 10001 33        NA tab1
 #2: 30001 65 0.0991478 tab1
 #3: 50001 54 0.1564400 tab1
 #4: 10001 62 0.0855260 tab2
 #5: 30001 74 0.1536640 tab2
 #6: 50001 71 0.1020960 tab2
 #7: 10001 49        NA tab3
 #8: 30001 65 0.1696150 tab3
 #9: 50001 61 0.0708960 tab3

You can use lapply and transform / within. There are three possibilities:

  • a) ifelse

     lapply(myfiles, transform, V3 = ifelse(V2 > 50, V3, NA))
  • b) mathematical operators (potentially more efficient)

     lapply(myfiles, transform, V3 = NA ^ (V2 <= 50) * V3)
  • c) is.na<-

     lapply(myfiles, within, is.na(V3) <- V2 <= 50)

The result

     V1 V2        V3
1 10001 33        NA
2 30001 65 0.0991478
3 50001 54 0.1564400

     V1 V2       V3
1 10001 62 0.085526
2 30001 74 0.153664
3 50001 71 0.102096

     V1 V2       V3
1 10001 49       NA
2 30001 65 0.169615
3 50001 61 0.070896
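The three variants can be checked against each other on the data from the question. This is only a sketch: `myfiles` is rebuilt in memory from the three-row excerpts, the names `res_a`, `res_b` and `res_c` are illustrative, and variant c) is written with `<=` so that it is the exact complement of `V2 > 50`.

```r
# Rebuild the three-row excerpts from the question (assumed representative)
myfiles <- list(
  data.frame(V1 = c(10001, 30001, 50001), V2 = c(33, 65, 54),
             V3 = c(-0.0499469, 0.0991478, 0.1564400)),
  data.frame(V1 = c(10001, 30001, 50001), V2 = c(62, 74, 71),
             V3 = c(0.0855260, 0.1536640, 0.1020960)),
  data.frame(V1 = c(10001, 30001, 50001), V2 = c(49, 65, 61),
             V3 = c(-0.04661360, 0.16961500, 0.07089600))
)

# a) ifelse
res_a <- lapply(myfiles, transform, V3 = ifelse(V2 > 50, V3, NA))

# b) arithmetic: NA^TRUE is NA and NA^FALSE is 1, so rows with
#    V2 <= 50 are multiplied by NA and the others by 1
res_b <- lapply(myfiles, transform, V3 = NA ^ (V2 <= 50) * V3)

# c) is.na<- sets the flagged positions of V3 to NA
res_c <- lapply(myfiles, within, is.na(V3) <- V2 <= 50)

# Compare the V3 columns of the three results pairwise
same_ab <- all.equal(lapply(res_a, `[[`, "V3"), lapply(res_b, `[[`, "V3"))
same_ac <- all.equal(lapply(res_a, `[[`, "V3"), lapply(res_c, `[[`, "V3"))
```

Variant b) is terse; a) or c) may be easier to read when the code is shared with others.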

This seems harder than it should be because you are working with a list of data frames rather than a single data frame. You can combine all the data frames into a single one using rbind_all in dplyr (bind_rows in later dplyr releases, where rbind_all is deprecated).

# Some variable renaming for clarity:
# myfiles now refers to the file names; mydata now contains the data
library(dplyr)
myfiles <- list.files(pattern="tab", full.names=TRUE) 
# comment.char takes the comment marker; skip expects a number of lines
mydata <- lapply(myfiles, read.table, comment.char="#")

# Get the number of rows in each data frame
n_rows <- vapply(mydata, nrow, integer(1))
# Combine the list of data frames into a single data frame
# (bind_rows supersedes rbind_all in later dplyr releases)
all_mydata <- bind_rows(mydata)
# Add an identifier to see which data frame the row came from.
# times = n_rows repeats each file name once per row of its data frame
all_mydata$file <- rep(myfiles, times = n_rows)

# Now update column 3 (NA wherever V2 is not greater than 50)
is.na(all_mydata$V3) <- all_mydata$V2 <= 50
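The labelling step depends on how rep() is called when the data frames have different row counts. A small base-R sketch, assuming two toy data frames with hypothetical names `tab1` and `tab2` and made-up values, and using do.call(rbind, ...) as a stand-in for the dplyr call:

```r
# Two toy data frames with different row counts (hypothetical data)
mydata <- list(
  tab1 = data.frame(V2 = c(33, 65), V3 = c(-0.05, 0.10)),
  tab2 = data.frame(V2 = c(62, 49, 71), V3 = c(0.09, 0.15, 0.10))
)

# Number of rows in each data frame
n_rows <- vapply(mydata, nrow, integer(1))

# Stack the data frames (base-R stand-in for the dplyr binding function)
all_mydata <- do.call(rbind, mydata)

# times = n_rows repeats each name once per row of its own data frame,
# so the labels line up even when the row counts differ
all_mydata$file <- rep(names(mydata), times = n_rows)

# Blank out V3 wherever V2 is not greater than 50
is.na(all_mydata$V3) <- all_mydata$V2 <= 50
```

With `times`, rep() accepts one count per element, which is what a per-file row count needs.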

Try adding an id column for each df and binding them together:

for(i in 1:3) myfiles[[i]]$id = i
ddf = myfiles[[1]]
for(i in 2:3) ddf = rbind(ddf, myfiles[[i]])

Then apply changes on the composite df and split it back again:

ddf$V3 = ifelse(ddf$V2>50, ddf$V3, NA)
myfiles = lapply(split(ddf, ddf$id), function(x) x[1:3])

     V1 V2        V3
1 10001 33        NA
2 30001 65 0.0991478
3 50001 54 0.1564400

      V1 V2       V3
11 10001 62 0.085526
21 30001 74 0.153664
31 50001 71 0.102096

      V1 V2       V3
12 10001 49       NA
22 30001 65 0.169615
32 50001 61 0.070896
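The same round trip can be sketched without hard-coding the number of data frames. This assumes `myfiles` is rebuilt in memory from the three-row excerpts in the question; `with_id` and `result` are illustrative names.

```r
# Rebuild the three-row excerpts from the question (assumed representative)
myfiles <- list(
  data.frame(V1 = c(10001, 30001, 50001), V2 = c(33, 65, 54),
             V3 = c(-0.0499469, 0.0991478, 0.1564400)),
  data.frame(V1 = c(10001, 30001, 50001), V2 = c(62, 74, 71),
             V3 = c(0.0855260, 0.1536640, 0.1020960)),
  data.frame(V1 = c(10001, 30001, 50001), V2 = c(49, 65, 61),
             V3 = c(-0.04661360, 0.16961500, 0.07089600))
)

# Tag each data frame with its position in the list
with_id <- Map(function(df, i) { df$id <- i; df }, myfiles, seq_along(myfiles))

# Stack everything into one composite data frame
ddf <- do.call(rbind, with_id)

# Apply the substitution once on the composite
ddf$V3 <- ifelse(ddf$V2 > 50, ddf$V3, NA)

# Split back by id and drop the helper column
result <- lapply(split(ddf, ddf$id), function(x) x[c("V1", "V2", "V3")])
```

split() names the pieces after the id values, so `result` is a named list ("1", "2", "3") rather than an unnamed one.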

