
breaking lapply for a blank file R

I am trying to check the percentage completion of common elements (BGC**) in different sample files. My input files are formatted as follows:

file1.txt
-----------

contig SRR5947942_idxstats.txt
BGC0000972 0
BGC0000972 0
BGC0000972 0
BGC0000972 1
BGC0000972 0
BGC0000972 0

file2.txt
----------
contig SRR5947963_idxstats.txt
BGC0000581 0
BGC0000581 22
BGC0000581 60
BGC0000581 0
BGC0000972 14
BGC0000972 24

I save them in a directory and run my script as:

filenames <- list.files(full.names=F, pattern=".txt")
output <-lapply(filenames,function(i){
  t<-read.csv(i, header=T, check.names = F, sep = " ")
  t$gene_count<-1
  t[,2][t[,2]>0]<-1
  presence_absence_df<-aggregate(. ~ contig, t, sum)
  presence_absence_df$sample_name<-names(t[2])
  colnames(presence_absence_df)<-c("BGC_Accession","Gene_presence", "Gene_count", "Sample_name")
  presence_absence_df$Percentage<-(presence_absence_df$Gene_presence/presence_absence_df$Gene_count)*100
  presence_absence_df<-presence_absence_df[presence_absence_df$Percentage != 0, ]
  presence_absence_df$tp_step2_100_percent<-length(presence_absence_df$Percentage[presence_absence_df$Percentage>=100])
  presence_absence_df<-presence_absence_df[presence_absence_df$Percentage >= 100, ]
  presence_absence_df<-data.frame(presence_absence_df)
  presence_absence_df <- subset(presence_absence_df, select = -c(Gene_presence, Gene_count, Percentage) )
  colnames(presence_absence_df)<-c("BGC_name", "Sample", "BGCs_step2_100_percent")
  presence_absence_df <- presence_absence_df [c("Sample", "BGCs_step2_100_percent", "BGC_name")]
})
Step2_results2_100<-do.call(rbind,output)

This gives me the result.

The problem is that if any input file contains only zeros, the code throws an error. For example, if I change file1.txt as follows:

file1.txt
-----------

contig SRR5947942_idxstats.txt
BGC0000972 0
BGC0000972 0
BGC0000972 0
BGC0000972 0
BGC0000972 0
BGC0000972 0

Then I get:

Error in `$<-.data.frame`(`*tmp*`, "tp_step2_100_percent", value = 0L) : 
  replacement has 1 row, data has 0

I want to skip the all-zero files without raising an error. Thank you for your help!

Return NULL if all the values in the second column are 0.

output <- lapply(filenames, function(i) {
  t <- read.csv(i, header = TRUE, check.names = FALSE, sep = " ")
  # Skip this file when every value in the second column is 0
  if (all(t[[2]] == 0)) return(NULL)
  t$gene_count <- 1
  t[, 2][t[, 2] > 0] <- 1
  # ... rest of the original function body, unchanged ...
})

Step2_results2_100 <- do.call(rbind,output)

When you call do.call(rbind, output), those NULL values are ignored, so a file that contains only zeros simply contributes no rows to the result.
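
To see the NULL-skipping behaviour in isolation, here is a minimal sketch that uses in-memory data frames instead of files. The helper `complete_bgcs`, the column name `count`, and the sample labels are hypothetical names chosen for illustration, and the helper is a simplified stand-in for the original function body rather than a copy of it. The second guard (no BGC is 100% complete) is an addition that this simplified helper needs; it is not part of the answer above.

```r
# Two small tables standing in for the parsed input files
# (hypothetical data modelled on file1.txt and file2.txt from the question).
all_zero <- data.frame(contig = rep("BGC0000972", 6), count = 0)
mixed <- data.frame(
  contig = c(rep("BGC0000581", 4), rep("BGC0000972", 2)),
  count  = c(0, 22, 60, 0, 14, 24)
)

# Hypothetical helper: returns the contigs whose genes all have reads,
# or NULL when there is nothing to report for this sample.
complete_bgcs <- function(t, sample_name) {
  if (all(t[[2]] == 0)) return(NULL)            # guard from the answer
  present <- tapply(t[[2]] > 0, t[[1]], mean)   # fraction of genes with reads
  full <- names(present)[present == 1]
  if (length(full) == 0) return(NULL)           # extra guard: nothing is complete
  data.frame(Sample = sample_name, BGC_name = full, stringsAsFactors = FALSE)
}

output <- list(
  complete_bgcs(all_zero, "sample_a"),   # NULL
  complete_bgcs(mixed, "sample_b")       # one-row data frame
)

# rbind drops the NULL entry, so only sample_b contributes rows
result <- do.call(rbind, output)
print(result)
```

If you would rather drop the NULL entries explicitly before combining, `Filter(Negate(is.null), output)` does that.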


 