
Unable to get the file size of PNG image in R

I am trying to build a CNN model for image classification in R. Because my training data is large (1.7 GB, https://www.kaggle.com/c/plant-seedlings-classification/data), I want to read through all the files and record their sizes in a data frame, so that I can remove the heavy images from the training set in code. Below is a snippet of the code:

      #Block 1 : creating a data frame of all the subfolder and image file in them 
      df_trainfiles <- data.frame(ID=numeric(),foldername=character(),filename=character(),filesize=numeric(),stringsAsFactors = F)
      df_testfiles<-data.frame(ID=numeric(),foldername=character(),filename=character(),filesize=numeric(),stringsAsFactors = F)

      df_train<-data.frame(info=character(),stringsAsFactors = F)
      df_test<-data.frame(info=character(),stringsAsFactors = F)

      trainDataPath<-"C:/Users/chiragrawal/Desktop/Learning/1. Kaggle/0.2 Plant Seedlings Classification/train/train"
      lsSubfolder<-list.files(path = trainDataPath,pattern = )

      for (intX in 1:length(lsSubfolder)){
        lsfiles<-list.files(path = paste0(trainDataPath,"/",lsSubfolder[intX]))  
          for(intY in 1:length(lsfiles)){
          df_trainfiles[nrow(df_trainfiles)+1,]<-list(nrow(df_trainfiles)+1, lsSubfolder[intX],lsfiles[intY],file.size(paste0(trainDataPath,"/", df_trainfiles[i,2],"/", df_trainfiles[i,3],sep="")))
        }
      }

When I look at df_trainfiles after running the code, the filesize column shows NA. I have tried a few other methods that I found on other forums, but none of them worked.

Your help is highly appreciated! Thank you :)

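The NA values most likely come from the path passed to file.size(): it is built from df_trainfiles[i,2] and df_trainfiles[i,3], but the loop variables are intX and intY, and the row being filled does not exist yet, so the path does not point to a real file. My advice would be to drop the for loop altogether, because list.files() and file.size() are vectorised and can list the files and read their sizes in one pass.

Here is a suggestion: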

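      trainDataPath <- "C:/Users/chiragrawal/Desktop/Learning/1. Kaggle/0.2 Plant Seedlings Classification/train/train"
      # Full paths of every PNG below trainDataPath, including subfolders
      f <- list.files(path = trainDataPath, pattern = "\\.png$", recursive = TRUE, full.names = TRUE)
      # The same files as paths relative to trainDataPath ("subfolder/file.png")
      filename <- list.files(path = trainDataPath, pattern = "\\.png$", recursive = TRUE)
      # The first path component is the class subfolder
      foldername <- sapply(strsplit(filename, "/"), "[", 1)
      # file.size() accepts a vector of paths and returns sizes in bytes
      filesize <- file.size(f)

      df_trainfiles <- data.frame(foldername, filename, filesize, stringsAsFactors = FALSE)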
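
Once the sizes are in a data frame, removing the heavy images is a matter of subsetting on the filesize column. The sketch below is only an illustration: drop_heavy_files() and the 1000-byte limit are hypothetical names and values, not part of the original question, and the demo uses small temporary files instead of the Kaggle data so that it runs anywhere.

```r
# Hypothetical helper: keep rows whose file size is known and at most max_bytes.
drop_heavy_files <- function(df, max_bytes) {
  keep <- !is.na(df$filesize) & df$filesize <= max_bytes
  df[keep, , drop = FALSE]
}

# Build a tiny stand-in for the train folder: one class subfolder, two files.
demo_dir <- file.path(tempdir(), "size_demo", "classA")
dir.create(demo_dir, recursive = TRUE, showWarnings = FALSE)
writeBin(raw(100),  file.path(demo_dir, "small.png"))  # 100 bytes
writeBin(raw(5000), file.path(demo_dir, "large.png"))  # 5000 bytes

# Same vectorised listing as above, applied to the demo folder.
f <- list.files(dirname(demo_dir), pattern = "\\.png$",
                recursive = TRUE, full.names = TRUE)
demo <- data.frame(foldername = basename(dirname(f)),
                   filename   = basename(f),
                   filesize   = file.size(f),
                   stringsAsFactors = FALSE)

# Keep only the files at or below the (assumed) size limit.
light <- drop_heavy_files(demo, max_bytes = 1000)
print(light)
```

For the real data set, a limit could be chosen by looking at summary(df_trainfiles$filesize) or a quantile of the sizes first; which cutoff is appropriate depends on the memory available.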
