I am trying to build a CNN model for image classification in R. Because my training data is large (1.7 GB, https://www.kaggle.com/c/plant-seedlings-classification/data), I am trying to read through all the files and collect their file sizes in a data frame, so that I can remove the heavy images from the training set within the code. Below is a snippet of the sample code:
#Block 1 : creating a data frame of all the subfolder and image file in them
df_trainfiles <- data.frame(ID=numeric(),foldername=character(),filename=character(),filesize=numeric(),stringsAsFactors = F)
df_testfiles<-data.frame(ID=numeric(),foldername=character(),filename=character(),filesize=numeric(),stringsAsFactors = F)
df_train<-data.frame(info=character(),stringsAsFactors = F)
df_test<-data.frame(info=character(),stringsAsFactors = F)
trainDataPath<-"C:/Users/chiragrawal/Desktop/Learning/1. Kaggle/0.2 Plant Seedlings Classification/train/train"
lsSubfolder<-list.files(path = trainDataPath,pattern = )
for (intX in 1:length(lsSubfolder)){
  lsfiles<-list.files(path = paste0(trainDataPath,"/",lsSubfolder[intX]))
  for(intY in 1:length(lsfiles)){
    df_trainfiles[nrow(df_trainfiles)+1,]<-list(nrow(df_trainfiles)+1, lsSubfolder[intX],lsfiles[intY],file.size(paste0(trainDataPath,"/", df_trainfiles[i,2],"/", df_trainfiles[i,3],sep="")))
  }
}
When I look at df_trainfiles after running the code, the file size field shows NA. I have tried a few other methods that I found in other forums, but none of the solutions worked.
Your help is highly appreciated! Thank you :)
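The NA most likely comes from how the path is built: file.size() is given a path assembled from df_trainfiles[i,2] and df_trainfiles[i,3], but the loops iterate over intX and intY, not i, and the new row has not been written yet when that path is evaluated. file.size() returns NA for a path that does not exist.
My advice would be to not use a for loop at all, because list.files() and file.size() are vectorised and give a more robust way to list files and read their sizes.
Here is a proposal: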
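trainDataPath <- "C:/Users/chiragrawal/Desktop/Learning/1. Kaggle/0.2 Plant Seedlings Classification/train/train"
# Full paths of every PNG under trainDataPath, including subfolders
f <- list.files(path = trainDataPath, pattern = "\\.png$", recursive = TRUE, full.names = TRUE, ignore.case = TRUE)
# The same files as paths relative to trainDataPath (subfolder/file)
filename <- list.files(path = trainDataPath, pattern = "\\.png$", recursive = TRUE, ignore.case = TRUE)
# The first path component is the subfolder name
foldername <- sapply(strsplit(filename, "/"), "[", 1)
# file.size() is vectorised and returns sizes in bytes
filesize <- file.size(f)
df_trainfiles <- data.frame(foldername, filename, filesize, stringsAsFactors = FALSE)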
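Once the sizes are in a data frame, dropping the heavy images is a plain row filter. Below is a minimal, self-contained sketch of that step. It runs against a throwaway folder in tempdir() rather than the real train directory, and the names demo_root, df_demo, max_bytes, df_keep and df_drop, as well as the 1000-byte cutoff, are assumptions made for illustration only.

```r
# Build a small stand-in for the train folder (hypothetical location).
demo_root <- file.path(tempdir(), "train_demo")
unlink(demo_root, recursive = TRUE)
for (cls in c("classA", "classB")) {
  dir.create(file.path(demo_root, cls), recursive = TRUE)
}

# Fake "images": zero-filled files of known length, named *.png for the demo.
writeBin(raw(100),  file.path(demo_root, "classA", "small.png"))
writeBin(raw(5000), file.path(demo_root, "classA", "large.png"))
writeBin(raw(200),  file.path(demo_root, "classB", "small.png"))

# List the files recursively and collect their sizes in one vectorised call.
f <- list.files(demo_root, pattern = "\\.png$", recursive = TRUE, full.names = TRUE)
df_demo <- data.frame(
  foldername = basename(dirname(f)),  # class subfolder
  filename   = basename(f),           # file name without the folder
  filepath   = f,                     # full path, handy for loading later
  filesize   = file.size(f),          # size in bytes
  stringsAsFactors = FALSE
)

# Hypothetical cutoff in bytes; pick a value that suits the real data.
max_bytes <- 1000
too_big <- is.na(df_demo$filesize) | df_demo$filesize > max_bytes
df_keep <- df_demo[!too_big, ]  # images to train on
df_drop <- df_demo[too_big, ]   # images to leave out
```

For the real data, the same filter can be applied to df_trainfiles$filesize. Looking at summary(df_trainfiles$filesize) first is one way to choose a sensible cutoff.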