I'm trying to import and read multiple CSV files, group each of them, and then combine them all into a single file. I have managed to do it one file at a time; the code is below. First I imported each CSV file separately:
```r
library(dplyr)

Rhodovulum_adriaticum_NCBI <- read.csv("Rhodovulum_adriaticum.csv")
```

Then I grouped it:

```r
Rhodovulum_adriaticum <- Rhodovulum_adriaticum_NCBI %>%
  group_by(Protein.Name) %>%
  summarize(Rhodovulum_adriaticum = n()) %>%
  arrange(desc(Rhodovulum_adriaticum))
View(Rhodovulum_adriaticum)
```
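As a side note, `dplyr::count()` can do the `group_by()` + `summarize(n())` + `arrange(desc())` steps in a single call through its `name` and `sort` arguments. A minimal sketch on a small made-up data frame (`toy` is hypothetical and only stands in for `Rhodovulum_adriaticum_NCBI`):

```r
library(dplyr)

# Hypothetical stand-in for the data read from Rhodovulum_adriaticum.csv
toy <- data.frame(Protein.Name = c("a", "a", "b", "b", "b", "c"),
                  stringsAsFactors = FALSE)

# Three-step version, as in the question
long_way <- toy %>%
  group_by(Protein.Name) %>%
  summarize(Rhodovulum_adriaticum = n()) %>%
  arrange(desc(Rhodovulum_adriaticum))

# One-step version: count rows per protein, name the count column,
# and sort by that count in descending order
short_way <- toy %>%
  count(Protein.Name, name = "Rhodovulum_adriaticum", sort = TRUE)

print(short_way)
```

Both give the same proteins and counts in the same order; `count()` simply returns the same class as its input (a plain data frame here) rather than a tibble.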
Then I put all the processed data frames in a list:

```r
Rhodo_data_list_NCBI <- list(Rhodovulum_adriaticum, ......)
```

Then I merged them into a single data frame:

```r
library(purrr)

Rhodo_merge_NCBI <- Rhodo_data_list_NCBI %>% reduce(left_join, by = "Protein.Name")
```
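For reference, `purrr::reduce()` with `left_join()` joins the first table to the second, that result to the third, and so on. One consequence worth knowing: `left_join()` keeps only the proteins that appear in the *first* data frame, so a protein found only in a later file is silently dropped. A small sketch comparing it with `full_join()`, using two hypothetical count tables (`sp1`, `sp2`, `Species1` and `Species2` are made up for illustration):

```r
library(dplyr)
library(purrr)

# Hypothetical per-species count tables, same shape as the grouped output
sp1 <- data.frame(Protein.Name = c("a", "b"), Species1 = c(2L, 2L),
                  stringsAsFactors = FALSE)
sp2 <- data.frame(Protein.Name = c("a", "c"), Species2 = c(4L, 1L),
                  stringsAsFactors = FALSE)
count_list <- list(sp1, sp2)

# left_join: only proteins present in the first table survive ("c" is lost)
merged_left <- reduce(count_list, left_join, by = "Protein.Name")

# full_join: keeps every protein from every table, filling gaps with NA
merged_full <- reduce(count_list, full_join, by = "Protein.Name")

print(merged_left)
print(merged_full)
```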
But I want to do this with a loop. Kindly help. This is what I have tried so far:
```r
setwd("/Users/mdumar/Desktop/NCBI/NCBI/")
data_names<-list.files()
for(i in 1:length(data_names)) {
  read.csv(data_names[i]),
  data_names[i]<-data_names[i]%>%group_by(Protein.Name)%>% summarize(data_names[i]=n()) %>% arrange(desc(data_names[i]))
  Rhodo_data_list_NCBI<-list(data_names[i]),
  Rhodo_merge_NCBI<-Rhodo_data_list_NCBI%>%reduce(left_join, by ="Protein.Name")
}
```
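The attempt above fails for a few reasons: the result of `read.csv()` is never stored, `data_names[i]` is a file name (a string) rather than a data frame, a column cannot be named with `data_names[i] = n()` inside `summarize()`, the trailing commas are syntax errors, and the list is overwritten on every iteration. A loop-based sketch that keeps the question's structure might look like the following. It assumes every `.csv` in the folder has a `Protein.Name` column; the function name `merge_protein_counts_loop` is hypothetical, and it uses `full_join()` instead of `left_join()` so proteins missing from the first file are kept (switch back if you want the original behaviour).

```r
library(dplyr)
library(purrr)

# Hypothetical helper: loop over every .csv in `dir`, count proteins per file,
# then merge all count tables on Protein.Name.
merge_protein_counts_loop <- function(dir) {
  data_names <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  Rhodo_data_list_NCBI <- vector("list", length(data_names))

  for (i in seq_along(data_names)) {
    # Column name taken from the file name, e.g. "Rhodovulum_adriaticum"
    species <- sub("\\.csv$", "", basename(data_names[i]))

    # Store each grouped data frame in the list instead of overwriting data_names[i]
    Rhodo_data_list_NCBI[[i]] <- read.csv(data_names[i]) %>%
      group_by(Protein.Name) %>%
      summarize(!!species := n()) %>%
      arrange(desc(.data[[species]]))
  }

  # Merge once, after the loop has filled the list
  reduce(Rhodo_data_list_NCBI, full_join, by = "Protein.Name")
}
```

Usage: `Rhodo_merge_NCBI <- merge_protein_counts_loop("/Users/mdumar/Desktop/NCBI/NCBI/")`. Passing the folder as an argument avoids having to call `setwd()`.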
You can probably do something like this, using the tidyverse.

Set the working directory and get the names of the .csv files, without the extension:

```r
library(tidyverse)

setwd("/Users/mdumar/Desktop/NCBI/NCBI/")
fnames = str_remove(list.files(pattern = "\\.csv$"), "\\.csv$")
```

Then use `lapply()` over these names, each time reading the csv file and counting each `Protein.Name` with `count()`; add a `grp` variable that holds the source filename using `mutate()`. Pass the resulting list to `bind_rows()`, and then reshape with `pivot_wider()`:

```r
bind_rows(lapply(fnames, function(fname) {
  read.csv(paste0(fname, ".csv")) %>%
    count(Protein.Name) %>%
    mutate(grp = fname)
})) %>%
  pivot_wider(id_cols = Protein.Name, names_from = grp, values_from = n)
```
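If you would rather see 0 than `NA` for proteins that do not occur in a given species, `pivot_wider()` has a `values_fill` argument. A sketch that wraps the same pipeline in a function taking the folder as an argument instead of calling `setwd()` (the name `count_proteins_wide` is just illustrative):

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical wrapper around the lapply/bind_rows/pivot_wider pipeline
count_proteins_wide <- function(dir) {
  paths <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  # Species name = file name without the folder and the ".csv" extension
  species <- str_remove(basename(paths), "\\.csv$")

  bind_rows(lapply(seq_along(paths), function(i) {
    read.csv(paths[i]) %>%
      count(Protein.Name) %>%
      mutate(grp = species[i])
  })) %>%
    # values_fill = 0L puts 0 (not NA) where a protein is absent from a file
    pivot_wider(id_cols = Protein.Name, names_from = grp,
                values_from = n, values_fill = 0L)
}
```

Usage: `count_proteins_wide("/Users/mdumar/Desktop/NCBI/NCBI/")`.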
Let's say you had two files in your working directory, called "Rhodovulum_adriaticum.csv" and "Rhodovulum_sulfidophilum.csv", and that these looked like this:
```r
structure(list(Protein.Name = c("a", "a", "b", "b", "c", "c")), class = "data.frame", row.names = c(NA, -6L))
```

and

```r
structure(list(Protein.Name = c("a", "a", "a", "a", "b", "b")), class = "data.frame", row.names = c(NA, -6L))
```
Then, the above code would return:
```
  Protein.Name Rhodovulum_adriaticum Rhodovulum_sulfidophilum
  <chr>                        <int>                    <int>
1 a                                2                        4
2 b                                2                        2
3 c                                2                       NA
```
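To try the answer without the real data, the two example files can be recreated in a throwaway folder and the same pipeline run there. This is only a sketch for illustration; `tempfile()` supplies a temporary folder name, and the original working directory is restored afterwards.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Recreate the two example files in a temporary directory
tmp <- tempfile("ncbi_example_")
dir.create(tmp)
write.csv(data.frame(Protein.Name = c("a", "a", "b", "b", "c", "c")),
          file.path(tmp, "Rhodovulum_adriaticum.csv"), row.names = FALSE)
write.csv(data.frame(Protein.Name = c("a", "a", "a", "a", "b", "b")),
          file.path(tmp, "Rhodovulum_sulfidophilum.csv"), row.names = FALSE)

# Run the answer's pipeline inside that directory, then switch back
old_wd <- setwd(tmp)
fnames <- str_remove(list.files(pattern = "\\.csv$"), "\\.csv$")
result <- bind_rows(lapply(fnames, function(fname) {
  read.csv(paste0(fname, ".csv")) %>%
    count(Protein.Name) %>%
    mutate(grp = fname)
})) %>%
  pivot_wider(id_cols = Protein.Name, names_from = grp, values_from = n)
setwd(old_wd)

print(result)
```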