
R loop with multiple CSV files and grouping and joining in single file

I'm trying to import multiple CSV files, group each one, and then combine them all into a single file. I have done this for a single file; the code is given here. First I imported the CSV file separately:

`Rhodovulum_adriaticum_NCBI <- read.csv("Rhodovulum_adriaticum.csv")`

Then I grouped it by protein name and counted the rows:

    Rhodovulum_adriaticum<-Rhodovulum_adriaticum_NCBI%>%group_by(Protein.Name)%>% summarize(Rhodovulum_adriaticum=n()) %>% arrange(desc(Rhodovulum_adriaticum))
    View(Rhodovulum_adriaticum)

Then I put all the processed data frames in a list:

Rhodo_data_list_NCBI<-list(Rhodovulum_adriaticum, ......,)

Then I merged them into a single table:

Rhodo_merge_NCBI<-Rhodo_data_list_NCBI%>%reduce(left_join, by ="Protein.Name")

But I want to do this with a loop. Kindly help.

I tried this so far:

setwd("/Users/mdumar/Desktop/NCBI/NCBI/")


data_names<-list.files()

for(i in 1:length(data_names)) {           
  read.csv(data_names[i]), 
  data_names[i]<-data_names[i]%>%group_by(Protein.Name)%>% summarize(data_names[i]=n()) %>% arrange(desc(data_names[i]))
  Rhodo_data_list_NCBI<-list(data_names[i]),
  Rhodo_merge_NCBI<-Rhodo_data_list_NCBI%>%reduce(left_join, by ="Protein.Name")
}

You can probably do something like this, using the tidyverse:

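  1. Get a vector of the file names, without the .csv extension. The pattern is a regular expression, so the dot is escaped and anchored to the end of the name:
library(tidyverse)
setwd("/Users/mdumar/Desktop/NCBI/NCBI/")
fnames = str_remove(list.files(pattern = "\\.csv$"), "\\.csv$")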
  2. Loop over these file names using lapply(), each time reading the CSV file and counting each Protein.Name with count(); add a grp variable that holds the source file name using mutate(). Pass the resulting list to bind_rows(), and then reshape to one column per file with pivot_wider():
bind_rows(lapply(fnames, function(fname) {
  read.csv(paste0(fname,".csv")) %>% 
    count(Protein.Name) %>% 
    mutate(grp = fname)
})) %>% 
  pivot_wider(id_cols = Protein.Name,names_from = grp, values_from = n)
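
If you would rather stay close to your original approach (one summarized table per file, then a join), a sketch along the following lines might work. It assumes every CSV has a Protein.Name column. The helper names count_proteins and merge_protein_counts and the dir argument are made up for illustration. It uses full_join rather than left_join, because left_join would drop any protein that is missing from the first file.

```r
library(dplyr)
library(purrr)

# Hypothetical helper: count rows per Protein.Name in one file and
# name the count column after the file (without its .csv extension).
count_proteins <- function(path) {
  species <- tools::file_path_sans_ext(basename(path))
  read.csv(path) %>%
    count(Protein.Name, name = species)
}

# Hypothetical wrapper: process every .csv file in `dir` and join the
# per-file counts on Protein.Name. Assumes `dir` holds at least one CSV.
merge_protein_counts <- function(dir = ".") {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  files %>%
    map(count_proteins) %>%
    reduce(full_join, by = "Protein.Name")
}
```

Calling merge_protein_counts("/Users/mdumar/Desktop/NCBI/NCBI/") should then give one row per protein and one count column per file, with NA where a protein does not occur in a file.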

Let's say you had two files in your working directory, called "Rhodovulum_adriaticum.csv" and "Rhodovulum_sulfidophilum.csv", and that these looked like this:

structure(list(Protein.Name = c("a", "a", "b", "b", "c", "c")), class = "data.frame", row.names = c(NA, 
-6L))

and

structure(list(Protein.Name = c("a", "a", "a", "a", "b", "b")), class = "data.frame", row.names = c(NA, 
-6L))

Then the bind_rows() / pivot_wider() pipeline above would return:

  Protein.Name Rhodovulum_adriaticum Rhodovulum_sulfidophilum
  <chr>                        <int>                    <int>
1 a                                2                        4
2 b                                2                        2
3 c                                2                       NA
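
To check this without touching your real data, the example can be reproduced in a temporary directory. The sketch below is only an illustration: demo_dir and counts_wide are made-up names, and the two files are written from the example tables shown earlier. It also passes values_fill to pivot_wider(), so that a protein absent from a file gets a count of 0 instead of NA, which may or may not be what you want.

```r
library(dplyr)
library(tidyr)

# Write the two example tables to a temporary directory (illustration only).
demo_dir <- file.path(tempdir(), "ncbi_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Protein.Name = c("a", "a", "b", "b", "c", "c")),
          file.path(demo_dir, "Rhodovulum_adriaticum.csv"), row.names = FALSE)
write.csv(data.frame(Protein.Name = c("a", "a", "a", "a", "b", "b")),
          file.path(demo_dir, "Rhodovulum_sulfidophilum.csv"), row.names = FALSE)

# Full paths are used here, so setwd() is not needed.
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

counts_wide <- bind_rows(lapply(files, function(f) {
  read.csv(f) %>%
    count(Protein.Name) %>%
    mutate(grp = tools::file_path_sans_ext(basename(f)))
})) %>%
  pivot_wider(id_cols = Protein.Name, names_from = grp,
              values_from = n, values_fill = list(n = 0L))

print(counts_wide)
```

With values_fill set this way, protein c should show 0 rather than NA in the Rhodovulum_sulfidophilum column.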
