R loop with multiple CSV files and grouping and joining in single file

I'm trying to import and read multiple CSV files, group each one, and then combine them all into a single file. I have done this for a single file; the code is given here. First I imported the CSV file separately:

`Rhodovulum_adriaticum_NCBI <- read.csv("Rhodovulum_adriaticum.csv")`

Then I grouped it:

    Rhodovulum_adriaticum <- Rhodovulum_adriaticum_NCBI %>%
      group_by(Protein.Name) %>%
      summarize(Rhodovulum_adriaticum = n()) %>%
      arrange(desc(Rhodovulum_adriaticum))
    View(Rhodovulum_adriaticum)

Then I listed all the processed CSV data:

Rhodo_data_list_NCBI<-list(Rhodovulum_adriaticum, ......,)

Then I merged it into a single file:

Rhodo_merge_NCBI<-Rhodo_data_list_NCBI%>%reduce(left_join, by ="Protein.Name")

But I want to do this with a loop. Kindly help.

I tried this so far:

setwd("/Users/mdumar/Desktop/NCBI/NCBI/")


data_names<-list.files()

for(i in 1:length(data_names)) {           
  read.csv(data_names[i]), 
  data_names[i]<-data_names[i]%>%group_by(Protein.Name)%>% summarize(data_names[i]=n()) %>% arrange(desc(data_names[i]))
  Rhodo_data_list_NCBI<-list(data_names[i]),
  Rhodo_merge_NCBI<-Rhodo_data_list_NCBI%>%reduce(left_join, by ="Protein.Name")
}

You can probably do something like this, using tidyverse:

  1. Get a vector of file names, without the .csv extension
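library(tidyverse)
setwd("/Users/mdumar/Desktop/NCBI/NCBI/")
# escape the dot and anchor at the end so only real ".csv" extensions match
fnames = str_remove(list.files(pattern = "\\.csv$"), "\\.csv$")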
  2. Loop over these file names using lapply(), each time reading the CSV file and then counting each Protein.Name using count(); add a grp variable that holds the source file name using mutate(). Pass this list as the argument to bind_rows(), and then reshape with pivot_wider() so that each file becomes its own column
bind_rows(lapply(fnames, function(fname) {
  read.csv(paste0(fname,".csv")) %>% 
    count(Protein.Name) %>% 
    mutate(grp = fname)
})) %>% 
  pivot_wider(id_cols = Protein.Name,names_from = grp, values_from = n)

Let's say you had two files in your working directory, called "Rhodovulum_adriaticum.csv" and "Rhodovulum_sulfidophilum.csv", and that these looked like this:

structure(list(Protein.Name = c("a", "a", "b", "b", "c", "c")), class = "data.frame", row.names = c(NA, 
-6L))

and

structure(list(Protein.Name = c("a", "a", "a", "a", "b", "b")), class = "data.frame", row.names = c(NA, 
-6L))

Then, the above code would return:

  Protein.Name Rhodovulum_adriaticum Rhodovulum_sulfidophilum
  <chr>                        <int>                    <int>
1 a                                2                        4
2 b                                2                        2
3 c                                2                       NA
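
Since the question asked for an explicit loop, here is a minimal sketch of a for-loop variant that stays closer to the original reduce() approach. It is only an illustration under some assumptions: the temporary directory `ncbi_demo` and the names `csv_paths`, `count_list` and `merged` are hypothetical, and the two example files above are written to disk first so the snippet can run on its own. It uses full_join rather than left_join so that proteins missing from the first file are not dropped.

```r
library(dplyr)
library(purrr)

# Hypothetical sandbox: write the two example files to a temporary directory
tmp_dir <- file.path(tempdir(), "ncbi_demo")
dir.create(tmp_dir, showWarnings = FALSE)
write.csv(data.frame(Protein.Name = c("a", "a", "b", "b", "c", "c")),
          file.path(tmp_dir, "Rhodovulum_adriaticum.csv"), row.names = FALSE)
write.csv(data.frame(Protein.Name = c("a", "a", "a", "a", "b", "b")),
          file.path(tmp_dir, "Rhodovulum_sulfidophilum.csv"), row.names = FALSE)

# Full paths to every CSV file, so no setwd() is needed
csv_paths <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Build one count table per file and collect the tables in a list
count_list <- vector("list", length(csv_paths))
for (i in seq_along(csv_paths)) {
  # File name without directory or extension; used as the count column name
  species <- tools::file_path_sans_ext(basename(csv_paths[i]))
  count_list[[i]] <- read.csv(csv_paths[i]) %>%
    count(Protein.Name, name = species)
}

# Join all count tables on Protein.Name; full_join keeps every protein
merged <- reduce(count_list, full_join, by = "Protein.Name")
print(merged)
```

With the two example files, this should give the same counts as the table above, including the NA for protein "c" in the second column. The join happens once, after the loop, because joining inside the loop (as in the attempt in the question) would overwrite the result on every iteration.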
