R: loop over multiple CSV files, group each one, and join them into a single file

Question

I'm trying to import and read multiple CSV files, group each of them, and then combine all of them into a single file. I have done this for a single file; the code is given here. First I imported the CSV file separately:

`Rhodovulum_adriaticum_NCBI <- read.csv("Rhodovulum_adriaticum.csv")`

Then I grouped it:

    Rhodovulum_adriaticum <- Rhodovulum_adriaticum_NCBI %>%
      group_by(Protein.Name) %>%
      summarize(Rhodovulum_adriaticum = n()) %>%
      arrange(desc(Rhodovulum_adriaticum))
    View(Rhodovulum_adriaticum)

Then I listed all the processed data frames:

Rhodo_data_list_NCBI<-list(Rhodovulum_adriaticum, ......,)

Then I merged them into a single data frame:

Rhodo_merge_NCBI<-Rhodo_data_list_NCBI%>%reduce(left_join, by ="Protein.Name")

But I want to do this with a loop. Kindly help.

I tried this so far:

setwd("/Users/mdumar/Desktop/NCBI/NCBI/")


data_names<-list.files()

for(i in 1:length(data_names)) {           
  read.csv(data_names[i]), 
  data_names[i]<-data_names[i]%>%group_by(Protein.Name)%>% summarize(data_names[i]=n()) %>% arrange(desc(data_names[i]))
  Rhodo_data_list_NCBI<-list(data_names[i]),
  Rhodo_merge_NCBI<-Rhodo_data_list_NCBI%>%reduce(left_join, by ="Protein.Name")
}

Answer 1

You can probably do something like this, using the tidyverse.

Get a vector of file names, without the .csv extension:

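library(tidyverse)  # provides str_remove(), count(), bind_rows(), pivot_wider()
setwd("/Users/mdumar/Desktop/NCBI/NCBI/")
# escape the dot and anchor at the end so only the ".csv" extension is matched
fnames = str_remove(list.files(pattern = "\\.csv$"), "\\.csv$")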

Loop over these file names using lapply(), each time reading the CSV file and then counting each Protein.Name with count(); add a grp variable that holds the source file name using mutate(). Pass the resulting list to bind_rows(), and then reshape it to one column per file with pivot_wider():

bind_rows(lapply(fnames, function(fname) {
  read.csv(paste0(fname,".csv")) %>% 
    count(Protein.Name) %>% 
    mutate(grp = fname)
})) %>% 
  pivot_wider(id_cols = Protein.Name,names_from = grp, values_from = n)
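
If you would rather keep an explicit for loop, closer to the attempt in the question, the following is a minimal sketch of one way to do it. It is not the only approach: it swaps left_join for full_join so that proteins missing from the first file are not dropped, and it uses the name argument of count() to name each count column after its file. The temporary folder and its two small sample files are hypothetical and exist only so the sketch runs anywhere; point data_dir at your own folder instead.

```r
library(dplyr)
library(purrr)

# Hypothetical sample data, written to a temporary folder for illustration only
data_dir <- file.path(tempdir(), "ncbi_loop_demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(Protein.Name = c("a", "a", "b", "b", "c", "c")),
          file.path(data_dir, "Rhodovulum_adriaticum.csv"), row.names = FALSE)
write.csv(data.frame(Protein.Name = c("a", "a", "a", "a", "b", "b")),
          file.path(data_dir, "Rhodovulum_sulfidophilum.csv"), row.names = FALSE)

# Full paths of all CSV files in the folder
files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

# One list slot per file, filled inside the loop
Rhodo_data_list_NCBI <- vector("list", length(files))

for (i in seq_along(files)) {
  # File name without folder and extension becomes the count column name
  species <- tools::file_path_sans_ext(basename(files[i]))
  Rhodo_data_list_NCBI[[i]] <- read.csv(files[i]) %>%
    count(Protein.Name, name = species, sort = TRUE)
}

# Join once, after the loop, rather than on every iteration
Rhodo_merge_NCBI <- reduce(Rhodo_data_list_NCBI, full_join, by = "Protein.Name")
print(Rhodo_merge_NCBI)
```

The main differences from the attempt in the question are that the result of read.csv() is actually stored, the list is filled element by element instead of being overwritten, and the join happens a single time after all files have been read.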

Let's say you had two files in your working directory, called "Rhodovulum_adriaticum.csv" and "Rhodovulum_sulfidophilum.csv", and that these looked like this:

structure(list(Protein.Name = c("a", "a", "b", "b", "c", "c")), class = "data.frame", row.names = c(NA, 
-6L))

and

structure(list(Protein.Name = c("a", "a", "a", "a", "b", "b")), class = "data.frame", row.names = c(NA, 
-6L))

Then the above code would return:

  Protein.Name Rhodovulum_adriaticum Rhodovulum_sulfidophilum
  <chr>                        <int>                    <int>
1 a                                2                        4
2 b                                2                        2
3 c                                2                       NA
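
The NA appears because protein "c" never occurs in the second file. If a zero count is more useful than NA, and you want the merged table saved as one CSV, something along these lines may work. This is a sketch under assumptions: the temporary folder, the output file name Rhodo_merge_NCBI.csv, and the variable names demo_dir, merged and out_file are all made up for illustration, and the two sample files are recreated only so the example is self-contained.

```r
library(dplyr)
library(tidyr)

# Recreate the two example files in a temporary folder (hypothetical location)
demo_dir <- file.path(tempdir(), "ncbi_fill_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Protein.Name = c("a", "a", "b", "b", "c", "c")),
          file.path(demo_dir, "Rhodovulum_adriaticum.csv"), row.names = FALSE)
write.csv(data.frame(Protein.Name = c("a", "a", "a", "a", "b", "b")),
          file.path(demo_dir, "Rhodovulum_sulfidophilum.csv"), row.names = FALSE)

# File names without the .csv extension
fnames <- sub("\\.csv$", "", list.files(demo_dir, pattern = "\\.csv$"))

merged <- bind_rows(lapply(fnames, function(fname) {
  read.csv(file.path(demo_dir, paste0(fname, ".csv"))) %>%
    count(Protein.Name) %>%
    mutate(grp = fname)
})) %>%
  # values_fill = 0 replaces the NA for proteins absent from a file
  pivot_wider(id_cols = Protein.Name, names_from = grp, values_from = n,
              values_fill = 0)

# Write the combined table to a single CSV, outside the input folder
out_file <- file.path(tempdir(), "Rhodo_merge_NCBI.csv")
write.csv(merged, out_file, row.names = FALSE)
print(merged)
```

Writing the output somewhere other than the input folder avoids it being picked up by list.files() the next time the code runs.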
