R loop with multiple CSV files and grouping and joining in single file
I'm trying to import and read multiple CSV files, group them, and then combine all of them into a single file. I have done this for a single file; the code is given here. First I imported the CSV file separately:
`Rhodovulum_adriaticum_NCBI <- read.csv("Rhodovulum_adriaticum.csv")`
Then I grouped it:
Rhodovulum_adriaticum<-Rhodovulum_adriaticum_NCBI%>%group_by(Protein.Name)%>% summarize(Rhodovulum_adriaticum=n()) %>% arrange(desc(Rhodovulum_adriaticum))
View(Rhodovulum_adriaticum)
Then I listed all the processed CSV files:
Rhodo_data_list_NCBI<-list(Rhodovulum_adriaticum, ......,)
Then I merged them into a single file:
Rhodo_merge_NCBI<-Rhodo_data_list_NCBI%>%reduce(left_join, by ="Protein.Name")
But I want to do this with a loop. Kindly help.
I tried this so far:
setwd("/Users/mdumar/Desktop/NCBI/NCBI/")
data_names<-list.files()
for(i in 1:length(data_names)) {
read.csv(data_names[i]),
data_names[i]<-data_names[i]%>%group_by(Protein.Name)%>% summarize(data_names[i]=n()) %>% arrange(desc(data_names[i]))
Rhodo_data_list_NCBI<-list(data_names[i]),
Rhodo_merge_NCBI<-Rhodo_data_list_NCBI%>%reduce(left_join, by ="Protein.Name")
}
You can probably do something like this, using `tidyverse`.
First, get a vector of the file names, with the `.csv` extension removed:
library(tidyverse)
setwd("/Users/mdumar/Desktop/NCBI/NCBI/")
# the pattern is a regular expression, so escape the dot and anchor it at the end
fnames = str_remove(list.files(pattern = "\\.csv$"), "\\.csv$")
Then loop through these `fnames` with `lapply()`, each time reading the csv file and counting each `Protein.Name` using `count()`; add a `grp` variable that holds the source filename using `mutate()`. Pass the resulting list as the argument to `bind_rows()`, and then reshape it with `pivot_wider()`:
bind_rows(lapply(fnames, function(fname) {
  read.csv(paste0(fname, ".csv")) %>%
    count(Protein.Name) %>%
    mutate(grp = fname)
})) %>%
  pivot_wider(id_cols = Protein.Name, names_from = grp, values_from = n)
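If you would rather keep the shape of your original attempt (an explicit `for` loop followed by `reduce()`), a minimal sketch could look like the one below. The function name `count_proteins_loop` and its `dir` argument are made up for illustration; the only thing assumed about the data is that every CSV has a `Protein.Name` column, as in the question. It uses `full_join` rather than `left_join` so that proteins missing from the first file are not dropped.

```r
library(dplyr)
library(purrr)

# Hypothetical helper (not part of the original answer): count Protein.Name
# in every CSV found in `dir` and join the per-file counts side by side.
count_proteins_loop <- function(dir = ".") {
  paths <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  counts <- vector("list", length(paths))

  for (i in seq_along(paths)) {
    # Use the file name without its extension as the name of the count column.
    species <- tools::file_path_sans_ext(basename(paths[i]))
    counts[[i]] <- read.csv(paths[i]) %>%
      count(Protein.Name, name = species)
  }

  # full_join keeps proteins that are absent from some files (filled with NA);
  # left_join would keep only the proteins present in the first file.
  reduce(counts, full_join, by = "Protein.Name")
}
```

Calling `count_proteins_loop("/Users/mdumar/Desktop/NCBI/NCBI/")` should then give one row per protein and one count column per file. Note that the loop stores each result in a list element (`counts[[i]]`) instead of overwriting the file name, and that the join happens once, after the loop has finished.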
Let's say you had two files in your working directory, called "Rhodovulum_adriaticum.csv" and "Rhodovulum_sulfidophilum.csv", and that these looked like this:
structure(list(Protein.Name = c("a", "a", "b", "b", "c", "c")), class = "data.frame", row.names = c(NA,
-6L))
and
structure(list(Protein.Name = c("a", "a", "a", "a", "b", "b")), class = "data.frame", row.names = c(NA,
-6L))
Then the `bind_rows()` / `pivot_wider()` code above would return:
  Protein.Name Rhodovulum_adriaticum Rhodovulum_sulfidophilum
  <chr>                        <int>                    <int>
1 a                                2                        4
2 b                                2                        2
3 c                                2                       NA
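To try this without touching your real data, here is a self-contained sketch that writes the two example tables to a temporary directory and runs the same pipeline on them. The temporary directory and the variable names `tmp`, `paths`, `result` and `result_zero` are assumptions made for this illustration; it also reads the files by full path instead of calling `setwd()`.

```r
library(dplyr)
library(tidyr)

# Write the two example tables into a fresh temporary directory.
tmp <- tempfile("ncbi_example_")
dir.create(tmp)
write.csv(data.frame(Protein.Name = c("a", "a", "b", "b", "c", "c")),
          file.path(tmp, "Rhodovulum_adriaticum.csv"), row.names = FALSE)
write.csv(data.frame(Protein.Name = c("a", "a", "a", "a", "b", "b")),
          file.path(tmp, "Rhodovulum_sulfidophilum.csv"), row.names = FALSE)

# Full paths to every CSV in that directory.
paths <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Count Protein.Name per file and tag each count with its source file name.
long_counts <- bind_rows(lapply(paths, function(path) {
  read.csv(path) %>%
    count(Protein.Name) %>%
    mutate(grp = tools::file_path_sans_ext(basename(path)))
}))

# One column per file; proteins absent from a file become NA.
result <- long_counts %>%
  pivot_wider(id_cols = Protein.Name, names_from = grp, values_from = n)

# Same reshape, but with 0 instead of NA for absent proteins.
result_zero <- long_counts %>%
  pivot_wider(id_cols = Protein.Name, names_from = grp, values_from = n,
              values_fill = 0L)

print(result)
```

Whether a protein that does not occur in a file should show up as `NA` or as `0` depends on what you do with the table afterwards; `values_fill` is one way to get zeros directly.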