How to rename a subset of columns in multiple datasets with R in the form of a loop

I know that this question has been asked before, but I cannot get it to work for me, and I swear I have tried many ways to do it, from for loops over the files to lapply. I have tables in which I want to replace the headers of columns 2 to 8, which are currently "X1", "X2", "X3", "X4", "X5", "X6", "X7", with "Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species".

The tables do not all have the same number of rows or columns.

My 31 tables are listed like this:

step4 <- list.files(pattern="*.coldrop.tsv")
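Two things about list.files() matter here: its pattern argument is a regular expression rather than a shell glob, and it returns a plain character vector of file names, not the tables themselves. A minimal sketch in a temporary folder (the file names s1.coldrop.tsv, s2.coldrop.tsv and notes.txt are hypothetical) shows both; utils::glob2rx() converts a glob such as *.coldrop.tsv into the equivalent regex.

```r
# Temporary folder with hypothetical file names, for demonstration only
demo_dir <- tempfile("coldrop_demo")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, c("s1.coldrop.tsv", "s2.coldrop.tsv", "notes.txt"))))

# pattern is a regular expression: escape the dots and anchor the end with $
step4 <- list.files(demo_dir, pattern = "\\.coldrop\\.tsv$")
print(step4)          # file names only; no table has been read yet
print(class(step4))

# utils::glob2rx() translates a shell-style glob into the equivalent regex
step4_glob <- list.files(demo_dir, pattern = utils::glob2rx("*.coldrop.tsv"))
print(step4_glob)
```

Because step4 only holds names, each file still has to be read (for example with read.table()) before its columns can be renamed.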

Also, and this is a sub-problem, I am starting from the 2nd column because R keeps adding row numbers (1, 2, 3, 4, 5, 6, ..., n) as the first column. If anyone can help me with that too, it would be great. I need to do this for all the tables in the step4 list. Here are some samples of what I have tried.
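Assuming the extra row-number column comes from the tables having been written with write.table() and its default row.names = TRUE (an assumption; the question does not say how the files were produced), it can be avoided at write time with row.names = FALSE. When a file already has it, read.table(header = TRUE) uses that first column as row names on its own, because the header line has one field fewer than the data lines. A small sketch with a hypothetical two-row table:

```r
df <- data.frame(X1 = c("Bacteria", "Archaea"),
                 X2 = c("Firmicutes", "Euryarchaeota"),
                 stringsAsFactors = FALSE)

with_rn <- tempfile(fileext = ".tsv")
no_rn   <- tempfile(fileext = ".tsv")

# Default row.names = TRUE prepends the row numbers 1, 2, ... to each data line
write.table(df, with_rn, sep = "\t", quote = FALSE)
# row.names = FALSE writes only the real columns
write.table(df, no_rn, sep = "\t", quote = FALSE, row.names = FALSE)

print(readLines(with_rn))
print(readLines(no_rn))

# The header of with_rn has one field fewer than its data lines, so
# read.table() turns the first column into row names instead of a column
back <- read.table(with_rn, header = TRUE, sep = "\t")
print(names(back))
```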

When I first tried, I opted for a for loop over the files:

colnames <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")

The following works on a single file:

names(Omlo_run11_table.tsv.step1.tsv.step2.tsv.step3.tsv.coldrop.tsv)[2:8] <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")

i = 1
for(i in 1:length(step4)){
  names(step4[i])[2:8] <- c("Kingdom","Phylum","Class","Order","Family","Genus","Species") 

}

I get this error: Error in names(step4[i])[2:8] <- c("Kingdom", "Phylum", "Class", "Order", : 'names' attribute [8] must be the same length as the vector [1]
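This error comes from step4 holding file names (character strings), not data frames. step4[i] is a single string, so names(step4[i])[2:8] <- ... tries to attach eight names to a vector of length 1. A minimal reproduction, using a hypothetical file name in place of the list.files() output:

```r
taxa  <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")
step4 <- c("sample1.coldrop.tsv")   # hypothetical file name, as list.files() would return

# Capture the error message instead of stopping the script
msg <- tryCatch({
  names(step4[1])[2:8] <- taxa      # renames the string, not a table
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```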

names(get(step4[i]))[names(get(step4[i])) == "X1","X2","X3","X4","X5","X6","X7"] <- c("Kingdom","Phylum","Class","Order","Family","Genus","Species")

I get this error: Error in names(get(step4[i]))[names(get(step4[i])) == "X1", "X2", "X3", : incorrect number of subscripts
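In that attempt, == compares against a single value, and the extra comma-separated strings inside [ ] are read as additional subscripts, hence "incorrect number of subscripts". Also, get() returns a value rather than something that can be modified in place, so the result has to go into a temporary variable and be written back. To rename by the old names instead of by position, %in% tests membership in a set and match() finds where each old name sits. A sketch on a small hypothetical data frame (the taxon values are made up):

```r
df <- data.frame(row = 1:2, X1 = "Bacteria", X2 = "Firmicutes", X3 = "Bacilli",
                 X4 = "Lactobacillales", X5 = "Lactobacillaceae",
                 X6 = "Lactobacillus", X7 = "L. casei", stringsAsFactors = FALSE)

old <- c("X1", "X2", "X3", "X4", "X5", "X6", "X7")
new <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")

# %in% tests membership in a set of values (== compares element by element)
print(names(df) %in% old)

# match() gives the column position of each old name, so each new name lands
# on the right column even if the X columns are not at positions 2:8
pos <- match(old, names(df))
names(df)[pos] <- new
print(names(df))
```

If a table lacks one of the X columns, match() returns NA for it and the assignment fails, which is a useful early warning.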

for(i in 1:length(step4)){
  nm <- paste0("step4[i]")
  tmp <- get(nm)
  colnames(tmp)[2:8] <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")
  assign(nm, tmp)
}

I get this error: Error in get(nm) : object 'step4[i]' not found
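Here paste0("step4[i]") produces the literal string "step4[i]", so get() looks for an object with that name. The get()/assign() pattern itself works when the loop variable holds the name of an existing data frame. A sketch, assuming the tables were already read into objects whose names are kept in a character vector (tab_a, tab_b and make_table() are hypothetical):

```r
taxa <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")

# Hypothetical tables already in the workspace: a row-number column, then X1..X7
make_table <- function(n) {
  df <- as.data.frame(matrix("x", nrow = n, ncol = 7), stringsAsFactors = FALSE)
  names(df) <- paste0("X", 1:7)
  cbind(row = seq_len(n), df)
}
tab_a <- make_table(3)
tab_b <- make_table(5)

tables <- c("tab_a", "tab_b")   # object names, not file names
for (i in seq_along(tables)) {
  nm  <- tables[i]              # the name itself, not paste0("step4[i]")
  tmp <- get(nm)                # fetch a copy of the data frame
  colnames(tmp)[2:8] <- taxa
  assign(nm, tmp)               # overwrite the original object with the renamed copy
}
print(names(tab_a))
```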

lapply (step4, function(df) { colnames(df)[2:length(step4)] <-colnames[1:length(step4)]-1)}

And so on... I am more of a for-loop type of person, but I am open to lapply options. I also came across solutions using setnames but could not figure those out either. Can someone please help me?
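For a for-loop style, the same idea as the lapply() answer (read each file into a data frame first, then rename) can be written as a loop that fills a named list. A minimal sketch; two small hypothetical .coldrop.tsv files are written to a temporary folder so it runs on its own:

```r
taxa <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")

# Demo setup: two hypothetical tables with a row-number column, X1..X7 and a count
demo_dir <- tempfile("step4_demo")
dir.create(demo_dir)
for (f in c("s1.coldrop.tsv", "s2.coldrop.tsv")) {
  demo <- data.frame(row = 1:3, X1 = "k", X2 = "p", X3 = "c", X4 = "o",
                     X5 = "f", X6 = "g", X7 = "s", count = c(10, 4, 1))
  write.table(demo, file.path(demo_dir, f), sep = "\t", quote = FALSE, row.names = FALSE)
}

step4 <- list.files(demo_dir, pattern = "\\.coldrop\\.tsv$", full.names = TRUE)

tables <- list()
for (file in step4) {
  df <- read.table(file, sep = "\t", quote = "", header = TRUE)  # read the data, not just the name
  names(df)[2:8] <- taxa                                          # column 1 holds the row numbers
  tables[[basename(file)]] <- df                                  # store under its file name
}
print(names(tables))
print(names(tables[[1]]))
```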

Simply create a list of data frames from your step4 character vector, as @Gregor commented. Then rename the columns of each data frame; both steps can be handled in one lapply() anonymous function. Also, since you are working with tab-separated files, you want the general read.table() function (read.csv() is a special wrapper around it for comma-separated files):

# tsvfilepath: the folder that contains your .tsv files
step4 <- list.files(path = tsvfilepath, pattern = ".*tsv$", full.names = TRUE)

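dfList <- lapply(step4, function(i) {
        # read one tab-separated file into a data frame
        df <- read.table(i, sep="\t", quote="", header=TRUE, as.is=FALSE)
        # column 1 holds the row numbers, so rename columns 2 to 8
        names(df)[2:8] <- c("Kingdom","Phylum","Class","Order","Family","Genus","Species")
        return(df)
})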

[Image: importing the TSV files with colnames]


This list is useful for various needs, such as separate data frames or one master data frame.

For individual data frames, consider setNames() to name each list element and list2env() to turn the elements into separate objects in an environment. The code below gives each data frame the same name as its corresponding file:

dfList <- setNames(dfList, basename(step4))  # basename() drops the folder part of the full paths

list2env(dfList, envir=.GlobalEnv)
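Because step4 was built with full.names = TRUE, its entries are full paths, so it helps to shorten them before calling list2env(). A sketch that also strips the .coldrop.tsv suffix so the object names are syntactic (an illustrative choice, not part of the answer), and that uses a fresh environment instead of .GlobalEnv so the result is easy to inspect; the paths are hypothetical:

```r
dfList <- list(data.frame(Kingdom = "Bacteria", stringsAsFactors = FALSE),
               data.frame(Kingdom = "Archaea", stringsAsFactors = FALSE))
step4  <- c("/data/tsv/s1.coldrop.tsv", "/data/tsv/s2.coldrop.tsv")  # hypothetical paths

# basename() drops the folder; sub() drops the suffix so the names are syntactic
dfList <- setNames(dfList, sub("\\.coldrop\\.tsv$", "", basename(step4)))

env <- new.env()
invisible(list2env(dfList, envir = env))   # one object per list element
print(ls(env))
print(get("s2", envir = env))
```

With the suffix kept, the objects still exist but must be referenced with backticks, e.g. `s1.coldrop.tsv`.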

For one large master data frame, where you append all the data frames together, the challenge is that the tables do not all have the same columns. Hence, consider third-party packages that fill in missing columns across data frames:

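# Pick any one of these; each stacks the list and fills columns a table lacks with NA
library(plyr)
master <- rbind.fill(dfList)

library(dplyr)
master <- bind_rows(dfList)

library(data.table)
master <- rbindlist(dfList, fill=TRUE)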
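If installing a package is not an option, the same "fill missing columns with NA" idea can be written in base R. This is a swapped-in technique, not how the packages above work internally: the hypothetical helper fill_rbind() adds each missing column as NA, puts all columns in the same order and stacks the tables with rbind().

```r
fill_rbind <- function(dfs) {
  # every column name that appears in at least one table, in first-seen order
  all_cols <- unique(unlist(lapply(dfs, names)))
  filled <- lapply(dfs, function(df) {
    for (col in setdiff(all_cols, names(df))) {
      df[[col]] <- rep(NA, nrow(df))   # add the absent column as NA
    }
    df[all_cols]                       # same column order in every table
  })
  do.call(rbind, filled)
}

# Usage with two hypothetical tables that share only the Kingdom column
dfs <- list(data.frame(Kingdom = "Bacteria", count = 3, stringsAsFactors = FALSE),
            data.frame(Kingdom = "Archaea", Genus = "Methanobrevibacter",
                       stringsAsFactors = FALSE))
master <- fill_rbind(dfs)
print(master)
```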
