R Data Frames column names rename

I am new to R and not sure why I have to rename the data frame's columns at the end of the program, even though I defined the data frame with column names at the beginning. The data frame has two columns: the ID column holds the sequence of ids, and the NOBS column holds a number computed for each id.

complete <- function(directory, id = 1:332) {

  collectCounts = data.frame(id=numeric(), nobs=numeric()) 

  for(i in id)  {
    fileName = sprintf("%03d",i)
    fileLocation = paste(directory, "/", fileName,".csv", sep="")

    fileData = read.csv(fileLocation, header=TRUE)
    completeCount = sum(!is.na(fileData[,2]), na.rm=TRUE)

    collectCounts <- rbind(collectCounts, c(id=i, completeCount))
    #print(completeCount)

  }

  colnames(collectCounts)[1] <- "id"
  colnames(collectCounts)[2] <- "nobs"  
  print(collectCounts)  

}

It's not quite clear what your specific problem is, as you did not provide a complete and verifiable example. Nonetheless, I can give a few pointers on improving the code.

1) It is not recommended to 'grow' a data.frame within a loop. This is extremely inefficient in R, as it copies the entire structure each time. It is better to allocate the whole data.frame at the outset, then fill in the rows in the loop.

2) R has a handy function paste0 that does not require you to specify sep = "".

3) There's no need to specify na.rm = TRUE in your sum, because is.na never returns NA.
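Points 2 and 3 are easy to check at the console. The snippet below is only a small illustration; the path pieces and the vector x are made-up values, not taken from the question's data.

```r
# Point 2: paste0() builds the same string as paste(..., sep = "").
# The path pieces here are made-up illustrative values.
same <- identical(
  paste0("data", "/", "001", ".csv"),
  paste("data", "/", "001", ".csv", sep = "")
)
print(same)

# Point 3: is.na() returns only TRUE or FALSE, so its result never contains NA
# and sum() needs no na.rm argument.
x <- c(1.2, NA, 3.4)            # hypothetical column with one missing value
has_na <- anyNA(is.na(x))
n_complete <- sum(!is.na(x))
print(has_na)
print(n_complete)
```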

Putting this together:

complete = function(directory, id = 1:332) {
  # Preallocate one row per requested id
  collectCounts = data.frame(id=id, nobs=numeric(length(id)))
  for(i in seq_along(id))  {
    fileName = sprintf("%03d", id[i])
    fileLocation = paste0(directory, "/", fileName, ".csv")
    fileData = read.csv(fileLocation, header=TRUE)
    # Count the non-missing values in the second column
    completeCount = sum(!is.na(fileData[, 2]))
    collectCounts[i, 'nobs'] <- completeCount
  }
  collectCounts  # return the filled data.frame (a for loop on its own returns NULL)
}
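As for why the names had to be reassigned in the question's version, a plausible explanation (going by the documented behaviour of rbind for data frames, which drops zero-row arguments before combining) is that the empty data.frame contributes nothing, so the column names end up derived from the partially named vector instead. If you do want to keep the rbind approach, binding a one-row data.frame with explicit names avoids the problem. This is a minimal sketch; the ids and the placeholder count are invented for illustration.

```r
collectCounts <- data.frame(id = numeric(), nobs = numeric())

for (i in c(2, 4)) {             # hypothetical ids
  completeCount <- i * 10        # placeholder standing in for the real count
  # Bind a one-row data.frame, not a bare vector, so columns are matched by name
  collectCounts <- rbind(collectCounts,
                         data.frame(id = i, nobs = completeCount))
}

print(names(collectCounts))
print(collectCounts)
```

This still grows the data.frame on every iteration, so preallocating remains the better choice for long id vectors.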

It is always hard to answer questions without example data.

You could start with:

collectCounts = data.frame(id, nobs=NA)

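And in your loop, do the following (this indexes rows by position, so it assumes i runs over the positions of id rather than over the id values themselves):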

collectCounts[i, 2] <- completeCount
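A minimal sketch of this approach, assuming the loop runs over positions with seq_along so that it also works when id is not simply 1, 2, 3, and so on. The ids and the counts vector are invented placeholders standing in for the values read from the files.

```r
id <- c(5, 10)                   # hypothetical ids that are not 1..n
counts <- c(42, 17)              # placeholder values standing in for completeCount

# One row per id, with nobs filled in later
collectCounts <- data.frame(id, nobs = NA)

for (k in seq_along(id)) {
  collectCounts[k, 2] <- counts[k]   # fill row k, which belongs to id[k]
}

print(collectCounts)
```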

Here is another way to do this:

complete <- function(directory, id = 1:332) {
    nobs <- sapply(id, function(i) {
        fileName <- paste0(sprintf("%03d", i), ".csv")
        fileLocation <- file.path(directory, fileName)
        fileData <- read.csv(fileLocation, header = TRUE)
        sum(!is.na(fileData[, 2]))
    })
    data.frame(id = id, nobs = nobs)
}
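Since the question came without example data, here is a self-contained way to try this out. Everything about the data is an assumption made for illustration: the temporary directory name, the column names day and value, and the three small files. The function count_complete is a hypothetical stand-in that follows the same sapply pattern.

```r
# Hypothetical sample data: three small CSV files in a temporary directory.
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)

values <- list(c(1.5, NA, 3.2), c(NA, NA, 0.7), c(2.1, 4.4, 5.0))
for (i in seq_along(values)) {
  write.csv(data.frame(day = 1:3, value = values[[i]]),
            file.path(demo_dir, sprintf("%03d.csv", i)),
            row.names = FALSE)
}

# Same pattern as above: one count per id, then build the data.frame once.
count_complete <- function(directory, id) {
  nobs <- sapply(id, function(i) {
    fileData <- read.csv(file.path(directory, sprintf("%03d.csv", i)))
    sum(!is.na(fileData[, 2]))   # non-missing values in the second column
  })
  data.frame(id = id, nobs = nobs)
}

result <- count_complete(demo_dir, 1:3)
print(result)
```

Because the data.frame is created once with named columns, no renaming is needed afterwards.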
