Adding column names to a data frame while reading a CSV in R
I have multiple .csv files in my directory that have no header row, so reading and combining them gives this error:
Error in match.names(clabs, names(xi)) : names do not match previous names.
For that reason, I want to add column names to those CSV files and combine them all into a single data frame, but I have not been able to add column names to the files while reading them. The file names look like test_abc.csv, test_pqr.csv, test_xyz.csv, etc. Here is what I tried:
temp = list.files(pattern="*.csv")
read_csv_filename <- function(filename){
ret <- read.csv(filename,header = F)
ret$city <- gsub(".*[_]([^.]+)[.].*", "\\1", filename)
ret
}
df_all <- do.call(rbind,lapply(temp,read_csv_filename))
How do I add a header to every file while reading it? These are the names I want to add:
colnames = c("Age","Gender","height","weight")
Any suggestions?
Using the tidyverse packages, you can do this nicely with the purrr::map_dfr function, which iterates over a list, applies a function that returns a data frame to each element, and then row-binds all of those data frames together.
library(readr)
library(purrr)
library(dplyr) # only used in example set up
# Setting up some example csv files to work with
mtcars_slim <- select(mtcars, 1:3)
write_csv(slice(mtcars_slim, 1:4), "mtcars_1.csv", col_names = FALSE)
write_csv(slice(mtcars_slim, 5:10), "mtcars_2.csv", col_names = FALSE)
write_csv(slice(mtcars_slim, 11:21), "mtcars_3.csv", col_names = FALSE)
# get file paths, read them all, and row-bind them all
dir(pattern = "mtcars_\\d+\\.csv") %>%
map_dfr(read_csv, col_names = c("mpg", "cyl", "disp"))
#> Parsed with column specification:
#> cols(
#> mpg = col_double(),
#> cyl = col_integer(),
#> disp = col_integer()
#> )
#> # A tibble: 21 x 3
#> mpg cyl disp
#> <dbl> <int> <dbl>
#> 1 21.0 6 160.0
#> 2 21.0 6 160.0
#> 3 22.8 4 108.0
#> 4 21.4 6 258.0
#> 5 18.7 8 360.0
#> 6 18.1 6 225.0
#> 7 14.3 8 360.0
#> 8 24.4 4 146.7
#> 9 22.8 4 140.8
#> 10 19.2 6 167.6
#> # ... with 11 more rows
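If the city from each file name is also needed, as in the question, one possible variation is to name the vector of file paths and pass .id to map_dfr. This is only a sketch: the temporary directory, the demo files, and their values are made up for illustration, and it assumes readr, purrr, and dplyr are installed.

```r
library(readr)
library(purrr)

# Hypothetical demo files following the question's test_<city>.csv naming
demo_dir <- tempfile("csv_demo_")
dir.create(demo_dir)
write_csv(data.frame(a = c(23, 31), b = c("male", "female"),
                     c = c(170, 160), d = c(65, 55)),
          file.path(demo_dir, "test_abc.csv"), col_names = FALSE)
write_csv(data.frame(a = c(40, 52), b = c("female", "male"),
                     c = c(158, 181), d = c(60, 83)),
          file.path(demo_dir, "test_pqr.csv"), col_names = FALSE)

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Name each path by its city so that .id can record it in the result
names(files) <- sub("^test_(.*)\\.csv$", "\\1", basename(files))

df_all <- map_dfr(files, read_csv,
                  col_names = c("Age", "Gender", "height", "weight"),
                  .id = "city")
print(df_all)
```

The city column comes first in the result because .id is added before the columns read from each file.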
You can set the column names inside the function itself, like this:
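temp <- list.files(pattern = "\\.csv$")  # pattern is a regular expression, not a glob
read_csv_filename <- function(filename) {
  ret <- read.csv(filename, header = FALSE)
  # city is the part of the file name between "_" and ".csv"
  ret$city <- gsub(".*[_]([^.]+)[.].*", "\\1", filename)
  # name all five columns, including the city column just added
  colnames(ret) <- c("Age", "Gender", "height", "weight", "city")
  ret
}
df_all <- do.call(rbind, lapply(temp, read_csv_filename))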
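Another option that may be closer to "adding the header while reading" is the col.names argument of read.csv, which assigns the names at read time. The sketch below is an illustration under assumptions: the temporary directory, the demo files, and their values are hypothetical, and every file is assumed to have exactly four columns.

```r
# Hypothetical demo files without a header row
demo_dir <- tempfile("csv_demo_")
dir.create(demo_dir)
write.table(data.frame(c(23, 31), c("male", "female"), c(170, 160), c(65, 55)),
            file.path(demo_dir, "test_abc.csv"),
            sep = ",", row.names = FALSE, col.names = FALSE)
write.table(data.frame(40, "female", 158, 60),
            file.path(demo_dir, "test_pqr.csv"),
            sep = ",", row.names = FALSE, col.names = FALSE)

col_names <- c("Age", "Gender", "height", "weight")

read_csv_filename <- function(filename) {
  # header = FALSE keeps the first row as data; col.names supplies the names
  ret <- read.csv(filename, header = FALSE, col.names = col_names)
  # basename() drops the directory so only the file name is matched
  ret$city <- sub("^.*_([^.]+)\\.csv$", "\\1", basename(filename))
  ret
}

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
df_all <- do.call(rbind, lapply(files, read_csv_filename))
print(df_all)
```

Because every data frame now has identical column names, rbind can stack them without the "names do not match previous names" error.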