
Import CSV file with spaces in header using read_csv from readr

I am starting to use readr to import CSV files with read_csv. How do I deal with CSV files whose header names contain spaces?

read_csv imports them with the spaces (and special characters) intact, which prevents me from going straight to mutate and other dplyr functions.

How do I handle this?

Thanks!

You could use make.names after you read in the data.

df <- data.frame(x=NA)
colnames(df) <- c("This col name has spaces")
colnames(df) <- make.names(colnames(df), unique=TRUE)

It will return column names with periods rather than spaces as separators.

colnames(df)
[1] "This.col.name.has.spaces"
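The same idea applied to an actual import can be sketched as follows. This is only an illustration under assumptions: the CSV text and the column names are made up, and base R's read.csv with check.names = FALSE stands in for read_csv, since both keep the header exactly as it appears in the file.

```r
# Made-up CSV content with spaces in the header (illustrative data only).
csv_text <- "First Name,Unit Price\nAnn,1.5\nBob,2.0"

# check.names = FALSE keeps the header untouched, similar to what read_csv does.
raw <- read.csv(text = csv_text, check.names = FALSE)
original_names <- names(raw)

# Convert the header to syntactically valid names after the import.
names(raw) <- make.names(names(raw), unique = TRUE)
fixed_names <- names(raw)

# The columns can now be used without any quoting.
doubled <- raw$Unit.Price * 2
```

After this step the columns can be referred to as raw$First.Name and raw$Unit.Price.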

According to its help page, make.names takes a character vector and returns a character vector of syntactically valid names:

A syntactically valid name consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number.
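A few edge cases show what that rule means in practice. The input strings below are invented for illustration; the behaviour shown is that of base R's make.names.

```r
# A name that starts with a digit gets an "X" prepended.
starts_with_digit <- make.names("1st col")

# A reserved word gets a "." appended.
reserved_word <- make.names("if")

# Without unique = TRUE, duplicate inputs stay duplicated.
not_unique <- make.names(c("a b", "a b"))

# With unique = TRUE, later duplicates get a numeric suffix.
made_unique <- make.names(c("a b", "a b"), unique = TRUE)
```

This is why unique = TRUE is worth keeping when two headers differ only in characters that make.names replaces.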

EDIT: Including an example with special characters.

df <- data.frame(x=NA)
colnames(df) <- c("Higher than 80(°F)")
colnames(df) <- make.names(colnames(df), unique=TRUE)

colnames(df)
[1] "Higher.than.80..F."

As you can see, make.names replaces 'illegal' characters with periods, which prevents syntax errors when you refer to a column by name directly.

If you want to collapse repeated periods into a single one, add:

colnames(df) <- gsub('(\\.)\\1+', '\\1', colnames(df))
colnames(df)
[1] "Higher.than.80.F."
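The steps above could be wrapped into one small function. The sketch below is an assumption-laden illustration, not an existing API: tidy_header_names is a hypothetical helper name, and dropping the trailing period is an extra step that the answer itself does not perform.

```r
# tidy_header_names() is a hypothetical helper, not part of base R or readr.
tidy_header_names <- function(x) {
  x <- make.names(x)          # replace invalid characters with "."
  x <- gsub("\\.+", ".", x)   # collapse runs of "." into a single "."
  x <- gsub("\\.$", "", x)    # drop one trailing "." (extra step, see lead-in)
  make.unique(x)              # the cleanup can create duplicates, so re-uniquify
}

cleaned <- tidy_header_names(c("Higher than 80(F)", "Rate (%)", "Rate"))
```

It would be applied as names(df) <- tidy_header_names(names(df)). The final make.unique call matters because "Rate (%)" and "Rate" both reduce to "Rate".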

When I import a CSV containing spaces in the headers, I can still access the columns with the dollar operator, as long as the name is wrapped in backticks. Let's say I have a data.frame (df) like this:

  a a b b
1   1   1
2   1   2

where "a a" is the name of the first column and "b b" the name of the second. I can get the first column with

df$`a a`
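A short sketch of the different ways to reach such a column, using the same two-column example. The data frame is built by hand here with check.names = FALSE so the spaces survive, and the total column is a made-up addition for illustration. dplyr verbs such as mutate accept backtick-quoted names in the same way.

```r
# Rebuild the example; check.names = FALSE keeps the spaces in the names.
df <- data.frame(`a a` = c(1, 1), `b b` = c(1, 2), check.names = FALSE)

first_col  <- df$`a a`      # backticks with the dollar operator
second_col <- df[["b b"]]   # a plain string works with [[ ]]

# Backticks also work inside expressions evaluated against the data frame.
df$total <- with(df, `a a` + `b b`)
```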

But if you want to change them anyway, you can just rename them like this:

names(df) <- c("a_a", "b_b")

The vector you assign just needs to have one element per column of the data.frame. A slightly more elegant way is to use the stringr package. If you want to replace all spaces with underscores, just type this:

library(stringr)    
names(df) <- str_replace_all(names(df), " ", "_")
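If you would rather not load stringr, a base R version of the same replacement might look like the sketch below. The example data frame is made up, and the " +" pattern is a small variation that turns a run of several spaces into a single underscore.

```r
# Made-up data frame whose second header contains two consecutive spaces.
df <- data.frame(`a a` = 1:2, `b  b` = 3:4, check.names = FALSE)

# gsub() replaces every match; " +" treats a run of spaces as one separator.
names(df) <- gsub(" +", "_", names(df))
renamed <- names(df)
```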
