
R: remove quotation marks from column names of a data frame

I have SPSS data that I need to migrate to R. The data set is large, with 202 columns and thousands of rows.

v1 v2    v3     v4 v5
1  USA   Male   21 Married
2  INDIA Female 54 Single
3  CHILE Male   33 Divorced ...and so on...

The data file contains variable labels "Identification No", "Country of origin", "Gender", "(Current) Year", "Marital Status - Candidate"

I read my data from SPSS with the following command:

data<-read.spss(file.sav,to.data.frame=TRUE,reencode='utf-8')

The column names are read as v1, v2, v3, v4, etc., but I want the variable labels as the column names of my data frame. I used the following commands to extract the variable labels and set them as names:

vname <- attr(data, "variable.labels")
vl <- character(length(vname))   # create vl before filling it in the loop
for (i in 1:202) { vl[i] <- vname[[i]] }
names(data) <- vl

Now the problem is that I have to refer to a column as data$"Identification No", which is not very nice. I want to remove the quotation marks around the column names. How can I do that?

You can't. An unquoted space is a syntactic symbol that breaks the grammar up.

One option is to change the names to ones without spaces in them; the make.names function does that.

> N = c("foo","bar baz","bar baz")
> make.names(N)
[1] "foo"     "bar.baz" "bar.baz"

You might want to make sure you have unique names:

> make.names(N, unique=TRUE)
[1] "foo"       "bar.baz"   "bar.baz.1"
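Applied to the variable labels quoted in the question, this could look like the minimal sketch below. The labels vector stands in for attr(data, "variable.labels"), and the two-row data frame is made up for illustration; neither is the asker's real file.

```r
# Variable labels quoted in the question
# (stand-in for attr(data, "variable.labels"))
labels <- c("Identification No", "Country of origin", "Gender",
            "(Current) Year", "Marital Status - Candidate")

# Hypothetical two-row data frame with the original SPSS-style names
data <- data.frame(v1 = 1:2,
                   v2 = c("USA", "INDIA"),
                   v3 = c("Male", "Female"),
                   v4 = c(21, 54),
                   v5 = c("Married", "Single"))

# Turn the labels into syntactically valid, unique names and assign them
clean <- make.names(labels, unique = TRUE)
names(data) <- clean

# Columns can now be reached with $ and no quotes
data$Country.of.origin
data$Gender
```

Every invalid character becomes a dot, so labels with a lot of punctuation end up with runs of dots in their names.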

The quotation marks were there because the names had spaces in them. print(vl, quote=FALSE) displayed the text without quotation marks, but I still had to use quotation marks to refer to each name as a single variable. Without them, the spaces would break up the variable names.

This can be solved by removing the spaces from the names. I did this by deleting all the spaces with the gsub command:

vl<-gsub(" ","",vl)
names(data)<-vl

Now most of the column names can be accessed without quotation marks, but names containing other punctuation marks still have to be quoted.

The solution by Spacedman also worked fine and seems easier to use.

make.names(vl, unique=TRUE)

But I liked the solution by David Arenburg.

gsub("[ [:punct:]]", "" , vl)

It removed all punctuation marks and made the column name clean and better.
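As a rough illustration, here is that pattern applied to the labels quoted in the question. The labels vector and the two "Year" labels are examples made up for this sketch. Deleting characters can make two different labels collide, so it seems worth checking for duplicates afterwards.

```r
# Variable labels quoted in the question
labels <- c("Identification No", "Country of origin", "Gender",
            "(Current) Year", "Marital Status - Candidate")

# Delete spaces and every punctuation character
clean <- gsub("[ [:punct:]]", "", labels)
clean
# gives: "IdentificationNo", "Countryoforigin", "Gender",
#        "CurrentYear", "MaritalStatusCandidate"

# Two different (hypothetical) labels can collapse to the same name
collide <- gsub("[ [:punct:]]", "", c("Year (start)", "Year start"))
anyDuplicated(collide)   # non-zero: both became "Yearstart"

# make.unique() appends a numeric suffix to repeated names
fixed <- make.unique(collide)
fixed                    # "Yearstart" "Yearstart.1"
```

Unlike make.names, this does not repair a label that starts with a digit, so such a name would still need quoting.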

Spaces are okay in data.table column names without much fuss. But, no, there's no way to avoid using quotation marks for the reason Spacedman gave: spaces break up the syntax.

require(data.table)
DT <- data.table(a = c(1,1), "bc D" = c(2,3))

# three identical results:
DT[['bc D']]
DT$bc
DT[,`bc D`]

Okay, so partial matching with $ (which also works with data.frames) gets you out of using quotes. But it will bring trouble if you get it wrong.
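A small base-R sketch of how partial matching can go wrong, using a made-up data frame with two columns that share a prefix. The column names here are hypothetical and chosen only to force an ambiguous match.

```r
# Hypothetical data frame whose two column names share the prefix "bc"
df <- data.frame(1:2, 3:4)
names(df) <- c("bc D", "bc E")

# "bc" matches both columns, so $ quietly returns NULL instead of a column
ambiguous <- df$bc
is.null(ambiguous)       # TRUE

# Exact access still needs some form of quoting
exact1 <- df[["bc D"]]   # string inside [[ ]]
exact2 <- df$`bc D`      # backticks after $
identical(exact1, exact2)  # TRUE
```

Because the ambiguous lookup fails silently, a column added later with a similar name can change what existing code returns.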
