R code to change a column name containing a symbol not executed when sourced

Question

I have a data set which includes the column name βlactamases. In the .csv file the "beta" part of the name is imported into R as a misinterpreted symbol (looks like an I with a squared sign next to it).

As I regularly import this file, I have a source file to perform some basic data cleaning and prepare the data set for analysis. I included a line of code to convert the column name to something more user friendly, see below:

colnames(df)[which(names(df) == "î²lactamases")] <- "blactamases"

This runs fine if I run the line of code by itself. However, when I run the whole source file, it fails at this line. No error is generated; the only reason I know it has failed is that the column name has not changed and subsequent operations referencing the revised column name don't work.

Even more curiously, the line below this one in the source file uses exactly the same procedure to change another column name and runs fine when sourced:

colnames(df)[which(names(df) == "eae1")] <- "eaeseq"

Any ideas would be much appreciated. Is there something I need to add before the î² to make it run properly when sourced?

I'm using RStudio 0.99.489 and R version 3.2.3.

> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] digest_0.6.8        foreign_0.8-66      xlsx_0.5.7          xlsxjars_0.6.1      rJava_0.9-7         SnowballC_0.5.1    
 [7] wordcloud_2.5       RColorBrewer_1.1-2  tm_0.6-2            NLP_0.1-8           rsatscan_0.3.9200   surveillance_1.10-0
[13] polyCub_0.5-2       xtable_1.8-0        epitools_0.5-7      ggmap_2.5.2         ggplot2_1.0.1       geosphere_1.4-3    
[19] rgdal_1.1-1         sp_1.2-1            MRAtools_0.6.6      zoo_1.7-12          stringi_1.0-1       stringdist_0.9.4   
[25] reshape2_1.4.1      dplyr_0.4.3         plyr_1.8.3          data.table_1.9.6    readxl_0.1.0        RPostgreSQL_0.4    
[31] DBI_0.3.1           RODBCext_0.2.5      RODBC_1.3-12       

loaded via a namespace (and not attached):
 [1] slam_0.1-32         lattice_0.20-33     colorspace_1.2-6    mgcv_1.8-10         chron_2.3-47        spatstat_1.43-0    
 [7] jpeg_0.1-8          stringr_1.0.0       munsell_0.4.2       gtable_0.1.2        RgoogleMaps_1.2.0.7 mapproj_1.2-4      
[13] parallel_3.2.3      proto_0.3-10        Rcpp_0.12.2         tensor_1.5          scales_0.3.0        abind_1.4-3        
[19] deldir_0.1-9        rjson_0.2.15        png_0.1-7           RJSONIO_1.3-0       polyclip_1.3-2      grid_3.2.3         
[25] tools_3.2.3         magrittr_1.5        maps_3.0.1          goftest_1.0-3       MASS_7.3-45         Matrix_1.2-3       
[31] assertthat_0.1      R6_2.1.1            nlme_3.1-122

Answer 1

Not sure if this is what you mean by "more user friendly", but an easy way to remove the oddball characters is iconv(x, to = "ASCII", sub = ""), which removes all non-ASCII characters. I often use this as a last resort when difficult characters are complicating text analysis functions. It's effective but a bit destructive, a Samuel L. Jackson way of opening some windows.

df <- data.frame(1:3, letters[1:3], NA, stringsAsFactors = FALSE)
names(df) <- c("î²lactamases", "regularname", "hopele§§")
df
##   î²lactamases regularname hopele§§
## 1            1           a       NA
## 2            2           b       NA
## 3            3           c       NA
names(df) <- iconv(names(df), to = "ASCII", sub = "")
df
##   lactamases regularname hopele
## 1          1           a     NA
## 2          2           b     NA
## 3          3           c     NA

If you want to make specific substitutions, then I suggest using gsub on names(df) to replace î² with b, § with s (in my example), etc.
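A minimal sketch of that substitution approach, reusing the example data frame from above. The \u escapes are my own addition and simply spell out the characters as they appear in this example (î is U+00EE, ² is U+00B2, § is U+00A7); I'm assuming that matches what ends up in your imported names, so check the actual characters in your data first.

```r
# Rebuild the example data frame; non-ASCII characters are written as \u escapes
# so the script does not depend on the encoding of the file it is saved in
df <- data.frame(1:3, letters[1:3], NA, stringsAsFactors = FALSE)
names(df) <- c("\u00ee\u00b2lactamases", "regularname", "hopele\u00a7\u00a7")

# fixed = TRUE matches the pattern as a literal string rather than a regex
names(df) <- gsub("\u00ee\u00b2", "b", names(df), fixed = TRUE)
names(df) <- gsub("\u00a7", "s", names(df), fixed = TRUE)
names(df)
## [1] "blactamases" "regularname" "hopeless"
```

Unlike the iconv call, this keeps a readable replacement for each character instead of dropping it.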
