I have a data set that includes the column name βlactamases. When the .csv file is imported into R, the "beta" part of the name comes through as a misinterpreted symbol (it looks like an I with a squared sign next to it).
As I regularly import this file, I have a source file to perform some basic data cleaning and prepare the data set for analysis. I included a line of code to convert the column name to something more user friendly, see below:
colnames(df)[which(names(df) == "î²lactamases")] <- "blactamases"
This runs fine if I run the line of code by itself. However, when I try to run the source file, it fails at this line. No error is generated; the only reason I know it has failed is that the column name has not changed and subsequent operations referencing the revised column name don't work.
Even more curiously, the line below this one in the source file uses exactly the same procedure to change another column name and runs fine when sourced:
colnames(df)[which(names(df) == "eae1")] <- "eaeseq"
Any ideas would be much appreciated. Is there something I need to add before the î² to make it run from source properly?
I'm using RStudio 0.99.489 and R version 3.2.3.
> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] digest_0.6.8 foreign_0.8-66 xlsx_0.5.7 xlsxjars_0.6.1 rJava_0.9-7 SnowballC_0.5.1
[7] wordcloud_2.5 RColorBrewer_1.1-2 tm_0.6-2 NLP_0.1-8 rsatscan_0.3.9200 surveillance_1.10-0
[13] polyCub_0.5-2 xtable_1.8-0 epitools_0.5-7 ggmap_2.5.2 ggplot2_1.0.1 geosphere_1.4-3
[19] rgdal_1.1-1 sp_1.2-1 MRAtools_0.6.6 zoo_1.7-12 stringi_1.0-1 stringdist_0.9.4
[25] reshape2_1.4.1 dplyr_0.4.3 plyr_1.8.3 data.table_1.9.6 readxl_0.1.0 RPostgreSQL_0.4
[31] DBI_0.3.1 RODBCext_0.2.5 RODBC_1.3-12
loaded via a namespace (and not attached):
[1] slam_0.1-32 lattice_0.20-33 colorspace_1.2-6 mgcv_1.8-10 chron_2.3-47 spatstat_1.43-0
[7] jpeg_0.1-8 stringr_1.0.0 munsell_0.4.2 gtable_0.1.2 RgoogleMaps_1.2.0.7 mapproj_1.2-4
[13] parallel_3.2.3 proto_0.3-10 Rcpp_0.12.2 tensor_1.5 scales_0.3.0 abind_1.4-3
[19] deldir_0.1-9 rjson_0.2.15 png_0.1-7 RJSONIO_1.3-0 polyclip_1.3-2 grid_3.2.3
[25] tools_3.2.3 magrittr_1.5 maps_3.0.1 goftest_1.0-3 MASS_7.3-45 Matrix_1.2-3
[31] assertthat_0.1 R6_2.1.1 nlme_3.1-122
Not sure if this is what you mean by "more user friendly", but an easy way to remove the oddball characters is to use iconv(x, to = "ASCII", sub = ""), which removes all non-ASCII characters. I often use this as a last resort when difficult characters are complicating text analysis functions. It's effective but a bit destructive, a Samuel L. Jackson way of opening some windows.
df <- data.frame(1:3, letters[1:3], NA, stringsAsFactors = FALSE)
names(df) <- c("î²lactamases", "regularname", "hopele§§")
df
##   î²lactamases regularname hopele§§
## 1            1           a       NA
## 2            2           b       NA
## 3            3           c       NA
names(df) <- iconv(names(df), to = "ASCII", sub = "")
df
##   lactamases regularname hopele
## 1          1           a     NA
## 2          2           b     NA
## 3          3           c     NA
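Because stripping is destructive, it can help to preview which names would change before overwriting them. The following is only a sketch: the vector names old_names, new_names and changed are made up for illustration, and the awkward characters are written as \u escapes ("\u00ee\u00b2" for î², "\u00a7" for §) on the assumption that this keeps the script itself ASCII-only. Since \u escapes produce UTF-8 strings, from = "UTF-8" is passed explicitly here.

```r
# Hypothetical column names; \u escapes stand in for the mis-encoded characters.
old_names <- c("\u00ee\u00b2lactamases", "regularname", "hopele\u00a7\u00a7")

# Strip every non-ASCII character, as in the iconv() call above.
# from = "UTF-8" is explicit because \u escapes yield UTF-8 strings.
new_names <- iconv(old_names, from = "UTF-8", to = "ASCII", sub = "")

# Which names were altered by the stripping?
changed <- old_names != new_names
data.frame(old = old_names[changed], new = new_names[changed])

# Did stripping make two different columns collapse to the same name?
anyDuplicated(new_names) > 0
```

If the last line returns TRUE, two columns ended up with identical names and a more targeted substitution is probably the safer route.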
If you want to make specific substitutions, then I suggest gsub-ing the names(df) to replace î² with b, § with s (in my example), etc.
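A minimal sketch of that gsub approach, using the same toy data frame. The patterns and replacements vectors are hypothetical helpers, not part of the original answer, and the characters are again written as \u escapes ("\u00ee\u00b2" for î², "\u00a7" for §) on the assumption that an ASCII-only script is less sensitive to how the file is saved and read.

```r
# Toy data frame with the same awkward column names as in the example above.
df <- data.frame(1:3, letters[1:3], NA, stringsAsFactors = FALSE)
names(df) <- c("\u00ee\u00b2lactamases", "regularname", "hopele\u00a7\u00a7")

# Hypothetical lookup: each pattern is replaced by the entry at the same position.
patterns     <- c("\u00ee\u00b2", "\u00a7")
replacements <- c("b", "s")

# Apply the substitutions one after another to the column names.
new_names <- names(df)
for (i in seq_along(patterns)) {
  # fixed = TRUE treats the pattern as a literal string, not a regular expression.
  new_names <- gsub(patterns[i], replacements[i], new_names, fixed = TRUE)
}
names(df) <- new_names

names(df)
```

Unlike the iconv() approach, this keeps a readable stand-in for each character, at the cost of having to list every substitution by hand.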