
Removing accents in column names in R

I have tried almost every solution on this website, so I am starting to think the issue may come from the Excel files. I have multiple xlsx files with sheets that I have merged into one data frame (using map_df). Unfortunately, the names are in Spanish, and this causes problems in R as the code progresses. The accents appear only in the column names. Any tips or recommendations on how to remove the accents from just the column names? I am not sure whether the data coming from xlsx files is the reason the code I have tried doesn't work. Thank you.

dput data sample as requested:

structure(list(file = c("location1/location2/namelocationfile1.xlsx", 
"location1/location2/namelocationfile2.xlsx", 
"location1/location2/namelocationfile3.xlsx", 
"location1/location2/namelocationfile4.xlsx", 
"location1/location2/namelocationfile5.xlsx", 
"location1/location2/namelocationfile6.xlsx"
), sheet = c("TOTAL-2015 ", "TOTAL-2015 ", "TOTAL-2015 ", "TOTAL-2015 ", 
"TOTAL-2015 ", "TOTAL-2015 "), age = c("Total", "0-4", "0", 
"1", "2", "3"), total = c("355461", "35173", "7091", "7042", 
"7027", "7008"), plán = c("126131", "11698", "2407", "2318", 
"2349", "2282"), pláns = c("8456", "726", "162", "135", "133", 
"138"), place = c("35112", "2969", "599", "607", "555", 
"597"), concepción = c("12912", "1283", "281", "263", 
"244", "253"), refugio = c("10959", "903", "174", "174", "206", 
"184"), lugar = c("20733", "2229", "431", "454", "409", "486"
), san_marco = c("31082", "3271", "624", "658", "670", "656"), 
    menéndez = c("47495", "5070", "990", "1023", 
    "1008", "1020"), san = c("10244", "955", "193", "203", 
    "189", "194"), san_pedro = c("8374", "915", "183", 
    "181", "205", "175"), buenosaires = c("33242", "4244", "862", 
    "857", "894", "836"), turín = c("10721", "910", "185", "169", 
    "165", "187")), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

Here is one possible solution with iconv:

string <- c("à", "è", "ì"," ò"," ù", "À", "È", "Ì", "Ò", "Ù")

# [1] "à"  "è"  "ì"  " ò" " ù" "À"  "È"  "Ì"  "Ò"  "Ù" 


gsub("`", "", iconv(string, from = "UTF-8", to = "ASCII//TRANSLIT"))

# [1] "a"  "e"  "i"  " o" " u" "A"  "E"  "I"  "O"  "U" 
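
Which mark iconv leaves behind appears to depend on both the accent and the platform's iconv implementation: some emit a backtick for grave accents and an apostrophe for acute ones, while others emit nothing extra. A minimal sketch, assuming a hypothetical helper named translit_ascii, removes the common marks in one pass. The sample vector is made up, and the result is platform dependent, so no output is shown. Note that this would also strip genuine apostrophes from a name.

```r
# Hypothetical helper (not from the original answer): transliterate to ASCII,
# then drop marks that some iconv implementations emit for accents.
translit_ascii <- function(x) {
  ascii <- iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT")
  # Character class: backtick, apostrophe, caret, tilde, double quote.
  gsub("[`'^~\"]", "", ascii)
}

# \u escapes keep the example independent of the script's file encoding.
demo <- c("pl\u00e1n", "\u00e0", "refugio")
translit_ascii(demo)
```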

Another option with stringi that doesn't require gsub:

library(stringi)
stri_trans_general(str = string, id = "Latin-ASCII")

# [1] "a"  "e"  "i"  " o" " u" "A"  "E"  "I"  "O"  "U" 

Update

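To apply the function to the column names with rename_with, we need to pass .x to iconv. Additionally, the gsub pattern here is ' rather than "`": the sample column names carry acute accents, which iconv can transliterate as an apostrophe followed by the letter on some platforms.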

library(tidyverse)

df_new <- df %>% 
    rename_with(., ~ gsub("'", "", iconv(.x, from = "UTF-8", to='ASCII//TRANSLIT')))

# Or we can use `stringr` instead of `gsub`:
# df %>% 
#    rename_with(., ~ str_replace_all(iconv(.x, to='ASCII//TRANSLIT'), "'", ""))

colnames(df_new)
# [1] "file"        "sheet"       "age"         "total"       "plan"        "plans"       "place"       "concepcion"  "refugio"    
# [10] "lugar"       "san_marco"   "menendez"    "san"         "san_pedro"   "buenosaires" "turin"
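
To confirm that the renaming worked, it can help to list any names that still contain non-ASCII characters. The sketch below uses a made-up vector of names rather than the real data, and has_non_ascii is a hypothetical helper.

```r
# Hypothetical helper: TRUE for strings containing any character outside
# the printable ASCII range (space through tilde).
has_non_ascii <- function(x) grepl("[^ -~]", x, perl = TRUE)

# Made-up names; \u00e9 is e with an acute accent.
nms <- c("file", "men\u00e9ndez", "san_marco")
has_non_ascii(nms)
# [1] FALSE  TRUE FALSE

# Names that still need cleaning (none expected after a successful rename).
nms[has_non_ascii(nms)]
```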

Without dplyr, the column names can be assigned directly:

colnames(df) <- gsub("'", "", iconv(colnames(df), from = "UTF-8", to='ASCII//TRANSLIT'))

Or, using stringi:

colnames(df) <- stri_trans_general(str = colnames(df), id = "Latin-ASCII")
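
If neither the platform-dependent output of iconv nor an extra package is acceptable, an explicit character map with base R's chartr is another possibility. This is a different technique from the ones above, shown only as a sketch: strip_accents and df_demo are hypothetical names, the data frame is a toy modeled on the sample, and only the characters listed in the map are handled.

```r
# Toy data frame with accented column names (hypothetical, modeled on the sample).
# \u escapes keep the script independent of the file's encoding.
df_demo <- data.frame(a = 1:2, b = 3:4, c = 5:6)
names(df_demo) <- c("pl\u00e1n", "concepci\u00f3n", "san_marco")

# Hypothetical helper: map each accented letter to its plain counterpart.
# The two strings must have the same number of characters; extend them together.
strip_accents <- function(x) {
  chartr(
    "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1\u00c1\u00c9\u00cd\u00d3\u00da\u00dc\u00d1",
    "aeiouunAEIOUUN",
    x
  )
}

names(df_demo) <- strip_accents(names(df_demo))
names(df_demo)
# [1] "plan"       "concepcion" "san_marco"
```

Unlike the transliteration approaches, the map leaves any character it does not list untouched, so unexpected characters stay visible instead of being silently altered.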
