Removing accents in column names in R
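I have tried almost every solution on this site, so I am starting to think the problem may lie in the Excel files. I have several xlsx files whose sheets were combined into a single data frame (using map_df). Unfortunately, the names are in Spanish, and they cause problems in R further along in the code. The accents appear only in the column names. Any tips or suggestions on how to deal with accented column names? I am not sure whether the data coming from xlsx files is the reason the code I tried did not work. Thank you.
Data sample, as requested: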
structure(list(file = c("location1/location2/namelocationfile1.xlsx",
"location1/location2/namelocationfile2.xlsx",
"location1/location2/namelocationfile3.xlsx",
"location1/location2/namelocationfile4.xlsx",
"location1/location2/namelocationfile5.xlsx",
"location1/location2/namelocationfile6.xlsx"
), sheet = c("TOTAL-2015 ", "TOTAL-2015 ", "TOTAL-2015 ", "TOTAL-2015 ",
"TOTAL-2015 ", "TOTAL-2015 "), age = c("Total", "0-4", "0",
"1", "2", "3"), total = c("355461", "35173", "7091", "7042",
"7027", "7008"), plán = c("126131", "11698", "2407", "2318",
"2349", "2282"), pláns = c("8456", "726", "162", "135", "133",
"138"), place = c("35112", "2969", "599", "607", "555",
"597"), concepción = c("12912", "1283", "281", "263",
"244", "253"), refugio = c("10959", "903", "174", "174", "206",
"184"), lugar = c("20733", "2229", "431", "454", "409", "486"
), san_marco = c("31082", "3271", "624", "658", "670", "656"),
menéndez = c("47495", "5070", "990", "1023",
"1008", "1020"), san = c("10244", "955", "193", "203",
"189", "194"), san_pedro = c("8374", "915", "183",
"181", "205", "175"), buenosaires = c("33242", "4244", "862",
"857", "894", "836"), turín = c("10721", "910", "185", "169",
"165", "187")), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
Here is one possible solution using iconv:
string <- c("à", "è", "ì"," ò"," ù", "À", "È", "Ì", "Ò", "Ù")
# [1] "à" "è" "ì" " ò" " ù" "À" "È" "Ì" "Ò" "Ù"
gsub("`", "", iconv(string, from = "UTF-8", to = "ASCII//TRANSLIT"))
# [1] "a" "e" "i" " o" " u" "A" "E" "I" "O" "U"
Another option, using stringi, does not require gsub:
library(stringi)
stri_trans_general(str = string, id = "Latin-ASCII")
# [1] "a" "e" "i" " o" " u" "A" "E" "I" "O" "U"
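The result of ASCII//TRANSLIT depends on the platform's iconv implementation, which is why stray ` or ' characters may need to be removed afterwards. If neither iconv nor stringi behaves as expected, an explicit character mapping with base R's chartr is one alternative. The sketch below is not part of the original answer: strip_accents is a hypothetical helper name, and its mapping covers only the grave, acute and ñ characters listed in it, so any other accented letter passes through unchanged.

```r
# Hypothetical helper (an assumption, not from the original answer):
# map a fixed set of accented Latin letters to unaccented ASCII letters.
strip_accents <- function(x) {
  # \u escapes keep the source file ASCII-only, whatever its encoding.
  accented <- paste0(
    "\u00e0\u00e8\u00ec\u00f2\u00f9",  # grave, lower case
    "\u00e1\u00e9\u00ed\u00f3\u00fa",  # acute, lower case
    "\u00f1",                          # n with tilde, lower case
    "\u00c0\u00c8\u00cc\u00d2\u00d9",  # grave, upper case
    "\u00c1\u00c9\u00cd\u00d3\u00da",  # acute, upper case
    "\u00d1"                           # N with tilde, upper case
  )
  plain <- "aeiouaeiounAEIOUAEIOUN"
  # chartr replaces the i-th character of `accented` with the i-th of `plain`.
  chartr(accented, plain, x)
}

# Illustrative column names taken from the data sample in the question.
cols <- c("pl\u00e1n", "concepci\u00f3n", "men\u00e9ndez", "tur\u00edn", "refugio")
strip_accents(cols)
```

For a data frame, the same helper could be applied with colnames(df) <- strip_accents(colnames(df)).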
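Update
To apply the function to the column names with rename_with, we need to use .x inside iconv. Also, for gsub, the pattern here is ' rather than `, because these column names carry acute accents, which iconv may transliterate as ' followed by the letter.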
library(tidyverse)
df_new <- df %>%
rename_with(., ~ gsub("'", "", iconv(.x, from = "UTF-8", to='ASCII//TRANSLIT')))
# Or we can use `stringr` instead of `gsub`:
# df %>%
# rename_with(., ~ str_replace_all(iconv(.x, to='ASCII//TRANSLIT'), "'", ""))
colnames(df_new)
# [1] "file" "sheet" "age" "total" "plan" "plans" "place" "concepcion" "refugio"
# [10] "lugar" "san_marco" "menendez" "san" "san_pedro" "buenosaires" "turin"
Base R option:
colnames(df) <- gsub("'", "", iconv(colnames(df), from = "UTF-8", to='ASCII//TRANSLIT'))
Or:
colnames(df) <- stri_trans_general(str = colnames(df), id = "Latin-ASCII")
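After renaming, it can be worth checking that the new names are pure ASCII and that removing accents did not make two columns collide. The following is a minimal base R sketch of such a check, not part of the original answer. The name vector and the is_ascii helper are hypothetical, and the collision is contrived for illustration (the sample data in the question has no such clash).

```r
# Hypothetical names after transliteration: "plan" and an accented variant
# of it have both become "plan" (contrived for illustration).
new_names <- c("plan", "plan", "turin")

# TRUE for each string whose code points are all below 128 (plain ASCII).
is_ascii <- function(x) {
  vapply(x, function(s) all(utf8ToInt(s) < 128L), logical(1), USE.NAMES = FALSE)
}

all(is_ascii(new_names))          # are all names ASCII-only?
has_dupes <- anyDuplicated(new_names) > 0
has_dupes                         # did any names collide?

# One way to resolve collisions: append a numeric suffix to repeats.
fixed_names <- make.unique(new_names, sep = "_")
fixed_names
```

With a real data frame, new_names would be colnames(df_new), and the de-duplicated names could be assigned back with colnames(df_new) <- fixed_names.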