
R: apply a function to a column if the column exists in a data frame

How can I first check whether a column exists in a data frame and, if it does, modify that column? This should be part of a larger function.

Working example:

# working example
dd <- data.frame(a = c(1,2),
                 b = c(2,3),
                 c = c("a", "f"))
# Check if the "a" field exists; if yes, change the values of the whole column
if("a" %in% colnames(dd))
{
  print(dd$a)
  Encoding(dd$a) <- "UTF-8"
}

This gives me an error:

Error in `Encoding<-`(`*tmp*`, value = "UTF-8") : 
  a character vector argument expected 

I feel there is something wrong with this logic, but I can't work out the correct approach.

To print the relevant column if it exists in the dataframe:

as.character(dd[,"a"]) 

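To modify it, if it exists, convert it to character and assign the result back (Encoding() on its own only reports the current encoding):

 dd[, "a"] <- as.character(dd[, "a"])
 Encoding(dd[, "a"]) <- "UTF-8"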

Also, as of R 4.0.0, data.frame() no longer converts strings to factors by default, so you are probably using an earlier version of R.
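
A quick way to see which case applies is to inspect the column's class before calling Encoding(). The following is only a sketch: it rebuilds a reduced version of the question's data frame with stringsAsFactors set explicitly, so the result does not depend on the R version, and the helper name can_set_encoding is made up for illustration.

```r
# Build the data frame both ways so the outcome is independent of the
# R version's default for stringsAsFactors.
dd_factor <- data.frame(a = c(1, 2), c = c("a", "f"), stringsAsFactors = TRUE)
dd_char   <- data.frame(a = c(1, 2), c = c("a", "f"), stringsAsFactors = FALSE)

class(dd_factor$c)  # "factor"
class(dd_char$c)    # "character"
class(dd_char$a)    # "numeric": `Encoding<-` fails on this column either way

# Hypothetical helper: `Encoding<-` only accepts character vectors, so guard
# on the column type as well as on the column name.
can_set_encoding <- function(df, col) {
  col %in% colnames(df) && is.character(df[[col]])
}

can_set_encoding(dd_char, "c")    # TRUE
can_set_encoding(dd_factor, "c")  # FALSE
can_set_encoding(dd_char, "zzz")  # FALSE
```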

The problem here is that the character vector was converted to a factor, which was the default behaviour of data.frame() before R 4.0.0. The solution is simply to pass stringsAsFactors = FALSE:

dd <- data.frame(a = c(1,2),
                 b = c(2,3),
                 c = c("a", "Maur\xC3\xADcio"), stringsAsFactors = FALSE)

Encoding(dd$c)
#> [1] "unknown" "unknown"

# Check if the "c" field exists. If yes, convert the encoding of the variable
if("c" %in% colnames(dd)) {
  print(dd$c)
  Encoding(dd$c) <- "UTF-8"

}
#> [1] "a"        "Maurício"

Encoding(dd$c)
#> [1] "unknown" "UTF-8"

Created on 2020-05-22 by the reprex package (v0.3.0)
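
Since the question asks for this to be part of a larger function, the check-then-modify step could be wrapped roughly as follows. This is a minimal sketch, not part of the original answer: apply_if_exists and mark_utf8 are hypothetical names, and the data frame is a reduced version of the example above.

```r
# Hypothetical helper: apply `f` to column `col` of `df` only if it exists.
# Returns the data frame, unchanged when the column is absent.
apply_if_exists <- function(df, col, f) {
  if (col %in% colnames(df)) {
    df[[col]] <- f(df[[col]])
  }
  df
}

# Hypothetical wrapper so that marking the encoding returns the vector.
mark_utf8 <- function(x) {
  x <- as.character(x)    # also covers factor columns
  Encoding(x) <- "UTF-8"
  x
}

dd <- data.frame(a = c(1, 2),
                 c = c("a", "Maur\xC3\xADcio"),
                 stringsAsFactors = FALSE)

dd <- apply_if_exists(dd, "c", mark_utf8)    # column exists: encoding is marked
dd <- apply_if_exists(dd, "zzz", mark_utf8)  # column missing: dd is unchanged
Encoding(dd$c)
```

Returning the data frame rather than modifying it in place follows R's usual copy semantics, so the caller has to assign the result back.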

Note: ASCII-only strings are never marked with a declared encoding, so Encoding() always reports "unknown" for them. I've therefore changed the example slightly to include a non-ASCII character.
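
The behaviour described in the note can be reproduced with a short sketch; the variable names x and y are only illustrative.

```r
x <- c("a", "f")           # ASCII-only strings, as in the question
Encoding(x) <- "UTF-8"
Encoding(x)                # still "unknown" "unknown"

y <- "Maur\xC3\xADcio"     # contains the UTF-8 bytes for "í"
Encoding(y) <- "UTF-8"
Encoding(y)                # "UTF-8"
```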


 