Convert from lowercase to uppercase all values in all character variables in dataframe

I have a mixed dataframe of character and numeric variables.

city,hs_cd,sl_no,col_01,col_02,col_03
Austin,1,2,,46,Female
Austin,1,3,,32,Male
Austin,1,4,,27,Male
Austin,1,5,,20,Female
Austin,2,2,,42,Female
Austin,2,1,,52,Male
Austin,2,3,,25,Male
Austin,2,4,,22,Female
Austin,3,3,,30,Female
Austin,3,1,,65,Female

I want to convert all the lower-case characters in the dataframe to uppercase. Is there any way to do this in one shot, without doing it repeatedly over each character variable?

Starting with the following sample data:

df <- data.frame(v1=letters[1:5],v2=1:5,v3=letters[10:14],stringsAsFactors=FALSE)

  v1 v2 v3
1  a  1  j
2  b  2  k
3  c  3  l
4  d  4  m
5  e  5  n

You can use:

data.frame(lapply(df, function(v) {
  if (is.character(v)) return(toupper(v))
  else return(v)
}))

Which gives:

  v1 v2 v3
1  A  1  J
2  B  2  K
3  C  3  L
4  D  4  M
5  E  5  N
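
A variant of the same idea, sketched here with column names from the question (the three-row `df` is an illustrative subset, not the full data): assigning back into the selected columns keeps the object a data frame and leaves non-character columns untouched.

```r
# Illustrative subset of the question's data (values copied from the sample)
df <- data.frame(
  city   = c("Austin", "Austin", "Austin"),
  hs_cd  = c(1L, 1L, 1L),
  col_03 = c("Female", "Male", "Male"),
  stringsAsFactors = FALSE
)

# Find the character columns once, then overwrite only those columns
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], toupper)

str(df)
```

Because only the character columns are replaced, `hs_cd` keeps its integer type.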

From the dplyr package you can also use the mutate_all() function in combination with toupper(). This will affect both character and factor classes. Note that toupper() is applied to every column, so numeric columns are converted to character as well.

library(dplyr)
df <- mutate_all(df, .funs = toupper)

It is simple with the apply function in R:

f <- apply(f,2,toupper)

No need to check if the column is character or any other type.
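
One thing to keep in mind, shown in the sketch below with the sample `df` from the earlier answer: `apply()` first coerces the data frame to a matrix, so the result is a character matrix and numeric columns come back as strings. The names `m` and `df_upper` are only illustrative.

```r
df <- data.frame(v1 = letters[1:5], v2 = 1:5, v3 = letters[10:14],
                 stringsAsFactors = FALSE)

m <- apply(df, 2, toupper)

class(m)         # a matrix, no longer a data.frame
is.character(m)  # TRUE: the integer column v2 is now text as well

# Convert back to a data frame if needed; v2 stays character unless re-typed
df_upper <- as.data.frame(m, stringsAsFactors = FALSE)
df_upper$v2 <- as.integer(df_upper$v2)
```

If the column types matter downstream, a type-checking approach avoids this round trip.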

A side comment here for those using any of these answers. Juba's answer is great, as it's very selective if your variables are either numeric or character strings. If, however, you have a combination (e.g. a1, b1, a2, b2), it will not convert the characters properly.

As @Trenton Hoffman notes,

library(dplyr)
df <- mutate_each(df, funs(toupper))

affects both character and factor classes and works for "mixed variables"; e.g. if your variable contains both a character and a numeric value (e.g. a1), both will be converted to a factor. Overall this isn't too much of a concern, but if you end up wanting to match data.frames, for example

df3 <- df1[df1$v1 %in% df2$v1,]

where df1 has been converted and df2 contains a non-converted data.frame or similar, this may cause some problems. The workaround is that you briefly have to run

df2 <- df2 %>% mutate_each(funs(toupper), v1)
#or
df2 <- df2 %>% mutate_each(funs(toupper))
#and then
df3 <- df1[df1$v1 %in% df2$v1,]

If you work with genomic data, this is when knowing this can come in handy.
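
To make the matching issue concrete, here is a small base-R sketch with two made-up data frames (`df1`, `df2` and their values are hypothetical): once one side is upper-cased, `%in%` finds nothing until the other side is upper-cased too.

```r
# Hypothetical example data: same IDs, different case
df1 <- data.frame(v1 = c("A1", "B1", "C2"), stringsAsFactors = FALSE)  # already converted
df2 <- data.frame(v1 = c("a1", "c2"), stringsAsFactors = FALSE)        # not converted

n_before <- nrow(df1[df1$v1 %in% df2$v1, , drop = FALSE])  # 0: "A1" and "a1" differ

# Upper-case the key on the fly (or convert df2 once beforehand)
df3 <- df1[df1$v1 %in% toupper(df2$v1), , drop = FALSE]
df3$v1
```

Upper-casing inside the comparison leaves `df2` itself unchanged, which can be convenient when the original case still matters elsewhere.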

Another alternative is to use a combination of the mutate_if() and str_to_upper() functions, from the dplyr and stringr packages respectively (both part of the tidyverse):

df %>% mutate_if(is.character, str_to_upper) -> df

This will convert all string variables in the data frame to upper case. str_to_lower() does the opposite.

dplyr >= 1.0.0

Scoped verbs that end in _if, _at, _all have been superseded by the use of across() in packageVersion("dplyr") 1.0.0 or newer. To do this using across:

df %>% 
  dplyr::mutate(across(where(is.character), toupper))
  • The first argument to across is which columns to transform, using tidyselect syntax. The above will apply the function across all columns that are character.
  • The second argument to across is the function to apply. This also supports lambda-style syntax: ~ toupper(.x), which makes setting additional function arguments easy and clear.
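
As a sketch of the lambda-style form, assuming dplyr 1.0.0 or newer is installed: the example below (with a small made-up `df` and an illustrative result name `out`) upper-cases character columns directly and uses a lambda to rebuild factor columns as factors.

```r
library(dplyr)

# Small made-up data frame with a character, an integer and a factor column
df <- data.frame(
  v1 = letters[1:3],
  v2 = 1:3,
  v4 = factor(c("x", "y", "x")),
  stringsAsFactors = FALSE
)

out <- df %>%
  mutate(
    across(where(is.character), toupper),
    # Lambda form: convert to character, upper-case, then back to a factor
    across(where(is.factor), ~ factor(toupper(as.character(.x))))
  )

sapply(out, class)
```

The integer column is not selected by either `where()` call, so it passes through unchanged.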

Data

df <- structure(list(city = c("Austin", "Austin", "Austin", "Austin", 
"Austin", "Austin", "Austin", "Austin", "Austin", "Austin"), 
    hs_cd = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L), sl_no = c(2L, 
    3L, 4L, 5L, 2L, 1L, 3L, 4L, 3L, 1L), col_01 = c(NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA), col_02 = c(46L, 32L, 27L, 20L, 
    42L, 52L, 25L, 22L, 30L, 65L), col_03 = c("Female", "Male", 
    "Male", "Female", "Female", "Male", "Male", "Female", "Female", 
    "Female")), class = "data.frame", row.names = c(NA, -10L))

If you need to deal with data.frames that include factors, you can use:

df = data.frame(v1=letters[1:5],v2=1:5,v3=letters[10:14],v4=as.factor(letters[1:5]),v5=runif(5),stringsAsFactors=FALSE)

df
    v1 v2 v3 v4        v5
    1  a  1  j  a 0.1774909
    2  b  2  k  b 0.4405019
    3  c  3  l  c 0.7042878
    4  d  4  m  d 0.8829965
    5  e  5  n  e 0.9702505


sapply(df,class)
         v1          v2          v3          v4          v5
"character"   "integer" "character"    "factor"   "numeric"

Use mutate_each_ to convert factors to character, then convert all to uppercase:

   upper_it = function(X){X %>% mutate_each_( funs(as.character(.)), names( .[sapply(., is.factor)] )) %>%
   mutate_each_( funs(toupper), names( .[sapply(., is.character)] ))}   # convert factor to character then uppercase

Gives

  upper_it(df)
      v1 v2 v3 v4
    1  A  1  J  A
    2  B  2  K  B
    3  C  3  L  C
    4  D  4  M  D
    5  E  5  N  E

While

sapply( upper_it(df),class)
         v1          v2          v3          v4          v5
"character"   "integer" "character" "character"   "numeric"

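If the factor columns should stay factors rather than become character, one base-R sketch (not part of the answer above; `upper_keep_types` is a hypothetical helper name) is to upper-case the factor levels instead:

```r
# Hypothetical helper: upper-case character columns and factor levels
upper_keep_types <- function(d) {
  d[] <- lapply(d, function(col) {
    if (is.character(col)) {
      toupper(col)
    } else if (is.factor(col)) {
      levels(col) <- toupper(levels(col))  # relabel levels, keep the factor
      col
    } else {
      col                                  # numeric etc. left unchanged
    }
  })
  d
}

df <- data.frame(v1 = letters[1:5], v2 = 1:5,
                 v4 = as.factor(letters[1:5]), stringsAsFactors = FALSE)
res <- upper_keep_types(df)
sapply(res, class)
```

Relabelling the levels touches each distinct value once, which can matter for long factor columns with few levels.
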
Alternatively, if you only want to convert a specific column to uppercase, use the following code:

df[[1]] <- toupper(df[[1]])
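
The same pattern extends to several columns selected by name. A short sketch using the sample `df` from the earlier answer (the choice of `v1` and `v3` is just for illustration):

```r
df <- data.frame(v1 = letters[1:5], v2 = 1:5, v3 = letters[10:14],
                 stringsAsFactors = FALSE)

cols <- c("v1", "v3")                  # columns to convert
df[cols] <- lapply(df[cols], toupper)  # other columns are left as they are
df
```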
