Convert all values in all character variables of a data frame from lowercase to uppercase

Question

I have a mixed dataframe of character and numeric variables.

city,hs_cd,sl_no,col_01,col_02,col_03
Austin,1,2,,46,Female
Austin,1,3,,32,Male
Austin,1,4,,27,Male
Austin,1,5,,20,Female
Austin,2,2,,42,Female
Austin,2,1,,52,Male
Austin,2,3,,25,Male
Austin,2,4,,22,Female
Austin,3,3,,30,Female
Austin,3,1,,65,Female

I want to convert all the lower-case characters in the dataframe to uppercase. Is there any way to do this in one shot, without repeating it for each character variable?

Answer 1

Starting with the following sample data:

df <- data.frame(v1=letters[1:5],v2=1:5,v3=letters[10:14],stringsAsFactors=FALSE)

  v1 v2 v3
1  a  1  j
2  b  2  k
3  c  3  l
4  d  4  m
5  e  5  n

You can use:

data.frame(lapply(df, function(v) {
  if (is.character(v)) return(toupper(v))
  else return(v)
}))

Which gives:

  v1 v2 v3
1  A  1  J
2  B  2  K
3  C  3  L
4  D  4  M
5  E  5  N
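As a variation on the approach above, the character columns can also be updated in place, which keeps the object a data frame without rebuilding it. This is only a sketch in base R; the helper name is_chr is made up for illustration.

```r
# Sample data from the answer above
df <- data.frame(v1 = letters[1:5], v2 = 1:5, v3 = letters[10:14],
                 stringsAsFactors = FALSE)

# Logical vector marking which columns are character (is_chr is a hypothetical name)
is_chr <- vapply(df, is.character, logical(1))

# Replace only those columns; numeric columns are left untouched
df[is_chr] <- lapply(df[is_chr], toupper)

df
```

Because only the selected columns are assigned, v2 stays an integer column.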

Answer 2

From the dplyr package you can also use the mutate_all() function in combination with toupper(). This will affect both character and factor classes. Note that the function is applied to every column, so numeric columns are converted to character as well.

library(dplyr)
# pass the function positionally; in current dplyr the argument is named .funs
df <- mutate_all(df, toupper)

Answer 3

It's simple with the apply function in R (here f is the data frame):

f <- apply(f,2,toupper)

No need to check if the column is character or any other type. Be aware that apply() returns a character matrix rather than a data frame, so numeric columns come back as strings.
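A small sketch of what apply() returns may help here. The data frame below is made up for illustration, and as.data.frame() is shown as one possible way back to a data frame (its columns will still be character).

```r
# Made-up example data: one character column, one integer column
df <- data.frame(v1 = letters[1:3], v2 = 1:3, stringsAsFactors = FALSE)

# apply() first coerces the data frame to a matrix, then applies toupper per column
m <- apply(df, 2, toupper)

is.matrix(m)      # the result is a matrix, not a data frame
is.character(m)   # and every cell, including the former numbers, is a string

# Converting back gives a data frame whose columns are all character
df_back <- as.data.frame(m, stringsAsFactors = FALSE)
```

If the numeric columns need to stay numeric, one of the column-wise approaches that checks is.character() is the safer choice.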

Answer 4

A side comment here for those using any of these answers. Juba's answer is great, as it's very selective if your variables are either numeric or character strings. If, however, you have a combination (e.g. a1, b1, a2, b2), it will not convert the characters properly.

As @Trenton Hoffman notes,

library(dplyr)
df <- mutate_each(df, funs(toupper))

affects both character and factor classes and works for "mixed variables"; e.g. if your variable contains both a character and a numeric value (e.g. a1), both will be converted to a factor. Overall this isn't too much of a concern, but if you end up wanting to match data.frames, for example

df3 <- df1[df1$v1 %in% df2$v1,]

where df1 has been converted and df2 contains a non-converted data.frame or similar, this may cause some problems. The workaround is that you briefly have to run

# convert only v1
df2 <- df2 %>% mutate_each(funs(toupper), v1)
# or convert every column (df2 is already supplied by the pipe)
df2 <- df2 %>% mutate_each(funs(toupper))
# and then
df3 <- df1[df1$v1 %in% df2$v1,]

If you work with genomic data, this is where knowing this can come in handy.
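One way to sidestep the mismatch described above is to normalise case only at comparison time, so neither data frame has to be converted first. This is a sketch in base R with made-up data frames; the contents of df1 and df2 here are assumptions for illustration only.

```r
# Made-up data: df1 already upper case, df2 still lower case
df1 <- data.frame(v1 = c("A1", "B1", "C1"), stringsAsFactors = FALSE)
df2 <- data.frame(v1 = c("a1", "c1"), stringsAsFactors = FALSE)

# A direct comparison finds nothing because the case differs
df3_raw <- df1[df1$v1 %in% df2$v1, , drop = FALSE]

# Upper-casing both sides inside the comparison makes the match case-insensitive
df3 <- df1[toupper(df1$v1) %in% toupper(df2$v1), , drop = FALSE]
```

The original columns in df1 and df2 are left unchanged; only the comparison uses the upper-cased values.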

Answer 5

Another alternative is to use a combination of the mutate_if() and str_to_upper() functions (from dplyr and stringr respectively, both loaded by the tidyverse package):

df %>% mutate_if(is.character, str_to_upper) -> df

This will convert all string variables in the data frame to upper case. str_to_lower() does the opposite.

Answer 6

dplyr >= 1.0.0

Scoped verbs that end in _if, _at, _all have been superseded by the use of across() in dplyr 1.0.0 or newer. To do this using across():

df %>% 
  dplyr::mutate(across(where(is.character), toupper))

The first argument to across() is which columns to transform, using tidyselect syntax. The above will apply the function across all columns that are character.
The second argument to across() is the function to apply. This also supports lambda-style syntax, ~ toupper(.x), which makes setting additional function arguments easy and clear.
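To illustrate the two arguments just described, here is a hedged sketch that uses the lambda form and adds a second across() call for factor columns. The small data frame is made up for illustration (it is not the question's data), and the factor handling is an extra assumption, not part of the answer above.

```r
library(dplyr)

# Made-up data: a character column, an integer column and a factor column
df <- data.frame(city = c("Austin", "Dallas"),
                 hs_cd = 1:2,
                 col_03 = factor(c("Female", "Male")),
                 stringsAsFactors = FALSE)

out <- df %>%
  mutate(
    # first argument selects columns, second is the function in lambda form
    across(where(is.character), ~ toupper(.x)),
    # factors are rebuilt from their upper-cased labels
    across(where(is.factor), ~ factor(toupper(as.character(.x))))
  )
```

The integer column hs_cd is not selected by either where() call, so it keeps its type.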

Data

df <- structure(list(city = c("Austin", "Austin", "Austin", "Austin", 
"Austin", "Austin", "Austin", "Austin", "Austin", "Austin"), 
    hs_cd = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L), sl_no = c(2L, 
    3L, 4L, 5L, 2L, 1L, 3L, 4L, 3L, 1L), col_01 = c(NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA), col_02 = c(46L, 32L, 27L, 20L, 
    42L, 52L, 25L, 22L, 30L, 65L), col_03 = c("Female", "Male", 
    "Male", "Female", "Female", "Male", "Male", "Female", "Female", 
    "Female")), class = "data.frame", row.names = c(NA, -10L))

Answer 7

If you need to deal with data.frames that include factors you can use:

df = data.frame(v1=letters[1:5],v2=1:5,v3=letters[10:14],v4=as.factor(letters[1:5]),v5=runif(5),stringsAsFactors=FALSE)

df
    v1 v2 v3 v4        v5
    1  a  1  j  a 0.1774909
    2  b  2  k  b 0.4405019
    3  c  3  l  c 0.7042878
    4  d  4  m  d 0.8829965
    5  e  5  n  e 0.9702505


sapply(df,class)
         v1          v2          v3          v4          v5
"character"   "integer" "character"    "factor"   "numeric"

Use mutate_each_ to convert factors to character, then convert all character columns to uppercase:

   upper_it = function(X){X %>% mutate_each_( funs(as.character(.)), names( .[sapply(., is.factor)] )) %>%
   mutate_each_( funs(toupper), names( .[sapply(., is.character)] ))}   # convert factor to character then uppercase

Gives

  upper_it(df)
      v1 v2 v3 v4
    1  A  1  J  A
    2  B  2  K  B
    3  C  3  L  C
    4  D  4  M  D
    5  E  5  N  E

While

sapply( upper_it(df),class)
         v1          v2          v3          v4          v5
"character"   "integer" "character" "character"   "numeric"
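For comparison, a base R sketch of the same idea that keeps factor columns as factors by upper-casing their levels instead of converting them to character. The function name upper_chr_fct is hypothetical, and this is an alternative under those assumptions rather than the method used above.

```r
# Hypothetical helper: upper-case character columns and factor levels, leave the rest
upper_chr_fct <- function(x) {
  x[] <- lapply(x, function(col) {
    if (is.character(col)) {
      toupper(col)
    } else if (is.factor(col)) {
      levels(col) <- toupper(levels(col))  # relabel levels, column stays a factor
      col
    } else {
      col
    }
  })
  x
}

df <- data.frame(v1 = letters[1:5], v2 = 1:5, v3 = letters[10:14],
                 v4 = as.factor(letters[1:5]), v5 = runif(5),
                 stringsAsFactors = FALSE)

res <- upper_chr_fct(df)
sapply(res, class)
```

Assigning through x[] keeps the result a data frame with the original column names and order.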

Answer 8

Alternatively, if you only want to convert a specific column to upper case, use the following code:

df[[1]] <- toupper(df[[1]])

8 answers

Answer 1: 84, accepted, 2013-05-13 07:22:06
Answer 2: 50, 2015-05-20 18:31:01
Answer 3: 10, 2017-11-14 10:32:03
Answer 4: 6, 2015-06-11 02:09:27
Answer 5: 3, 2019-05-26 20:28:59
Answer 6: 2, 2021-03-15 17:26:55
Answer 7: 1, 2016-09-19 19:59:06
Answer 8: 1, 2019-08-12 06:05:32
