Convert from lowercase to uppercase all values in all character variables in dataframe
I have a mixed dataframe of character and numeric variables.
city,hs_cd,sl_no,col_01,col_02,col_03
Austin,1,2,,46,Female
Austin,1,3,,32,Male
Austin,1,4,,27,Male
Austin,1,5,,20,Female
Austin,2,2,,42,Female
Austin,2,1,,52,Male
Austin,2,3,,25,Male
Austin,2,4,,22,Female
Austin,3,3,,30,Female
Austin,3,1,,65,Female
I want to convert all the lower-case characters in the dataframe to uppercase. Is there any way to do this in one shot without doing it repeatedly over each character variable?
Starting with the following sample data:
df <- data.frame(v1=letters[1:5],v2=1:5,v3=letters[10:14],stringsAsFactors=FALSE)
v1 v2 v3
1 a 1 j
2 b 2 k
3 c 3 l
4 d 4 m
5 e 5 n
You can use:
data.frame(lapply(df, function(v) {
if (is.character(v)) return(toupper(v))
else return(v)
}))
Which gives:
v1 v2 v3
1 A 1 J
2 B 2 K
3 C 3 L
4 D 4 M
5 E 5 N
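A minimal base R sketch of the same idea that modifies the data frame in place instead of rebuilding it. This is a variant, not the answer's own code: it avoids the second call to data.frame(), which in R versions before 4.0.0 would turn the upper-cased strings back into factors unless stringsAsFactors=FALSE is passed again. The helper name is_chr is only illustrative.

```r
# Sample data from the answer above
df <- data.frame(v1 = letters[1:5], v2 = 1:5, v3 = letters[10:14],
                 stringsAsFactors = FALSE)

# Flag the character columns once (is_chr is an illustrative name)
is_chr <- vapply(df, is.character, logical(1))

# Upper-case only those columns; all other columns are left untouched
df[is_chr] <- lapply(df[is_chr], toupper)

str(df)
```

Assigning into df[is_chr] keeps the object a data.frame and preserves the column types of everything that is not a character vector.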
From the dplyr package you can also use the mutate_all() function in combination with toupper(). This will affect both character and factor classes. Note that mutate_all() applies toupper() to every column, so numeric columns are converted to character as well.
library(dplyr)
df <- mutate_all(df, .funs = toupper)  # the argument is named .funs, not funs
It is simple with the apply function in R:
f <- apply(f,2,toupper)
No need to check if the column is character or any other type.
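One caveat worth checking before relying on this: apply() coerces the data frame with as.matrix() first, so the result is a character matrix rather than a data frame. The sketch below shows that, and one possible way to get typed columns back; the names m and df_up are illustrative, and the recovery step is an assumption about what you might want, not part of the answer.

```r
df <- data.frame(v1 = letters[1:3], v2 = 1:3, stringsAsFactors = FALSE)

m <- apply(df, 2, toupper)

is.matrix(m)     # the result is a matrix, not a data frame
is.character(m)  # every cell, including the v2 values, is now a string

# One way back to a data frame: rebuild it, then let type.convert()
# re-guess each column's type (as.is = TRUE keeps strings as character)
df_up <- as.data.frame(m, stringsAsFactors = FALSE)
df_up[] <- lapply(df_up, type.convert, as.is = TRUE)

str(df_up)
```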
A side comment here for those using any of these answers. Juba's answer is great, as it's very selective if your variables are either numeric or character strings. If, however, you have a combination (e.g. a1, b1, a2, b2), it will not convert the characters properly.
As @Trenton Hoffman notes,
library(dplyr)
df <- mutate_each(df, funs(toupper))
affects both character and factor classes and works for "mixed variables"; e.g. if your variable contains both a character and a numeric value (e.g. a1), both will be converted to a factor. Overall this isn't too much of a concern, but if you end up wanting to match data.frames, for example
df3 <- df1[df1$v1 %in% df2$v1,]
where df1 has been converted and df2 contains a non-converted data.frame or similar, this may cause some problems. The workaround is that you briefly have to run
df2 <- df2 %>% mutate_each(funs(toupper), v1)
#or
df2 <- df2 %>% mutate_each(funs(toupper))  # df2 is already supplied by the pipe
#and then
df3 <- df1[df1$v1 %in% df2$v1,]
Knowing this can come in handy if you work with genomic data.
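A small base R sketch of the matching problem described here, using made-up identifiers (the values in df1 and df2 are purely illustrative). It also shows an alternative to converting df2 beforehand: normalising the case on both sides inside the comparison.

```r
df1 <- data.frame(v1 = c("A1", "B1", "C2"), stringsAsFactors = FALSE)  # already upper-cased
df2 <- data.frame(v1 = c("a1", "c2", "d3"), stringsAsFactors = FALSE)  # still lower case

# %in% is case-sensitive, so a direct comparison matches no rows
n_direct <- nrow(df1[df1$v1 %in% df2$v1, , drop = FALSE])

# Upper-casing both keys inside the comparison leaves df2 itself unchanged
df3 <- df1[toupper(df1$v1) %in% toupper(df2$v1), , drop = FALSE]
df3$v1
```

Whether to convert df2 permanently or only inside the comparison depends on whether the original case is needed later.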
Another alternative is to use a combination of the mutate_if() and str_to_upper() functions, both from the tidyverse (dplyr and stringr, respectively):
df %>% mutate_if(is.character, str_to_upper) -> df
This will convert all string variables in the data frame to upper case. str_to_lower() does the opposite.
Scoped verbs that end in _if, _at, _all have been superseded by the use of across() in packageVersion("dplyr") 1.0.0 or newer. To do this using across:
df %>%
dplyr::mutate(across(where(is.character), toupper))
The first argument of across() (.cols) is which columns to transform, using tidyselect syntax. The above will apply the function across all columns that are character. The second argument (.fns) is the function to apply. This also supports lambda-style syntax, ~ toupper(.x), which makes setting additional function arguments easy and clear.
Data
df <- structure(list(city = c("Austin", "Austin", "Austin", "Austin",
"Austin", "Austin", "Austin", "Austin", "Austin", "Austin"),
hs_cd = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L), sl_no = c(2L,
3L, 4L, 5L, 2L, 1L, 3L, 4L, 3L, 1L), col_01 = c(NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA), col_02 = c(46L, 32L, 27L, 20L,
42L, 52L, 25L, 22L, 30L, 65L), col_03 = c("Female", "Male",
"Male", "Female", "Female", "Male", "Male", "Female", "Female",
"Female")), class = "data.frame", row.names = c(NA, -10L))
If you need to deal with data.frames that include factors you can use:
df = data.frame(v1=letters[1:5],v2=1:5,v3=letters[10:14],v4=as.factor(letters[1:5]),v5=runif(5),stringsAsFactors=FALSE)
df
v1 v2 v3 v4 v5
1 a 1 j a 0.1774909
2 b 2 k b 0.4405019
3 c 3 l c 0.7042878
4 d 4 m d 0.8829965
5 e 5 n e 0.9702505
sapply(df,class)
v1 v2 v3 v4 v5
"character" "integer" "character" "factor" "numeric"
Use mutate_each_ to convert factors to character, then convert all to uppercase
upper_it = function(X){X %>% mutate_each_( funs(as.character(.)), names( .[sapply(., is.factor)] )) %>%
mutate_each_( funs(toupper), names( .[sapply(., is.character)] ))} # convert factor to character then uppercase
Gives
upper_it(df)
v1 v2 v3 v4
1 A 1 J A
2 B 2 K B
3 C 3 L C
4 D 4 M D
5 E 5 N E
While
sapply( upper_it(df),class)
v1 v2 v3 v4 v5
"character" "integer" "character" "character" "numeric"
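mutate_each_() and funs() are deprecated in recent dplyr releases, so the same two-step idea (factor to character, then upper-case) might be written with across() instead. This is a sketch assuming dplyr 1.0.0 or newer is installed; upper_it2 is a hypothetical name, not a function from the answer or from dplyr.

```r
library(dplyr)  # assumes dplyr >= 1.0.0

df <- data.frame(v1 = letters[1:5], v2 = 1:5, v3 = letters[10:14],
                 v4 = as.factor(letters[1:5]), stringsAsFactors = FALSE)

# upper_it2 is a hypothetical helper name
upper_it2 <- function(x) {
  x %>%
    mutate(across(where(is.factor), as.character)) %>%  # factor -> character
    mutate(across(where(is.character), toupper))        # then upper-case
}

res <- upper_it2(df)
sapply(res, class)
```

The two mutate() calls are kept separate so that the second where(is.character) clearly sees the columns converted by the first.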
Alternatively, if you just want to convert one particular column to uppercase, use the code below:
df[[1]] <- toupper(df[[1]])
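The single-column assignment can also address the column by name, which is less fragile than a position if columns get reordered. For a factor column, upper-casing the levels keeps it a factor. A minimal sketch with illustrative column names v1 and v4:

```r
df <- data.frame(v1 = letters[1:3], v4 = factor(c("x", "y", "x")),
                 stringsAsFactors = FALSE)

# Select the column by name rather than by position
df[["v1"]] <- toupper(df[["v1"]])

# For a factor, change the level labels; the column stays a factor
levels(df$v4) <- toupper(levels(df$v4))

str(df)
```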