Question:
How can you use R to remove all special characters from a dataframe, quickly and efficiently?
Progress:
This SO post details how to remove special characters. I can apply the gsub function to single columns (images 1 and 2), but not the entire dataframe.
Problem:
My dataframe consists of 100+ columns of integers, string, etc. When I try to run the gsub on the dataframe, it doesn't return the output I desire. Instead, I get what's shown in image 3.
df <- read.csv("C:/test.csv")
dfa <- gsub("[[:punct:]]", "", df$a) #this works on a single column
dfb <- gsub("[[:punct:]]", "", df$b) #this works on a single column
df_all <- gsub("[[:punct:]]", "", df) #this does not work on the entire df
View(df_all)
df - This is the original dataframe:
dfa - This is gsub applied to column b. Good!
df_all - This is gsub applied to the entire dataframe. Bad!
Summary:
Is there a way to gsub an entire dataframe? Else, should an apply function be used instead?
Here is a possible solution using dplyr:
# Example data
bla <- data.frame(a = c(1,2,3),
b = c("fefa%^%", "fes^%#$%", "gD%^E%Ewfseges"),
c = c("%#%$#^#", "%#$#%@", ",.,gdgd$%,."))
# Use mutate_all from dplyr
bla %>%
mutate_all(funs(gsub("[[:punct:]]", "", .)))
a b c
1 1 fefa
2 2 fes
3 3 gDEEwfseges gdgd
Update:
mutate_all
has been superseded , and funs
is deprecated as of dplyr 0.8.0. Here is an updated solution using mutate
and across
:
# Example data
df <- data.frame(a = c(1,2,3),
b = c("fefa%^%", "fes^%#$%", "gD%^E%Ewfseges"),
c = c("%#%$#^#", "%#$#%@", ",.,gdgd$%,."))
# Use mutate_all from dplyr
df %>%
mutate(across(everything(),~ gsub("[[:punct:]]", "", .)))
另一种解决方案是先将数据帧转换为矩阵,然后运行 gsub,然后再转换回数据帧,如下所示:
as.data.frame(gsub("[[:punct:]]", "", as.matrix(df)))
I like Ryan's answer using dplyr. As mutate_all
and funs
are now deprecated, here is my suggested updated solution using mutate
and across
:
# Example data
df <- data.frame(a = c(1,2,3),
b = c("fefa%^%", "fes^%#$%", "gD%^E%Ewfseges"),
c = c("%#%$#^#", "%#$#%@", ",.,gdgd$%,."))
# Use mutate_all from dplyr
df %>%
mutate(across(everything(),~ gsub("[[:punct:]]", "", .)))
a b c
1 1 fefa
2 2 fes
3 3 gDEEwfseges gdgd
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.