
Remove special characters from entire dataframe in R

Question:

How can you use R to remove all special characters from a dataframe, quickly and efficiently?

Progress:

This SO post details how to remove special characters. I can apply the gsub function to single columns (images 1 and 2), but not to the entire dataframe.

Problem:

My dataframe consists of 100+ columns of integers, strings, etc. When I try to run gsub on the dataframe, it doesn't return the output I desire. Instead, I get what's shown in image 3.

df <- read.csv("C:/test.csv")
dfa <- gsub("[[:punct:]]", "", df$a) #this works on a single column
dfb <- gsub("[[:punct:]]", "", df$b) #this works on a single column
df_all <- gsub("[[:punct:]]", "", df) #this does not work on the entire df
View(df_all)

df - This is the original dataframe:

[Image 1: original dataframe]

dfa - This is gsub applied to column b. Good!

[Image 2: gsub applied to column b]

df_all - This is gsub applied to the entire dataframe. Bad!

[Image 3: gsub applied to the entire dataframe]

Summary:

Is there a way to gsub an entire dataframe? If not, should an apply function be used instead?

Here is a possible solution using dplyr:

library(dplyr)  # provides %>% and mutate_all

# Example data
bla <- data.frame(a = c(1,2,3), 
              b = c("fefa%^%", "fes^%#$%", "gD%^E%Ewfseges"), 
              c = c("%#%$#^#", "%#$#%@", ",.,gdgd$%,."))

# Use mutate_all from dplyr to apply gsub to every column
bla %>%
  mutate_all(funs(gsub("[[:punct:]]", "", .)))

  a           b    c
1 1        fefa     
2 2         fes     
3 3 gDEEwfseges gdgd

Update:

mutate_all has been superseded, and funs is deprecated as of dplyr 0.8.0. Here is an updated solution using mutate and across:

# Example data
df <- data.frame(a = c(1,2,3), 
                 b = c("fefa%^%", "fes^%#$%", "gD%^E%Ewfseges"), 
                 c = c("%#%$#^#", "%#$#%@", ",.,gdgd$%,."))

# Use mutate and across from dplyr (across requires dplyr >= 1.0.0)
library(dplyr)
df %>%
  mutate(across(everything(), ~ gsub("[[:punct:]]", "", .)))

Another solution is to convert the dataframe to a matrix first, run gsub on it, and then convert the result back to a dataframe, like this:

as.data.frame(gsub("[[:punct:]]", "", as.matrix(df))) 
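One caveat with the matrix route: as.matrix turns a mixed dataframe into a character matrix, so numeric columns come back as character, and gsub would also strip decimal points and minus signs from them. The following is a minimal base R sketch of one way around that, assuming you only want to clean the character columns. The names is_chr and df_clean, and the decimal and negative values in column a, are illustrative and not from the original post.

```r
# Example data: column a holds a decimal and a negative number on purpose
df <- data.frame(a = c(1.5, -2, 3),
                 b = c("fefa%^%", "fes^%#$%", "gD%^E%Ewfseges"),
                 c = c("%#%$#^#", "%#$#%@", ",.,gdgd$%,."),
                 stringsAsFactors = FALSE)

# Flag the character columns (is_chr is a hypothetical helper name)
is_chr <- vapply(df, is.character, logical(1))

# Apply gsub column by column, touching only the character columns
df_clean <- df
df_clean[is_chr] <- lapply(df_clean[is_chr],
                           function(x) gsub("[[:punct:]]", "", x))

str(df_clean)
```

If the file was read with stringsAsFactors = TRUE (the read.csv default before R 4.0.0), the text columns are factors rather than character, so they would need to be converted with as.character first.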

I like Ryan's answer using dplyr. As mutate_all and funs are now deprecated, here is my suggested updated solution using mutate and across:

# Example data
df <- data.frame(a = c(1,2,3), 
                 b = c("fefa%^%", "fes^%#$%", "gD%^E%Ewfseges"), 
                 c = c("%#%$#^#", "%#$#%@", ",.,gdgd$%,."))

# Use mutate and across from dplyr (across requires dplyr >= 1.0.0)
library(dplyr)
df %>%
  mutate(across(everything(), ~ gsub("[[:punct:]]", "", .)))

  a           b    c
1 1        fefa     
2 2         fes     
3 3 gDEEwfseges gdgd
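Because across(everything(), ...) passes every column through gsub, numeric columns are returned as character. Below is a hedged variant, assuming a dplyr version that exports where() (1.0.0 or later), that restricts the cleanup to character columns. The example values in column a and the name df_clean are made up for illustration.

```r
library(dplyr)

# Example data: numeric column a should survive unchanged
df <- data.frame(a = c(1.5, -2, 3),
                 b = c("fefa%^%", "fes^%#$%", "gD%^E%Ewfseges"),
                 stringsAsFactors = FALSE)

# where(is.character) selects only the character columns for cleaning
df_clean <- df %>%
  mutate(across(where(is.character), ~ gsub("[[:punct:]]", "", .x)))

str(df_clean)
```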
