R. Problem removing "#NAME?" (from an Excel import) in dataframe
I have a .csv import from Excel that contains leftover formula error values that I am trying to remove. A simple version of the data is below.
library(tidyverse)
df <- data.frame(
species = letters[1:5],
param1 = c("Place", "creek", "river", "#VALUE!", "desert"),
param2 = c(-23.8, 43.23, "#NAME?", 45, 0.23),
param3 = c(2.4, 2, 5.7, 0.00003, -2.5),
stringsAsFactors = FALSE
) # This is a simplified version of the excel .csv import
df[df == "#VALUE!"] <- "" # Removes excel cells where the formula left "#VALUE!"
df[df == "#NAME\\?"] <- "" # This does not work
ndf <- df # This is an attempt to reassign the columns to numeric
ndf
class(ndf$param2)
class(ndf$param3)
The main problem is that the data column param2, with this left in it, is assigned to character when it needs to be numeric, or the functions I have to run on it do not work.
I've tried many different things, but nothing seems to recognise the cell. How do I remove "#NAME?" across the df, please?
You are doing an exact match (and not a regex match), so you don't need to escape special characters (like ? or !). Try:
df[df == "#VALUE!"] <- ""
df[df == "#NAME?"] <- NA
df <- type.convert(df, as.is = TRUE)
df
# species param1 param2 param3
#1 a Place -23.80 2.40000
#2 b creek 43.23 2.00000
#3 c river NA 5.70000
#4 d 45.00 0.00003
#5 e desert 0.23 -2.50000
str(df)
#'data.frame': 5 obs. of 4 variables:
# $ species: chr "a" "b" "c" "d" ...
# $ param1 : chr "Place" "creek" "river" "" ...
# $ param2 : num -23.8 43.23 NA 45 0.23
# $ param3 : num 2.4 2 5.7 0.00003 -2.5
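To see why the escaped version in the question fails, here is a minimal base R sketch (the vector `x` and the result names are made up for illustration). With `==` the backslash simply becomes part of the string being compared, whereas regex functions such as `grepl` are where the `?` needs escaping or `fixed = TRUE`.

```r
# Hypothetical values standing in for a column read from the .csv
x <- c("-23.8", "#NAME?", "45")

# `==` is a literal comparison: "#NAME\\?" is the text #NAME\? (with a backslash)
exact_escaped <- x == "#NAME\\?"
exact_plain   <- x == "#NAME?"

# Regex matching is where the escape (or fixed = TRUE) belongs
regex_escaped <- grepl("#NAME\\?", x)
regex_fixed   <- grepl("#NAME?", x, fixed = TRUE)

print(exact_escaped)  # no element matches
print(exact_plain)    # only the second element matches
print(regex_escaped)  # only the second element matches
print(regex_fixed)    # only the second element matches
```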
Here's a dplyr solution with sub to replace the unwanted values in one go:
df %>%
mutate(across(matches("\\d"), ~sub("#.*", "NA", .)))
species param1 param2 param3
1 a Place -23.8 2.4
2 b creek 43.23 2
3 c river NA 5.7
4 d NA 45 3e-05
5 e desert 0.23 -2.5
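Note that `sub()` returns character vectors and inserts the text "NA" rather than a missing value, so every column it touches becomes character. A small base R sketch of the difference (the vector `x` is illustrative only); `type.convert()` treats the string "NA" as missing by default, which is what turns it into a real `NA`:

```r
# Hypothetical column values, as they would look after the import
x <- c("43.23", "#NAME?", "45")

# sub() swaps in the two-letter string "NA"; nothing is actually missing yet
replaced <- sub("#.*", "NA", x)
print(is.na(replaced))

# type.convert() recognises "NA" as a missing value and restores the numeric type
converted <- type.convert(replaced, as.is = TRUE)
print(is.na(converted))
print(class(converted))
```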
The following solution is helpful if you do not know in which columns the unwanted values occur:
library(stringr)
df %>%
mutate(across(where(~any(str_detect(.,"#"))), ~sub("#.*", "NA", .)))
This third solution both replaces the unwanted values anywhere and converts the columns to their correct type (thanks to @Ronak for inspiration):
df %>%
mutate(across(where(~any(str_detect(.,"#"))), ~sub("#.*", "NA", .)),
across(everything(), ~type.convert(., as.is = TRUE)))
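If re-reading the file is an option, another possible route is to declare the Excel error strings as missing values at import time through the `na.strings` argument of `read.csv`. A minimal sketch, assuming the file looks like the example data; the inline `csv_text` is a stand-in for the real file, whose path is not given in the question:

```r
# Stand-in for the exported .csv (assumed layout, based on the example data)
csv_text <- paste(
  "species,param1,param2,param3",
  "a,Place,-23.8,2.4",
  "b,creek,43.23,2",
  "c,river,#NAME?,5.7",
  "d,#VALUE!,45,0.00003",
  "e,desert,0.23,-2.5",
  sep = "\n"
)

# Treat the Excel error strings as NA while reading, so column types are inferred cleanly
imported <- read.csv(
  text = csv_text,
  na.strings = c("NA", "#NAME?", "#VALUE!"),
  stringsAsFactors = FALSE
)

str(imported)
```

Unlike the first answer, this leaves `NA` rather than an empty string in param1, so adjust afterwards if blanks are preferred there.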