Removing rows from a data frame that contain a string in a particular column
So I'm cleaning up a huge data file in R, and a sample is shown below:
ID Score
1001 4
1002 2
1003 h
1004 v
1005 3
Because the class of the Score column is "character", I want to use the as.numeric function to convert the numbers (4, 2 and 3 in this sample) to numeric values. But since the data is dirty (it contains unreasonable strings like h and v), I get the warning:
NAs introduced by coercion.
when I run the following:
as.numeric(df$Score)
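To see which entries trigger that warning, the coercion can be run on the sample values directly: as.numeric() returns NA for every string it cannot parse, and those NA positions are exactly the rows to drop. A minimal sketch, using a plain character vector built from the sample table (the names `score`, `converted` and `bad` are illustrative):

```r
# Sample Score values from the table above, stored as character
score <- c("4", "2", "h", "v", "3")

# as.numeric() returns NA for "h" and "v"; suppressWarnings() only
# silences the "NAs introduced by coercion" message, it changes nothing else
converted <- suppressWarnings(as.numeric(score))
print(converted)  # c(4, 2, NA, NA, 3)

# Positions whose value could not be converted (the rows to remove)
bad <- which(is.na(converted))
print(bad)  # c(3L, 4L)
```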
So what I want to do now is remove the rows in the data frame whose Score contains letters, so that I would obtain:
ID Score
1001 4
1002 2
1005 3
There are multiple ways you can do this:
Convert to numeric and remove the NA values:
subset(df, !is.na(as.numeric(Score)))
# ID Score
#1 1001 4
#2 1002 20
#5 1005 30
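The subset() call above still prints the "NAs introduced by coercion" warning and leaves Score as a character column. If the end goal is a numeric Score, one option is to convert once, keep the rows that converted, and store the converted values. A minimal sketch on the same data as this answer (the names `num`, `keep` and `clean` are illustrative):

```r
# Same sample data as this answer; stringsAsFactors = FALSE keeps Score
# as character even on R versions older than 4.0
df <- data.frame(ID = 1001:1005,
                 Score = c("4", "20", "h", "v", "30"),
                 stringsAsFactors = FALSE)

# Convert once; strings that cannot be parsed become NA.
# suppressWarnings() hides the expected coercion warning.
num <- suppressWarnings(as.numeric(df$Score))

# Keep the rows that converted and store Score as a numeric column
keep <- !is.na(num)
clean <- df[keep, ]
clean$Score <- num[keep]

str(clean)
```

Because as.numeric() also parses values such as "3.5" or "-2", this approach keeps decimals and negative numbers, which a digits-only regex would drop.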
Or use grepl to find values that contain any non-numeric character and remove those rows:
subset(df, !grepl('\\D', Score))
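Note that '\\D' matches any character that is not a digit, so values such as "3.5" or "-2" are removed along with the letters. If decimals and negatives should count as valid scores, a stricter whole-string pattern can be used instead. A sketch on an extended, hypothetical sample (the pattern `num_pattern` is an illustrative choice, not the only option):

```r
# Hypothetical sample that adds a decimal and a negative value
df <- data.frame(ID = 1001:1006,
                 Score = c("4", "20", "h", "3.5", "-2", "30"),
                 stringsAsFactors = FALSE)

# '\\D' matches any non-digit, so "3.5" and "-2" are dropped too
digits_only <- subset(df, !grepl("\\D", Score))

# Optional minus sign, one or more digits, optional decimal part;
# ^ and $ force the whole string to match
num_pattern <- "^-?[0-9]+(\\.[0-9]+)?$"
numeric_like <- subset(df, grepl(num_pattern, Score))

print(digits_only)
print(numeric_like)
```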
This can be done with grep as well:
df[grep('\\D', df$Score, invert = TRUE), ]
Data:
df <- structure(list(ID = 1001:1005, Score = c("4", "20", "h", "v",
"30")), class = "data.frame", row.names = c(NA, -5L))
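A reason to prefer grep(..., invert = TRUE) over the also-common negative-index form df[-grep(...), ]: when nothing matches, grep() returns integer(0), negating that is still integer(0), and indexing with it selects zero rows instead of all rows. A sketch on a hypothetical already-clean data frame (`clean_df`, `wrong` and `right` are illustrative names):

```r
# Hypothetical data where no Score contains a non-digit character
clean_df <- data.frame(ID = 1001:1003,
                       Score = c("4", "20", "30"),
                       stringsAsFactors = FALSE)

# grep() finds no matches and returns integer(0); -integer(0) is still
# integer(0), so this silently returns a data frame with zero rows
wrong <- clean_df[-grep("\\D", clean_df$Score), ]

# invert = TRUE returns the indices that do NOT match, so all rows are kept
right <- clean_df[grep("\\D", clean_df$Score, invert = TRUE), ]

nrow(wrong)
nrow(right)
```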
You may use str_detect from the stringr package (loaded as part of the tidyverse), as follows:
library(tidyverse)  # attaches stringr (str_detect) and dplyr (filter, %>%)
df[str_detect(df$Score, "\\d"), ]
or
df %>% filter(str_detect(Score, "\\d"))
Both produce the output:
# ID Score
#1 1001 4
#2 1002 20
#5 1005 30
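One caveat with the str_detect() calls above: the pattern "\\d" is TRUE when a value contains at least one digit anywhere, so a mixed value such as "4a" would be kept. Anchoring the pattern as "^\\d+$" requires the whole string to be digits; with stringr that would be str_detect(Score, "^\\d+$"). The sketch below shows the difference with base R's grepl() (note its argument order: pattern first) so that it runs without stringr installed; the value "4a" is a hypothetical addition to the sample data.

```r
# Hypothetical sample with a mixed value "4a"
df <- data.frame(ID = 1001:1005,
                 Score = c("4", "20", "h", "4a", "30"),
                 stringsAsFactors = FALSE)

# Unanchored "\\d": TRUE if ANY digit appears, so "4a" is kept
loose <- df[grepl("\\d", df$Score), ]

# Anchored "^\\d+$": the whole string must be digits, so "4a" is dropped
strict <- df[grepl("^\\d+$", df$Score), ]

print(loose)
print(strict)
```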
Hope it helps.
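Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.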