Remove rows from a data frame that contain a string in a specific column

Question

I am cleaning a huge data file in R. A small example looks like this:

ID       Score
1001       4
1002       20
1003       h
1004       v
1005       30

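Because the Score column has class "character", I want to use the as.numeric function to convert 4, 20 and 30 to numeric values. But the data is dirty (it contains stray strings such as h and v), so I get the message: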

NAs introduced by coercion.

when I run the following command:

as.numeric(df$Score)

What I want to do now is remove the rows of the data frame whose Score contains letters, so that I end up with:

ID       Score
1001       4
1002       20
1005       30

Answer 1

There are several ways to do this:

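Convert to numeric and drop the NA values (as.numeric still emits the coercion warning here, which is expected):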

subset(df, !is.na(as.numeric(Score)))

#    ID Score
#1 1001     4
#2 1002    20
#5 1005    30
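Since the end goal in the question is a numeric Score column, here is a minimal sketch that filters and converts in one pass. It assumes the sample data from the question; the names score_num and clean are only illustrative. Note that rows where Score is already NA would be dropped as well.

```r
# Sample data with the same shape as in the question
df <- data.frame(ID = 1001:1005,
                 Score = c("4", "20", "h", "v", "30"),
                 stringsAsFactors = FALSE)

# Convert once; suppressWarnings() hides the expected coercion warning
score_num <- suppressWarnings(as.numeric(df$Score))

# Keep only the rows that converted cleanly, then store the numeric values
ok <- !is.na(score_num)
clean <- df[ok, ]
clean$Score <- score_num[ok]

str(clean)
```

Converting once and reusing the result avoids calling as.numeric twice and keeps the warning suppression in a single place.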

Or use grepl to find the values that contain any non-digit character and drop those rows:

subset(df, !grepl('\\D', Score))

The same can be done with grep:

df[grep('\\D', df$Score, invert = TRUE), ]
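The two approaches agree on the sample data, but they are not equivalent in general. The sketch below uses a few hypothetical values (not from the question) to show where they differ: the '\\D' test rejects decimals, negative numbers and scientific notation, while as.numeric accepts them, and an empty string passes the '\\D' test but converts to NA.

```r
# Hypothetical values, chosen only to show where the approaches differ
x <- c("4", "3.5", "-2", "1e3", "h", "")

# Approach 1: keep whatever as.numeric() can parse
keep_numeric <- !is.na(suppressWarnings(as.numeric(x)))

# Approach 2: keep values that contain no non-digit character
keep_digits <- !grepl("\\D", x)

# Side-by-side comparison
print(data.frame(x, keep_numeric, keep_digits))
```

If the column can only hold non-negative integers, the two behave the same apart from empty strings; otherwise the as.numeric route is probably the safer choice.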

Data

df <- structure(list(ID = 1001:1005, Score = c("4", "20", "h", "v", 
"30")), class = "data.frame", row.names = c(NA, -5L))

Answer 2

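You can use str_detect from stringr (part of the tidyverse), as follows. The pattern is anchored with ^ and $ so that only all-digit values are kept; a bare "\\d" would also keep mixed values such as "4a".

library(tidyverse)

df[str_detect(df$Score, "^\\d+$"), ]

or

df %>% filter(str_detect(Score, "^\\d+$"))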

Both keep the same rows (filter may renumber the row names):

#    ID Score
#1 1001     4
#2 1002    20
#5 1005    30
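To go one step further and get a numeric column out of the pipeline, a mutate step can follow the filter. This is a sketch that assumes dplyr and stringr are installed and uses the sample data from the question; clean is just an illustrative name.

```r
library(dplyr)
library(stringr)

# Sample data with the same shape as in the question
df <- data.frame(ID = 1001:1005,
                 Score = c("4", "20", "h", "v", "30"),
                 stringsAsFactors = FALSE)

clean <- df %>%
  filter(str_detect(Score, "^\\d+$")) %>%  # keep all-digit values only
  mutate(Score = as.numeric(Score))        # safe now: no coercion warning

str(clean)
```

Because the non-numeric rows are removed before the conversion, as.numeric has nothing left to coerce to NA.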

Hope this helps.