I have a data frame with more than 100 million rows. I need to subset the rows whose NAME value matches a particular pattern (a regex), but it takes a long time because my function checks the column row by row. Is there a more efficient way to do this?
An example of the data and the function is below. Thanks!
search_name = function(name) {
  # apply() turns hpot["NAME"] into a matrix and calls grepl() once per row
  tf = apply(X = hpot["NAME"],
             MARGIN = 1,
             FUN = grepl,
             pattern = name)
  df = hpot[tf == TRUE, ]
  return(df)
}
hpot = data.frame(NAME = c("alpha", "beta", "gamma", "delta", "alpha2",
                           "beta3", "gamma4", "zeta"),
                  AGE = c(12, 23, 34, 45, 56, 67, 78, 89),
                  HEIGHT = c(123, 134, 145, 156, 167, 178, 189, 190),
                  HOUSE = c("A", "B", "C", "D", "A", "B", "C", "D"),
                  stringsAsFactors = FALSE)
> search_name("beta")
NAME AGE HEIGHT HOUSE
2 beta 23 134 B
6 beta3 67 178 B
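A minimal sketch for checking where the time goes, assuming a hypothetical scaled-up copy of hpot (`big`, the 8 example rows repeated 10,000 times). It times the row-wise apply() call from the function above against a single grepl() call on the whole column and confirms that both flag the same rows. Actual timings depend on the machine.

```r
hpot <- data.frame(NAME = c("alpha", "beta", "gamma", "delta", "alpha2",
                            "beta3", "gamma4", "zeta"),
                   AGE = c(12, 23, 34, 45, 56, 67, 78, 89),
                   HEIGHT = c(123, 134, 145, 156, 167, 178, 189, 190),
                   HOUSE = c("A", "B", "C", "D", "A", "B", "C", "D"),
                   stringsAsFactors = FALSE)

# Hypothetical scaled-up data: 8 rows x 10,000 = 80,000 rows
big <- hpot[rep(seq_len(nrow(hpot)), times = 1e4), ]

# Row-wise version from the question: one grepl() call per row
t_rowwise <- system.time(
  tf <- apply(X = big["NAME"], MARGIN = 1, FUN = grepl, pattern = "beta")
)

# Vectorized version: a single grepl() call over the whole column
t_vector <- system.time(
  hits <- grepl("beta", big$NAME)
)

# apply() returns a named vector (names come from the row names), so drop them
stopifnot(identical(unname(tf), hits))

elapsed <- c(rowwise = t_rowwise[["elapsed"]],
             vectorized = t_vector[["elapsed"]])
print(elapsed)
```

Because grepl() is already vectorized over its `x` argument, wrapping it in apply() only adds per-row overhead without changing the result.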
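Update: thanks to @lmo, the fix is to call grepl() once on the whole column instead of once per row. Note that fixed = TRUE matches name as a literal substring rather than a regex, which is faster; drop it if you need real regex matching.
search_name = function(name) {
  return(hpot[grepl(name, hpot$NAME, fixed = TRUE), ])
}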
> search_name("beta")
NAME AGE HEIGHT HOUSE
2 beta 23 134 B
6 beta3 67 178 B
> search_name("alpha")
NAME AGE HEIGHT HOUSE
1 alpha 12 123 A
5 alpha2 56 167 A
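If NAME contains many repeated values (an assumption; the example data does not say), one further option is to run grepl() only on the distinct values and then map the matches back to rows with %in%. The helper `search_name_unique()` and its `data`, `col` and `fixed` arguments below are hypothetical, not from the original post; the `fixed` argument switches between literal and regex matching.

```r
hpot <- data.frame(NAME = c("alpha", "beta", "gamma", "delta", "alpha2",
                            "beta3", "gamma4", "zeta"),
                   AGE = c(12, 23, 34, 45, 56, 67, 78, 89),
                   HEIGHT = c(123, 134, 145, 156, 167, 178, 189, 190),
                   HOUSE = c("A", "B", "C", "D", "A", "B", "C", "D"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: match the pattern against distinct values only.
# Assumption: the column has many duplicates; if nearly every value is
# unique, the extra unique() and %in% steps cost more than they save.
search_name_unique <- function(pattern, data, col = "NAME", fixed = TRUE) {
  x <- data[[col]]
  u <- unique(x)                                   # distinct values
  matched <- u[grepl(pattern, u, fixed = fixed)]   # one grepl() pass over them
  data[x %in% matched, , drop = FALSE]             # map matches back to rows
}

# Hypothetical scaled-up data: the 8 example rows repeated 10,000 times
big <- hpot[rep(seq_len(nrow(hpot)), times = 1e4), ]

res_fixed <- search_name_unique("beta", big)                      # literal substring
res_regex <- search_name_unique("^(alpha|beta)[0-9]$", big,
                                fixed = FALSE)                    # real regex

nrow(res_fixed)   # 20000 ("beta" and "beta3")
nrow(res_regex)   # 20000 ("alpha2" and "beta3")
```

The row order and row names are the same as with a direct grepl() subset, since both select rows with a logical vector over the original data.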