I am working on a data frame like the one below and want to count how many of the patterns 'B' and 'C' occur in each value of column A.
The code below, which uses rowwise, mutate and grepl, works, but rowwise is quite slow. Are there any alternatives to rowwise that give the same result?
library(dplyr)
temp <- data.frame(
  A = c('A','B','C','BC')
)
temp %>%
  dplyr::rowwise() %>%
  mutate(Count = sum(grepl(pattern = 'B', A), grepl(pattern = 'C', A)))
Results:
# A tibble: 4 x 2
# Rowwise:
  A     Count
  <chr> <int>
1 A         0
2 B         1
3 C         1
4 BC        2
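grepl is vectorized; it's your sum that is the problem. sum() collapses all of its arguments into a single number, which is why the code only gave per-row counts under rowwise. Use + instead, which adds the two logical vectors element-wise: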
temp %>%
  mutate(
    Count = grepl(pattern = 'B', A) + grepl(pattern = 'C', A)
  )
# A Count
# 1 A 0
# 2 B 1
# 3 C 1
# 4 BC 2
It's the same difference as this:
sum(1:3, 1:3)
# [1] 12
1:3 + 1:3
# [1] 2 4 6
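If there are more than two patterns, the same element-wise addition can be folded over a vector of patterns. This is only a sketch in base R: count_patterns is a hypothetical helper name, not something from the original post or from dplyr.

```r
# Hypothetical helper (not from the original post): count how many of the
# given patterns match each element of x.
count_patterns <- function(x, patterns) {
  # grepl() returns one logical vector per pattern, each the same length as x
  hits <- lapply(patterns, grepl, x = x)
  # `+` adds the vectors element-wise; as.integer() covers the one-pattern case
  as.integer(Reduce(`+`, hits))
}

temp <- data.frame(A = c('A', 'B', 'C', 'BC'))
temp$Count <- count_patterns(temp$A, c('B', 'C'))
temp$Count
# [1] 0 1 1 2
```

Because everything stays vectorized, no rowwise step is needed, and the helper could also be called inside mutate().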
You can use str_count() from stringr, as it is vectorized over string and pattern:
library(stringr)
temp %>%
  mutate(Count = str_count(A, "B|C"))
A Count
1 A 0
2 B 1
3 C 1
4 BC 2
A base R option with nchar and gsub:
nchar(gsub("[^BC]", "", temp$A))
#[1] 0 1 1 2
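One caveat worth checking against your real data: the grepl approach counts how many of the patterns are present, while the nchar/gsub approach (like str_count with "B|C") counts every occurrence. They agree on the sample data but differ once a letter repeats. A small sketch with a made-up input vector x:

```r
x <- c('A', 'B', 'BB', 'BC')  # made-up input with a repeated letter

# Number of distinct patterns present (grepl approach)
present <- grepl('B', x) + grepl('C', x)
present
# [1] 0 1 1 2

# Total number of B/C characters (nchar + gsub approach)
occurrences <- nchar(gsub('[^BC]', '', x))
occurrences
# [1] 0 1 2 2
```

Which one is right depends on whether 'BB' should count as 1 or 2.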