
Alternatives to mutate & rowwise & grepl

I am working with a data frame like the one below and want to count how many of the patterns ('B' and 'C') occur in each value of column A.

The code below, which uses rowwise, mutate and grepl, does work, but rowwise is pretty slow. Are there any alternatives to rowwise that give the same result?

library(dplyr)

temp <- data.frame(
  A = c('A','B','C','BC')
)

# rowwise() makes sum() see one value of A at a time
temp %>% 
  dplyr::rowwise() %>%
  mutate( Count = sum(grepl(pattern = 'B',A),grepl(pattern = 'C',A) ) )

Results:

# A tibble: 4 x 2
# Rowwise: 
  A     Count
  <chr> <int>
1 A         0
2 B         1
3 C         1
4 BC        2

grepl is vectorized; the problem is sum(), which collapses all of its arguments into a single total. Use + instead, which adds element-wise:

temp %>% 
  mutate( 
    Count = grepl(pattern = 'B', A) + grepl(pattern = 'C', A)
  )
#    A Count
# 1  A     0
# 2  B     1
# 3  C     1
# 4 BC     2

It's the same difference as this:

sum(1:3,  1:3)
# [1] 12

1:3 + 1:3
# [1] 2 4 6
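
If there are more than two patterns, chaining + by hand gets tedious. A minimal base R sketch of the same element-wise idea is shown here; the name `patterns` and the use of `fixed = TRUE` (treating each pattern as a literal string rather than a regular expression) are assumptions for illustration, not part of the original answer:

```r
# Same example values as column A in the question
A <- c("A", "B", "C", "BC")

# Hypothetical vector of patterns to look for
patterns <- c("B", "C")

# grepl() returns one logical vector per pattern;
# Reduce(`+`, ...) adds those vectors element-wise, just like writing
# grepl("B", A) + grepl("C", A) by hand.
hits <- lapply(patterns, function(p) grepl(p, A, fixed = TRUE))
count <- as.integer(Reduce(`+`, hits))

count
```

Inside mutate() the same expression can be used with the column in place of the standalone vector A.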

You can use str_count() from stringr as it is vectorized over string and pattern:

library(dplyr)
library(stringr)

temp %>%
 mutate(Count = str_count(A, "B|C"))

   A Count
1  A     0
2  B     1
3  C     1
4 BC     2

A base R option with nchar and gsub:

nchar(gsub("[^BC]", "", temp$A))
#[1] 0 1 1 2
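
One caveat worth checking against your own data: the grepl-based answer counts how many patterns are present (each contributes at most 1), whereas counting matches or remaining characters counts every occurrence. On the example data the two agree, but they can diverge when a letter repeats. The sketch below uses base R only, and the extra value "BB" is a hypothetical test case that is not in the original question:

```r
# Original values plus one hypothetical value with a repeated letter
x <- c("A", "B", "C", "BC", "BB")

# Presence per pattern: "BB" contains 'B' (1) and no 'C' (0)
presence <- grepl("B", x, fixed = TRUE) + grepl("C", x, fixed = TRUE)

# Total occurrences: strip everything that is not B or C, then count characters
occurrences <- nchar(gsub("[^BC]", "", x))

# Compare the two counts side by side
data.frame(x, presence, occurrences)
```

Which of the two is right depends on whether "occurrence" means "pattern is present" or "number of matches" for your use case.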
