Automate detection of start and end row number of phrases

Question

I have a dataframe like this:

df = data.frame(main_name = c("google","yahoo","google","amazon","yahoo","google"),
                volume = c(32,43,412,45,12,54))

I would like to sort it by main_name and then find the start and end row of each distinct phrase, so that I can use those row ranges in a for loop. For example, after sorting:

main_name volume
amazon     45
google     32
google     412
google     54
yahoo      43
yahoo      12

Is there an automatic way to do this without knowing the specific phrases in advance, i.e. just detect where the value changes and report the start and end row numbers? Something like:

amazon [1]
google [2:4]
yahoo  [5:6]

Answer 1

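With tidyverse:

library(dplyr)

df %>%
  arrange(main_name) %>%           # sort by phrase
  mutate(row = row_number()) %>%   # row number in sorted order
  group_by(main_name) %>%
  summarise(start = first(row),    # first row of each phrase
            end = last(row)) %>%   # last row of each phrase
  mutate(res = glue::glue("[{start}:{end}]"))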
# A tibble: 3 x 4
  main_name start   end res  
  <fct>     <int> <int> <chr>
1 amazon        1     1 [1:1]
2 google        2     4 [2:4]
3 yahoo         5     6 [5:6]
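
The question mentions using the ranges in a for loop. A minimal sketch of that step, assuming the summary is stored in a variable (called `bounds` here) and the sorted data in another (`sorted`); both names, and the per-group sum used as the loop body, are illustrative and not part of the original answer:

```r
library(dplyr)

df <- data.frame(main_name = c("google", "yahoo", "google", "amazon", "yahoo", "google"),
                 volume = c(32, 43, 412, 45, 12, 54))

# Keep the sorted data so the row numbers in `bounds` refer to it
sorted <- arrange(df, main_name)

bounds <- sorted %>%
  mutate(row = row_number()) %>%
  group_by(main_name) %>%
  summarise(start = first(row), end = last(row))

# Loop over the groups and slice the sorted data by row range
totals <- numeric(nrow(bounds))
for (i in seq_len(nrow(bounds))) {
  rows <- bounds$start[i]:bounds$end[i]
  totals[i] <- sum(sorted$volume[rows])
}
names(totals) <- as.character(bounds$main_name)
totals
```

The row numbers only make sense against the sorted data, so the loop indexes `sorted`, not the original `df`.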

Answer 2

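Here is an alternative base R solution using rle. Since rle works on consecutive runs, it is applied to data already sorted by main_name (see the sample data below).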

with(rle(as.character(df$main_name)), setNames(mapply(
    function(x, y) sprintf("[%s:%s]", x, y),
    cumsum(lengths) - lengths + 1, cumsum(lengths)), values))
# amazon  google   yahoo
#"[1:1]" "[2:4]" "[5:6]"

Sample data

df <- read.table(text =
"main_name volume
amazon     45
google     32
google     412
google     54
yahoo      43
yahoo      12", header = TRUE)
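
To make the arithmetic in the rle one-liner easier to follow, here is the same computation unpacked step by step. This is only a sketch on a hard-coded, already sorted vector; the variable names `runs`, `start`, `end` and `bounds` are illustrative:

```r
# Already sorted, as in the sample data above
main_name <- c("amazon", "google", "google", "google", "yahoo", "yahoo")

runs <- rle(main_name)
runs$values   # the distinct phrases, in order of appearance
runs$lengths  # how many consecutive rows each one spans: 1 3 2

end   <- cumsum(runs$lengths)      # last row of each run:  1 4 6
start <- end - runs$lengths + 1    # first row of each run: 1 2 5

bounds <- data.frame(main_name = runs$values, start = start, end = end)
bounds
```

The cumulative sum of the run lengths gives the last row of each run; subtracting the run length and adding 1 gives the first row.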

Answer 3

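Here is another base R option. It takes the range of row positions per phrase, so it also expects df to be sorted by main_name first (as in the sample data in Answer 2).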

with(df, tapply(seq_along(main_name), main_name, FUN = 
  function(x) do.call(sprintf, c(fmt = "[%d:%d]", as.list(range(x))))))
#  amazon  google   yahoo 
# "[1:1]" "[2:4]" "[5:6]"
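
Both base R answers work on row positions, so they appear to assume the data frame is already sorted. A minimal sketch of doing the sort in base R as well, starting from the question's unsorted data. It uses split() rather than tapply() to collect the row positions, and the names `sorted`, `rows` and `res` are illustrative:

```r
df <- data.frame(main_name = c("google", "yahoo", "google", "amazon", "yahoo", "google"),
                 volume = c(32, 43, 412, 45, 12, 54))

# Sort by main_name and renumber the rows 1..n
sorted <- df[order(df$main_name), ]
rownames(sorted) <- NULL

# Row positions of each phrase in the sorted data
rows <- split(seq_len(nrow(sorted)), sorted$main_name)

# Format the first and last position of each phrase
res <- vapply(rows, function(x) sprintf("[%d:%d]", min(x), max(x)), character(1))
res
#  amazon  google   yahoo
# "[1:1]" "[2:4]" "[5:6]"
```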

3 answers

solution1
1 ACCEPTED 2018-10-23 13:30:11

solution2
1 2018-10-23 13:39:49

solution3
1 2018-10-23 15:16:31