
Automate detection of start and end row number of phrases

I have a dataframe like this:

df = data.frame(main_name = c("google","yahoo","google","amazon","yahoo","google"),
                volume = c(32,43,412,45,12,54))

I would like to sort it by main_name and then find, for each distinct value, the first and last row number it occupies, so that I can use those row ranges in a for loop. For example, after sorting:

main_name volume
amazon     45
google     32
google     412
google     54
yahoo      43
yahoo      12

Is there an automatic way to do this without knowing the specific phrases in advance, just by detecting where the value changes and recording the start and end row numbers? The expected result:

amazon [1]
google [2:4]
yahoo  [5:6]

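With tidyverse:

library(dplyr)

df %>%
  arrange(main_name) %>%                # sort so each name forms one block
  mutate(row = row_number()) %>%        # row numbers after sorting
  group_by(main_name) %>%
  summarise(start = first(row),         # first row of each block
            end = last(row)) %>%        # last row of each block
  mutate(res = glue::glue("[{start}:{end}]"))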
# A tibble: 3 x 4
  main_name start   end res  
  <fct>     <int> <int> <chr>
1 amazon        1     1 [1:1]
2 google        2     4 [2:4]
3 yahoo         5     6 [5:6]
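
The question's expected output writes a single-row group as [1] rather than [1:1]. A minimal base R sketch of that formatting step follows; the helper name format_range is hypothetical and not part of any package, and the start and end vectors are simply the values from the table above.

```r
# Hypothetical helper: collapse "[n:n]" to "[n]" when a group has one row
format_range <- function(start, end) {
  ifelse(start == end,
         sprintf("[%d]", start),
         sprintf("[%d:%d]", start, end))
}

# start/end values taken from the summarised result above
res <- format_range(c(1L, 2L, 5L), c(1L, 4L, 6L))
print(res)
```

The same expression could be used inside the final mutate() call in place of the glue template.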

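Here is an alternative base R solution using rle. It assumes df is already sorted by main_name (as in the sample data below), because rle only groups consecutive equal values: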

with(rle(as.character(df$main_name)), setNames(mapply(
    function(x, y) sprintf("[%s:%s]", x, y),
    cumsum(lengths) - lengths + 1, cumsum(lengths)), values))
# amazon  google   yahoo
#"[1:1]" "[2:4]" "[5:6]"
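
Since rle only sees consecutive runs, the unsorted data frame from the question has to be sorted first. The following is a sketch of that combined step, assuming the question's original data; the names df_sorted and bounds are illustrative only.

```r
# Original, unsorted data from the question
df <- data.frame(
  main_name = c("google", "yahoo", "google", "amazon", "yahoo", "google"),
  volume    = c(32, 43, 412, 45, 12, 54)
)

# Sort by main_name so that each name occupies one contiguous block of rows
df_sorted <- df[order(df$main_name), ]
rownames(df_sorted) <- NULL   # renumber rows 1..n after sorting

# Run-length encode the sorted names: one run per distinct name
runs  <- rle(as.character(df_sorted$main_name))
end   <- cumsum(runs$lengths)        # last row of each run
start <- end - runs$lengths + 1L     # first row of each run

bounds <- data.frame(main_name = runs$values, start = start, end = end)
print(bounds)
```

Keeping start and end as numeric columns, rather than a formatted string, makes them directly usable as indices later on.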

Sample data

df <- read.table(text =
"main_name volume
amazon     45
google     32
google     412
google     54
yahoo      43
yahoo      12", header = TRUE)

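Here is another base R option using tapply. It requires df to be sorted by main_name, because range(x) would otherwise span rows that belong to other names: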

with(df, tapply(seq_along(main_name), main_name, FUN = 
  function(x) do.call(sprintf, c(fmt = "[%d:%d]", as.list(range(x))))))
#  amazon  google   yahoo 
# "[1:1]" "[2:4]" "[5:6]" 
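
The question mentions using the row ranges in a for loop. One possible way to do that, sketched here under the assumption that the sorted sample data is used, is to keep the row indices per name with split and loop over them. The per-group sum of volume is only a placeholder for whatever the loop body should do, and the names idx and totals are illustrative.

```r
# Sorted sample data, as in the answers above
df <- read.table(text =
"main_name volume
amazon     45
google     32
google     412
google     54
yahoo      43
yahoo      12", header = TRUE)

# Row indices belonging to each name, as a named list
idx <- split(seq_len(nrow(df)), df$main_name)

totals <- numeric(0)
for (name in names(idx)) {
  rows <- idx[[name]]                      # e.g. 2:4 for "google"
  cat(sprintf("%s [%d:%d]\n", name, min(rows), max(rows)))
  totals[name] <- sum(df$volume[rows])     # placeholder per-group work
}
print(totals)
```

Because split stores every row index of a group rather than only its first and last row, this loop would also work on data that has not been sorted, although the printed start and end would then no longer describe a contiguous block.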

