R subset range of rows below a certain string

Question

I have several dataframes in R of the following shape:

> pos.sentence
   doc_id token_id   token   pos
1      d1        1      Ik  PRON
2      d1        2    weet  VERB
3      d1        3     dat SCONJ
4      d1        4     jij  PRON
5      d1        5     dat SCONJ
6      d1        6     wil   AUX
7      d1        7      en CCONJ
8      d1        8      ik  PRON
9      d1        9     heb   AUX
10     d1       10     het   DET
11     d1       11      al   ADV
12     d1       12 gekocht  VERB

What I would like to do is create subsets of the data in which each subset gathers all rows from one PRON (in the pos column) up to, but not including, the next PRON. In this case, that results in three separate subsets/data frames:

   doc_id token_id   token   pos
1      d1        1      Ik  PRON
2      d1        2    weet  VERB
3      d1        3     dat SCONJ

   doc_id token_id   token   pos
4      d1        4     jij  PRON
5      d1        5     dat SCONJ
6      d1        6     wil   AUX
7      d1        7      en CCONJ

   doc_id token_id   token   pos
8      d1        8      ik  PRON
9      d1        9     heb   AUX
10     d1       10     het   DET
11     d1       11      al   ADV
12     d1       12 gekocht  VERB

Does anyone know a way to do this? The data frames I use as input vary in size, so I cannot create the subsets based on row numbers.
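
A minimal base R sketch of the idea, assuming (as in the example) that every block begins with a PRON row. The data frame is rebuilt by hand from the printout above with plain character columns; if the real columns are factors, the comparison with "PRON" still works the same way.

```r
# Rebuild the example data from the question (values copied from the printout)
pos.sentence <- data.frame(
  doc_id   = "d1",
  token_id = 1:12,
  token    = c("Ik", "weet", "dat", "jij", "dat", "wil",
               "en", "ik", "heb", "het", "al", "gekocht"),
  pos      = c("PRON", "VERB", "SCONJ", "PRON", "SCONJ", "AUX",
               "CCONJ", "PRON", "AUX", "DET", "ADV", "VERB"),
  stringsAsFactors = FALSE
)

# Every PRON row starts a new block, so a running count of PRON rows
# gives each row the number of the block it belongs to.
block <- cumsum(pos.sentence$pos == "PRON")

# split() returns a list with one data frame per block; original row names
# are kept and no row numbers are hard-coded, so any input length works.
subsets <- split(pos.sentence, block)
subsets
```

The list elements are named "1", "2" and "3", and contain 3, 4 and 5 rows respectively.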

Answer 1

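How about this? First, determine group membership (here `posdata` is the data frame from the question, `pos.sentence`):

library(tidyverse)
# ispron is 1 for PRON rows and 0 otherwise; its running sum starts a new
# group at every PRON, including when two PRON rows are adjacent
z <- posdata %>%
    mutate(ispron = 1 * (pos == "PRON"),
           group  = cumsum(ispron))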
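
To make the grouping step concrete, here is a small base R sketch on a made-up tag vector (`tags` is hypothetical, not from the question). It shows what a running sum of a PRON indicator produces, including two edge cases: two PRON tags in a row, and a tag before the first PRON.

```r
# Hypothetical tag sequence: one tag before the first PRON, and two adjacent PRONs
tags <- c("VERB", "PRON", "VERB", "PRON", "PRON", "DET")

# 1 where the tag is PRON, 0 otherwise (the same idea as an ispron column)
ispron <- as.numeric(tags == "PRON")

# The running sum goes up by one at every PRON, so each PRON opens a new group;
# rows before the first PRON end up in group 0
group <- cumsum(ispron)
group
```

Here `group` is `0 1 1 2 3 3`: each of the adjacent PRON tags gets its own group.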

Next, split it into multiple objects, one per group:

> split(z,z$group) 
$`1`
# A tibble: 3 x 6
  doc_id token_id token pos   ispron group
  <fct>     <int> <fct> <fct>  <dbl> <dbl>
1 d1            1 Ik    PRON      1.    1.
2 d1            2 weet  VERB      0.    1.
3 d1            3 dat   SCONJ     0.    1.

$`2`
# A tibble: 4 x 6
  doc_id token_id token pos   ispron group
  <fct>     <int> <fct> <fct>  <dbl> <dbl>
1 d1            4 jij   PRON      1.    2.
2 d1            5 dat   SCONJ     0.    2.
3 d1            6 wil   AUX       0.    2.
4 d1            7 en    CCONJ     0.    2.

$`3`
# A tibble: 5 x 6
  doc_id token_id token   pos   ispron group
  <fct>     <int> <fct>   <fct>  <dbl> <dbl>
1 d1            8 ik      PRON      1.    3.
2 d1            9 heb     AUX       0.    3.
3 d1           10 het     DET       0.    3.
4 d1           11 al      ADV       0.    3.
5 d1           12 gekocht VERB      0.    3.
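
Since the question mentions several input data frames of different sizes, one way to handle them all is to wrap the split in a function and apply it with `lapply()`. This is a sketch only: the helper `split_at_pron()` and the two small data frames are hypothetical stand-ins, and only the `pos` column is assumed to exist.

```r
# Hypothetical helper: split one token table into blocks that each start at a PRON row
split_at_pron <- function(df, tag = "PRON") {
  split(df, cumsum(df$pos == tag))
}

# Two made-up inputs of different lengths, standing in for the real data frames
sent_a <- data.frame(token = c("Ik", "slaap"),
                     pos   = c("PRON", "VERB"),
                     stringsAsFactors = FALSE)
sent_b <- data.frame(token = c("jij", "komt", "en", "wij", "gaan"),
                     pos   = c("PRON", "VERB", "CCONJ", "PRON", "VERB"),
                     stringsAsFactors = FALSE)

# lapply() runs the helper on every data frame; the result is a list of lists of blocks
blocks <- lapply(list(a = sent_a, b = sent_b), split_at_pron)

# Number of blocks found in each input
sapply(blocks, length)
```

If an input has rows before its first PRON, those rows end up in a block named "0" rather than being dropped.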

Answer 1 (accepted, score 0), 2018-06-25 23:53:05