Bin dataframe by row number within groups in R

My data consist of word lists from different texts (text is the grouping variable), and I'm trying to bin the data frame within each group by row count, starting a new bin every 2000 rows.

My data look like this:

index   text   word
1       H6     mællte
2       H6     fleiru
...
66265   H6     han
1       DG8    Son
2       DG8    hins
3       DG8    var
...
2001    DG8    faer
2002    DG8    hælga

I would like it to look like this:

index   text   word     bin
1       H6     mællte   1
2       H6     fleiru   1
...
66265   H6     han      34
1       DG8    Son      1
2       DG8    hins     1
3       DG8    var      1
...
2001    DG8    faer     2
2002    DG8    hælga    2

We can use rep() with dplyr:

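library(dplyr)

df %>%
  group_by(text) %>%
  # label the rows of each text in consecutive blocks of 2000
  mutate(bin = rep(seq_len(ceiling(n() / 2000)), each = 2000, length.out = n())) %>%
  # drop the grouping so later steps operate on the whole data frame
  ungroup()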

length.out = n() ensures that when n() is not divisible by 2000, the last bin value is repeated only up to the final row of each group, so the result has exactly n() values.
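
To see how the last bin gets truncated, here is a minimal sketch on made-up data. The toy data frame, its words, and the bin size of 2 are assumptions for illustration only, and base R's ave() stands in for group_by() so that the snippet runs without any packages. It also checks the result against an integer-division formulation, which should give the same bins.

```r
# Toy data (hypothetical): two texts with 5 and 3 rows.
df <- data.frame(
  text = rep(c("H6", "DG8"), times = c(5, 3)),
  word = c("a", "b", "c", "d", "e", "f", "g", "h"),
  stringsAsFactors = FALSE
)
bin_size <- 2  # stands in for 2000

# Same rep() idea, applied per text with ave() instead of group_by().
df$bin <- ave(
  seq_len(nrow(df)), df$text,
  FUN = function(i) {
    n <- length(i)
    rep(seq_len(ceiling(n / bin_size)), each = bin_size, length.out = n)
  }
)

# Alternative: integer division on the within-group row position.
df$bin_alt <- ave(
  seq_len(nrow(df)), df$text,
  FUN = function(i) (seq_along(i) - 1) %/% bin_size + 1
)

print(df$bin)
# H6 has 5 rows, so its third bin holds a single row: 1 1 2 2 3 1 1 2
```

Because the index column in the question already restarts at 1 for every text, (index - 1) %/% 2000 + 1 should produce the same bins without any grouping step, provided the index has no gaps.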
