I'd like to do sample_n() in dplyr, except I don't want the sampling to be random; I want to take every nth row. Is there a way to do this?
For example, I want to get every 10th row of the airquality dataset after ordering by Month and Day. Expected output:
Ozone Solar.R Wind Temp Month Day
   NA     194  8.6   69     5  10
   11      44  9.7   62     5  20
  115     223  5.7   79     5  30
   71     291 13.8   90     6   9
   12     120 11.5   73     6  19
   NA      31 14.9   77     6  29
...
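What you really want is to subset using a sequence:
mtcars[seq(1, nrow(mtcars), 10), ]
Replace both occurrences of mtcars with your data.frame, and replace 10 with the step n between the rows you want to extract. Note that this starts at row 1 (rows 1, 11, 21, ...); use seq(10, nrow(mtcars), 10) to start at the 10th row instead.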
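Applied to the question's example, the same idea might look like the sketch below. The names aq, n and every_nth are only illustrative, and it assumes you want rows 10, 20, 30, ... of the ordered data rather than rows 1, 11, 21, ...

```r
# Order airquality by Month and Day using base R only
aq <- airquality[order(airquality$Month, airquality$Day), ]

# Keep rows n, 2n, 3n, ... (start the sequence at n, not at 1)
n <- 10
every_nth <- aq[seq(n, nrow(aq), by = n), ]

head(every_nth)
```

Because the sequence ignores month boundaries, the Day values drift (10, 20, 30, then 9, 19, 29), just as in the expected output of the question.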
If you have a data frame of ordered data that you'd like to sample, you can filter on row_number():
library(tidyverse)
airquality %>%
  arrange(Month, Day) %>%
  filter(row_number() %% 10 == 0) %>%
  head()
#>   Ozone Solar.R Wind Temp Month Day
#> 1    NA     194  8.6   69     5  10
#> 2    11      44  9.7   62     5  20
#> 3   115     223  5.7   79     5  30
#> 4    71     291 13.8   90     6   9
#> 5    12     120 11.5   73     6  19
#> 6    NA      31 14.9   77     6  29
Since the data are not grouped by month, every 10th row overall is retained. May has 31 days, so from June onward the selected Day values shift from 10, 20, 30 to 9, 19, 29. Grouping by Month gets around this:
airquality %>%
  arrange(Month, Day) %>%
  group_by(Month) %>%
  filter(row_number() %% 10 == 0) %>%
  head()
#> # A tibble: 6 x 6
#> # Groups:   Month [2]
#>   Ozone Solar.R  Wind  Temp Month   Day
#>   <int>   <int> <dbl> <int> <int> <int>
#> 1    NA     194  8.60    69     5    10
#> 2    11      44  9.70    62     5    20
#> 3   115     223  5.70    79     5    30
#> 4    39     323 11.5     87     6    10
#> 5    13     137 10.3     76     6    20
#> 6    NA     138  8.00    83     6    30
Of course, we could have just used filter(Day %% 10 == 0), but one doesn't always have such nice numbers to work with!
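As a possible alternative to the modulo filter, dplyr's slice() can take the row positions directly. This is only a sketch: the names step and every_nth are made up for illustration, and it assumes every group has at least step rows (otherwise seq() would fail because the start exceeds the end).

```r
library(dplyr)

step <- 10  # keep rows 10, 20, 30, ... within each month

every_nth <- airquality %>%
  arrange(Month, Day) %>%
  group_by(Month) %>%
  slice(seq(step, n(), by = step)) %>%  # n() is the size of the current group
  ungroup()

head(every_nth)
```

Dropping the group_by() and ungroup() lines gives the ungrouped version, where positions are counted across the whole data frame.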
If you intend to split a data.frame into consecutive blocks of 'n' rows and draw 'n1' random rows from each block, create a grouping variable for every 'n' rows and use sample_n. (That is how I interpreted the question. Feel free to correct me.)
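library(dplyr)

# Example data: 40 rows, a row id plus three random integer columns
set.seed(24)
df1 <- as.data.frame(cbind(rn = 1:40, matrix(sample(0:10, 3 * 40,
                     replace = TRUE), ncol = 3)))

n <- 6   # size of each consecutive block of rows
n1 <- 3  # number of rows to sample within each block

df1 %>%
  group_by(gr = as.numeric(gl(n(), n, n()))) %>%
  sample_n(n1)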
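To see what the gl() grouping does, here is a small base R sketch for 40 rows and blocks of 6. The names n_rows, gr_gl and gr_div are illustrative only, and the integer-division form is an equivalent alternative rather than part of the answer above.

```r
n <- 6        # block size
n_rows <- 40  # number of rows, matching the 40-row example data

# Block ids built with gl(), as in the grouping variable of the answer
gr_gl <- as.numeric(gl(n_rows, n, n_rows))

# The same ids built with integer division on the row index
gr_div <- (seq_len(n_rows) - 1) %/% n + 1

# Number of rows in each block
table(gr_gl)
```

With 40 rows this yields six full blocks of 6 rows and a final block of 4, so sampling n1 = 3 rows from each block still works; a final block smaller than n1 would make sample_n() fail unless replace = TRUE is used.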