
Subset first n occurrences of certain value in dataframe

Suppose I have a matrix (or dataframe):

1  5  8
3  4  9
3  9  6
6  9  3
3  1  2
4  7  2
3  8  6
3  2  7

I would like to select only the first three rows that have "3" as their first entry, as follows:

3  4  9
3  9  6
3  1  2

It is clear to me how to pull out all rows that begin with "3", and also how to pull out just the first such row.

But in general, how can I extract the first n rows that begin with "3"?

Furthermore, how can I select just the 3rd and 4th appearances, as follows:

3  1  2
3  8  6

In base R, without the need for an extra package:

mydf[mydf$V1==3,][1:3,]

results in:

  V1 V2 V3
2  3  4  9
3  3  9  6
5  3  1  2
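One caveat: indexing with [1:3,] pads the result with NA rows when fewer than three rows match. The following base R sketch avoids this by using head() instead, which simply returns fewer rows. The helper name first_n_rows and its arguments are hypothetical and not part of any package.

```r
# Data from the question
mydf <- data.frame(V1 = c(1L, 3L, 3L, 6L, 3L, 4L, 3L, 3L),
                   V2 = c(5L, 4L, 9L, 9L, 1L, 7L, 8L, 2L),
                   V3 = c(8L, 9L, 6L, 3L, 2L, 2L, 6L, 7L))

# Hypothetical helper: the first n rows whose column `col` equals `value`.
# which() drops NA comparisons; head() returns fewer rows (not NA rows)
# when there are fewer than n matches.
first_n_rows <- function(df, col, value, n) {
  matches <- df[which(df[[col]] == value), , drop = FALSE]
  head(matches, n)
}

res <- first_n_rows(mydf, "V1", 3, 3)
res
```

With only one row where V1 is 6, first_n_rows(mydf, "V1", 6, 3) returns that single row rather than one real row plus two NA rows.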

When you need the third and fourth rows:

mydf[mydf$V1==3,][3:4,]
# or:
mydf[mydf$V1==3,][c(3,4),]
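To pick arbitrary occurrences, such as the 3rd and 4th, it can be clearer to compute the matching row positions first with which() and then index the data frame once. This is a sketch only. The helper name nth_occurrences is hypothetical, and requested occurrences that do not exist are silently dropped instead of producing NA rows.

```r
# Data from the question
mydf <- data.frame(V1 = c(1L, 3L, 3L, 6L, 3L, 4L, 3L, 3L),
                   V2 = c(5L, 4L, 9L, 9L, 1L, 7L, 8L, 2L),
                   V3 = c(8L, 9L, 6L, 3L, 2L, 2L, 6L, 7L))

# Hypothetical helper: rows holding the k-th occurrences of `value` in `col`.
nth_occurrences <- function(df, col, value, k) {
  pos <- which(df[[col]] == value)   # row positions of all matches, in order
  pos <- pos[k[k <= length(pos)]]    # keep only occurrences that actually exist
  df[pos, , drop = FALSE]
}

res <- nth_occurrences(mydf, "V1", 3, c(3, 4))
res
```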

Data used:

mydf <- structure(list(V1 = c(1L, 3L, 3L, 6L, 3L, 4L, 3L, 3L), 
                       V2 = c(5L, 4L, 9L, 9L, 1L, 7L, 8L, 2L), 
                       V3 = c(8L, 9L, 6L, 3L, 2L, 2L, 6L, 7L)), 
                  .Names = c("V1", "V2", "V3"), class = "data.frame", row.names = c(NA, -8L))

Bonus material: besides dplyr, you can also do this very efficiently with data.table (see this answer for speed comparisons of the different data.table methods on large datasets):

setDT(mydf)[V1==3, head(.SD,3)]
# or:
setDT(mydf)[V1==3, .SD[1:3]]

With dplyr, you can extract the first three rows for each unique value of a column (df and columnName below are placeholders for your data frame and column):

library(dplyr)
df %>% arrange(columnName) %>% group_by(columnName) %>% slice(1:3)
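For the per-group case without dplyr, one option is base R's ave(). It numbers each row's occurrence within its group, and you then keep rows whose number is at most 3. Unlike the arrange() version above, this keeps the original row order. The sketch below uses the question's data, and the variable name occ is illustrative only.

```r
# Data from the question
mydf <- data.frame(V1 = c(1L, 3L, 3L, 6L, 3L, 4L, 3L, 3L),
                   V2 = c(5L, 4L, 9L, 9L, 1L, 7L, 8L, 2L),
                   V3 = c(8L, 9L, 6L, 3L, 2L, 2L, 6L, 7L))

# Occurrence number of each row within its V1 group: 1 for the first row
# with a given value, 2 for the second, and so on.
occ <- ave(seq_len(nrow(mydf)), mydf$V1, FUN = seq_along)

# First three rows for each unique value of V1
first3 <- mydf[occ <= 3, ]

# Only the 3rd and 4th rows where V1 == 3
third_fourth <- mydf[mydf$V1 == 3 & occ %in% 3:4, ]
```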

If you only want the first three rows where that column equals 3, you can try:

df %>% filter(columnName == 3) %>% slice(1:3)

If you want specific occurrences instead, pass their positions to slice, for example slice(c(3, 4)).

We could also use subset:

head(subset(mydf, V1==3),3)

Update

If we also need to extract the row immediately below each row where V1 == 3:

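i1 <- with(mydf, V1==3)   # logical index of rows where V1 == 3
# combine those row positions with the positions one below (capped at the last row)
mydf[sort(unique(c(which(i1),pmin(which(i1)+1L, nrow(mydf))))),]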
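The same idea extends to more than one following row. The sketch below assumes a hypothetical helper, with_following(), which keeps each matching row plus the k rows after it, clipped at the end of the data. With k = 1 it selects the same rows as the code above.

```r
# Data from the question
mydf <- data.frame(V1 = c(1L, 3L, 3L, 6L, 3L, 4L, 3L, 3L),
                   V2 = c(5L, 4L, 9L, 9L, 1L, 7L, 8L, 2L),
                   V3 = c(8L, 9L, 6L, 3L, 2L, 2L, 6L, 7L))

# Hypothetical helper: rows where `cond` is TRUE plus the k rows after each.
with_following <- function(df, cond, k = 1L) {
  hits <- which(cond)                                  # NA in cond is ignored
  # add offsets 0..k to every hit, capped at the last row
  idx <- pmin(rep(hits, each = k + 1L) + 0:k, nrow(df))
  df[sort(unique(idx)), , drop = FALSE]
}

res <- with_following(mydf, mydf$V1 == 3, k = 1L)
res
```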
