How to select 2nd and 3rd row for each group in R

I need to select the 2nd and 3rd entry for each group in a data frame. I have been trying, but I keep getting an error.

Sample Data:

USER.ID   restaurant
3            aaaa
3            ababa
3            asddw
4            bbbb
4            wedwe
2            ewedw
1            qwqw
1            dwqd
1            dqed
1            ewewq

Desired output:

USER.ID    2nd_restaurant   3rd_restaurant
3            ababa             asddw
3            ababa             asddw
3            ababa             asddw
4            wedwe             NA
4            wedwe             NA
2            NA                NA
1            dwqd              dqed
1            dwqd              dqed
1            dwqd              dqed
1            dwqd              dqed

I tried using dplyr, but it takes a long time to compute, which I guess is due to the huge size of the data. Is there a way to compute this more efficiently?

My code:

data1 <- data %>%
arrange(USER.ID) %>%
group_by(USER.ID) %>%
mutate(second_restaurant = data[2,11]) %>%
mutate(third_restaurant = data[3,11])

11 is the column number of restaurant in the original data set.

Copy the restaurant column first, and then use mutate to extract the relevant values:

library(dplyr)

mydf %>%
  mutate(restaurant2 = restaurant) %>%
  group_by(USER.ID) %>%
  mutate(restaurant = restaurant[2], restaurant2 = restaurant2[3])
# Source: local data frame [10 x 3]
# Groups: USER.ID
# 
#    USER.ID restaurant restaurant2
# 1        3      ababa       asddw
# 2        3      ababa       asddw
# 3        3      ababa       asddw
# 4        4      wedwe          NA
# 5        4      wedwe          NA
# 6        2         NA          NA
# 7        1       dwqd        dqed
# 8        1       dwqd        dqed
# 9        1       dwqd        dqed
# 10       1       dwqd        dqed
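
For reference, here is a minimal base R sketch of why the attempt in the question returns the same value for every user. It rebuilds the sample data as `mydf` (storing `restaurant` as a character column is an assumption) and contrasts indexing the whole data frame with indexing within each group. The names `global_second` and `second_by_user` are illustrative only.

```r
# Sample data from the question, rebuilt as a plain data frame.
mydf <- data.frame(
  USER.ID = c(3, 3, 3, 4, 4, 2, 1, 1, 1, 1),
  restaurant = c("aaaa", "ababa", "asddw", "bbbb", "wedwe",
                 "ewedw", "qwqw", "dwqd", "dqed", "ewewq"),
  stringsAsFactors = FALSE
)

# Indexing the whole data frame ignores any grouping: this is always
# the second row of the full table, regardless of USER.ID.
global_second <- mydf[2, "restaurant"]

# Indexing the column within each group gives one value per USER.ID;
# groups with fewer than two rows yield NA.
second_by_user <- tapply(mydf$restaurant, mydf$USER.ID, function(r) r[2])
```

This is why the answer indexes `restaurant[2]` inside the grouped `mutate()` rather than `data[2, 11]`: the column reference is evaluated per group, while `data[2, 11]` always refers to the full table.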

Or, better yet (courtesy @StevenBeaupré):

mydf %>% 
  group_by(USER.ID) %>% 
  transmute(restaurant2 = nth(restaurant, 2), 
            restaurant3 = nth(restaurant, 3))

Or, if you prefer "data.table", to paraphrase @DavidArenburg, you can try:

library(data.table)
as.data.table(mydf)[, `:=`(restaurant_2 = restaurant[2L], 
                           restaurant_3 = restaurant[3L]), by = USER.ID][]

Or, you can even use base R:

mydf[c("restaurant_2", "restaurant_3")] <- with(mydf, lapply(c(2, 3), function(x) {
  ave(restaurant, USER.ID, FUN = function(y) y[x])
}))
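
If more positions than the 2nd and 3rd are needed, the base R approach could be wrapped in a small function. The sketch below is one possible way to do that; `add_nth_by_group` is a hypothetical helper name, not something from the answers above, and the sample data is rebuilt with `restaurant` assumed to be a character column.

```r
# Sample data from the question.
mydf <- data.frame(
  USER.ID = c(3, 3, 3, 4, 4, 2, 1, 1, 1, 1),
  restaurant = c("aaaa", "ababa", "asddw", "bbbb", "wedwe",
                 "ewedw", "qwqw", "dwqd", "dqed", "ewewq"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each position in `positions`, add a column
# holding that group's value at that position (NA if the group is shorter).
add_nth_by_group <- function(df, value, group, positions) {
  for (p in positions) {
    df[[paste0(value, "_", p)]] <- ave(df[[value]], df[[group]],
                                       FUN = function(y) y[p])
  }
  df
}

result <- add_nth_by_group(mydf, "restaurant", "USER.ID", 2:3)
```

Here `result` keeps the original columns and gains `restaurant_2` and `restaurant_3`, repeated across every row of each user as in the desired output.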

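If the row names of your data frame follow a simple sequence, using the modulo operator may also be an option (the code below selects every 2nd row; change 2 to n to select every nth row):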

mydf %>% filter(as.numeric(row.names(.)) %% 2 == 0)
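
A base R equivalent of the modulo filter might look like the sketch below. It uses row positions rather than row names (an assumption that the two coincide, as they do for freshly built data), and `n` and `every_nth` are illustrative names.

```r
# Sample data from the question.
mydf <- data.frame(
  USER.ID = c(3, 3, 3, 4, 4, 2, 1, 1, 1, 1),
  restaurant = c("aaaa", "ababa", "asddw", "bbbb", "wedwe",
                 "ewedw", "qwqw", "dwqd", "dqed", "ewewq"),
  stringsAsFactors = FALSE
)

# Keep rows whose overall position is a multiple of n (here n = 2).
n <- 2
every_nth <- mydf[seq_len(nrow(mydf)) %% n == 0, ]
```

Note that this picks rows by their position in the whole table, not within each `USER.ID`. On the sample data it keeps rows 2, 4, 6, 8 and 10, so for user 4 it returns `bbbb`, which is that user's first row rather than the second.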
