
R: How to assign a running counter to each unique value in a vector?

I came across this problem: I wanted to find the date of the second Sunday of each month for the next 100 years. This is my code:

x <- seq(as.Date("2014-9-01"),as.Date("2014-9-01")+100*365.25,1)

y <- format(x,"%Y%m")

xx <- as.Date(character(0))  # empty Date vector, so c() keeps the Date class
for(i in unique(y)) {
  w <- which(y == i)
  xx <- c(xx,x[w[which(weekdays(x[w]) == "Sunday")[2]]])
}

head(xx)
tail(xx)

I have achieved it but I had to use a loop. How do I do this more efficiently with vectorised code?

More generally, suppose a vector v has n distinct values. How do I number the occurrences of each distinct value, starting from 1 for each value? That is, suppose I start with the vector

v <- c(1,1,1,2,2,2,2,3,4,4)

and I want to generate a "running counter", v.counter, for each unique value in v:

v.counter <- c(1,2,3,1,2,3,4,1,1,2)

Obviously I can write a loop to do this, but how do I do it with vectorised code instead?

You can do the running count with dplyr:

library(dplyr)

dat = data.frame(x=rep(1:10, each=3))

dat = dat %>%
  group_by(x) %>%
  mutate(x_count=1:n())

    x x_count
1   1       1
2   1       2
3   1       3
4   2       1
5   2       2
6   2       3
...
25  9       1
26  9       2
27  9       3
28 10       1
29 10       2
30 10       3

This should be fairly simple using the ave() function, which generates group-specific values.

ave(v, v, FUN=seq_along)
# [1] 1 2 3 1 2 3 4 1 1 2
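
To see what ave() is doing here, the same result can be built by hand in base R with split-apply-combine steps. This is only a sketch for illustration; the names groups and counters are made up for the example.

```r
v <- c(1, 1, 1, 2, 2, 2, 2, 3, 4, 4)

# 1. split the vector into one piece per distinct value
groups <- split(v, v)
# 2. number the elements within each piece
counters <- lapply(groups, seq_along)
# 3. put the pieces back in the original positions
v.counter <- unsplit(counters, v)

v.counter
# [1] 1 2 3 1 2 3 4 1 1 2
```

ave() wraps these three steps in a single call, which is why it is usually the shorter choice.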

Should you want to count within consecutive runs rather than within unique values of v, you could do something like this as well

v <- c(1,1,1,2,2,2,2,1,2,2)
ave(v, with(rle(v), rep(1:length(lengths), lengths)), FUN=seq_along)
# [1] 1 2 3 1 2 3 4 1 1 2

which gives the same values even though v contains only two distinct values. The first solution would instead have continued counting where the 1's left off the second time they were encountered. Also, if v isn't numeric, you can do

v <- rep(letters[1:4], c(3,4,1,2))
ave(seq_along(v), v, FUN=seq_along)
# [1] 1 2 3 1 2 3 4 1 1 2

to still get numeric values.
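
For the consecutive-run variant shown earlier in this answer, a shorter base R sketch is possible with sequence(), which expands each run length k into 1, 2, ..., k. The name runs is just for illustration.

```r
v <- c(1, 1, 1, 2, 2, 2, 2, 1, 2, 2)

runs <- rle(v)            # run lengths are 3, 4, 1, 2
sequence(runs$lengths)    # restart the count at every new run
# [1] 1 2 3 1 2 3 4 1 1 2
```

This only works for counting within consecutive runs; for counting within unique values regardless of position, ave() is still the tool to reach for.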

Suppose we have a data frame containing v:

data <- data.frame(v = c(1,1,1,2,2,2,2,3,4,4))

Then, using dplyr

library(dplyr)
data %>%
    group_by(v) %>%
    mutate(v.counter = row_number())

There are many good answers already. I will leave the following, which gets the second Sunday of each month for the next 100 years. I am sure there are better ways of handling Date objects, but this works too.

library(lubridate)
library(dplyr)
library(tidyr)

x <- seq(as.Date("2014-9-01"),as.Date("2014-9-01")+100*365.25,1)
weekday <- wday(x)
foo <- data.frame(x, weekday, stringsAsFactors = FALSE)


ana <- foo %>%
    separate(x, c("year", "month", "date"), sep = "-") %>%
    filter(weekday == 1) %>%
    group_by(year, month) %>%
    filter(row_number() == 2) %>%
    unite(sunday, year, month, date, sep = "-") %>%
    mutate(sunday = as.Date(sunday)) %>% ### If you want date object
    select(sunday) ### If you want just one column

head(ana)
Source: local data frame [6 x 1]
      sunday
1 2014-09-14
2 2014-10-12
3 2014-11-09
4 2014-12-14
5 2015-01-11
6 2015-02-08
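
The ave() running counter can also be applied to the date problem directly, without extra packages. The following is a base R sketch, not the code from any answer here: it assumes Sundays are detected with as.POSIXlt()$wday (0 means Sunday, independent of locale), and the names sundays, counter and second_sundays are made up for the example.

```r
x <- seq(as.Date("2014-09-01"), as.Date("2014-09-01") + 100 * 365.25, by = "day")

# keep only the Sundays; $wday is 0 for Sunday in every locale
sundays <- x[as.POSIXlt(x)$wday == 0]

# running counter of Sundays within each year-month
counter <- ave(seq_along(sundays), format(sundays, "%Y%m"), FUN = seq_along)

# the second Sunday of each month
second_sundays <- sundays[counter == 2]

head(second_sundays)
# [1] "2014-09-14" "2014-10-12" "2014-11-09" "2014-12-14" "2015-01-11" "2015-02-08"
```

Months that are only partly covered by x and contain fewer than two Sundays simply drop out of the result.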

Just for the sake of completeness, I want to add the data.table solution:

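library(data.table)

# x and y are the date and year-month vectors from the question
dt <- data.table(x, y)
dt[, wd := weekdays(x)]
# running counter within each month and weekday, then keep the second Sunday
dt <- dt[, wdidx := seq_along(.I), by = c("y", "wd")][wd == "Sonntag" & wdidx == 2, ]
head(dt, 20)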

Here "Sonntag" is German for Sunday: weekdays() returns day names in the current locale, so the string to compare against depends on your system's language settings.
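
If hard-coding a locale-specific day name like "Sonntag" is a concern, one possible workaround is to test the numeric weekday instead. This is a sketch under the assumption that as.POSIXlt()$wday is acceptable for your dates; is_sunday is a hypothetical helper name, not part of base R or data.table.

```r
# hypothetical helper: TRUE where a date falls on a Sunday, in any locale
is_sunday <- function(dates) as.POSIXlt(dates)$wday == 0

is_sunday(as.Date(c("2014-09-14", "2014-09-15")))
# [1]  TRUE FALSE
```

The same test could replace the wd == "Sonntag" comparison, so the code behaves identically on machines with different language settings.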
