I have a dataframe which is a much longer version of this:
library(zoo)
council_name <- c("Southwark", "Southwark", "Southwark", "Lambeth", "Lambeth", "Lambeth", "Yorkshire", "Yorkshire", "Yorkshire")
quarter <- c("2006 Q1", "2006 Q2", "2006 Q3", "2006 Q1", "2006 Q2", "2006 Q3","2006 Q1", "2006 Q2", "2006 Q3")
treat <- c(1, 0, 1, 0, 0, 1, 0, 0, 0)
df <- data.frame(council_name, year_qtr = as.yearqtr(quarter), treat)
What I want is a column that holds, for each council_name, the quarter in which "treat" is 1 for the first time, and "0" if "treat" is never 1 for that council_name.
The result would look something like this:
library(zoo)
council_name <- c("Southwark", "Southwark", "Southwark", "Lambeth", "Lambeth", "Lambeth", "Yorkshire", "Yorkshire", "Yorkshire")
quarter <- c("2006 Q1", "2006 Q2", "2006 Q3", "2006 Q1", "2006 Q2", "2006 Q3","2006 Q1", "2006 Q2", "2006 Q3")
treat <- c(1, 0, 1, 0, 0, 1, 0, 0, 0)
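# one value per row: Southwark is first treated in 2006 Q1, Lambeth in 2006 Q3, Yorkshire never
first.treatment <- rep(c("2006 Q1", "2006 Q3", "0"), each = 3)
df.desired <- data.frame(council_name, year_qtr = as.yearqtr(quarter), treat, first.treatment)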
I tried different things with group_by and sorting, but I never quite got what I was looking for.
An example of what I tried is:
merged2%>%
group_by(council_name, year_qtr)%>%
arrange(year_qtr)%>%
mutate(first.treatment = by(year_qtr, head, 1))
but got:
Error: Problem with `mutate()` input `first.treatment`.
x unique() applies only to vectors
ℹ Input `first.treatment` is `by(year_qtr, head, 1)`.
ℹ The error occured in group 1: council_name = "Adur", year_qtr = 2006 Q2.
Many thanks!
I had to adapt the example data a bit, but I hope this is what you meant. I do not like the idea of returning either a string or 0; a function should always return the same data type. That is why my answer returns either the quarter or NA. Should you insist on returning 0, that can easily be "fixed" using is.na.
council_name <- c("Southwark", "Southwark", "Southwark", "Lambeth", "Lambeth", "Lambeth", "Yorkshire", "Yorkshire", "Yorkshire")
quarter <- c("2006 Q1", "2006 Q2", "2006 Q3", "2006 Q1", "2006 Q2", "2006 Q3","2006 Q1", "2006 Q2", "2006 Q3")
treat <- c(1, 0, 1, 0, 0, 1, 0, 0, 0)
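df <- data.frame(council_name, quarter, treat, stringsAsFactors = FALSE)
# return the quarter of the first row with treat == 1 (rows assumed to be in time order);
# which() is empty for a never-treated council, so [1] gives NA
treat.one <- function(d){
  line <- which(d$treat == 1)[1]
  return(d$quarter[line])
}
by(df, council_name, treat.one)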
This takes
  council_name quarter treat
1    Southwark 2006 Q1     1
2    Southwark 2006 Q2     0
3    Southwark 2006 Q3     1
4      Lambeth 2006 Q1     0
5      Lambeth 2006 Q2     0
6      Lambeth 2006 Q3     1
7    Yorkshire 2006 Q1     0
8    Yorkshire 2006 Q2     0
9    Yorkshire 2006 Q3     0
and returns
> by(df, council_name, treat.one)
council_name: Lambeth
[1] "2006 Q3"
-----------------------------------------
council_name: Southwark
[1] "2006 Q1"
-----------------------------------------
council_name: Yorkshire
[1] NA
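If you do want the "0" from the question, and want the result as a column rather than printed per group, a minimal base-R sketch could look like this. It keeps the treat.one() helper from above but swaps by() for split() plus sapply() (a substitute, not the answerer's code), because that yields a plain named character vector that is easy to clean with is.na and look up by council. The rows are assumed to already be in time order within each council.

```r
# Same example data as above; stringsAsFactors = FALSE keeps quarter a character
# column on R versions older than 4.0.
council_name <- c("Southwark", "Southwark", "Southwark", "Lambeth", "Lambeth", "Lambeth",
                  "Yorkshire", "Yorkshire", "Yorkshire")
quarter <- c("2006 Q1", "2006 Q2", "2006 Q3", "2006 Q1", "2006 Q2", "2006 Q3",
             "2006 Q1", "2006 Q2", "2006 Q3")
treat <- c(1, 0, 1, 0, 0, 1, 0, 0, 0)
df <- data.frame(council_name, quarter, treat, stringsAsFactors = FALSE)

# First quarter with treat == 1 in one council's rows; NA if it is never treated.
treat.one <- function(d) {
  line <- which(d$treat == 1)[1]
  d$quarter[line]
}

# split() + sapply() instead of by(): one named entry per council
# (names come from the sorted factor levels: Lambeth, Southwark, Yorkshire).
first <- sapply(split(df, df$council_name), treat.one)

# Only if you really need the 0 convention from the question.
first[is.na(first)] <- "0"

# Look up each row's council to turn the per-council result into a column.
df$first.treatment <- unname(first[df$council_name])
df
```

Because first is a character vector, the replacement value is the string "0", so the new column keeps a single data type.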
When you use group_by, the expressions in the subsequent mutate call are evaluated separately for each group.
Therefore, you can write something like this:
library(dplyr)
library(zoo)
tibble(council_name, year_qtr=as.yearqtr(quarter), treat) %>%
group_by(council_name) %>%
arrange(year_qtr) %>%
mutate(first_treatment = year_qtr[treat==1][1]) %>%
arrange(council_name, year_qtr)
or
tibble(council_name, year_qtr=as.yearqtr(quarter), treat) %>%
group_by(council_name) %>%
arrange(year_qtr) %>%
summarise(first_treatment = year_qtr[treat==1][1])
For each group, this takes the year_qtr values of the rows where treat == 1 and keeps the first element of the resulting vector (NA if the council is never treated). This is why it is important to sort with arrange beforehand.
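To see why the sort matters, here is a small sketch of the `x[treat == 1][1]` idiom on a single council's rows, using hypothetical out-of-order data. It uses plain character quarters instead of zoo's yearqtr so it runs without extra packages; this relies on the assumption that strings like "2006 Q1" sort in chronological order, which holds for this fixed "YYYY Qn" format. The `order()` call plays the role of arrange().

```r
# One council's rows, deliberately out of time order (hypothetical data).
year_qtr <- c("2006 Q3", "2006 Q1", "2006 Q2")
treat    <- c(1, 1, 0)

# Without sorting, [1] just picks the first treated row in storage order.
unsorted <- year_qtr[treat == 1][1]          # "2006 Q3"

# Sorting first (what arrange() does) makes [1] the earliest treated quarter.
o <- order(year_qtr)
sorted <- year_qtr[o][treat[o] == 1][1]      # "2006 Q1"

# A council that is never treated yields NA rather than an error.
never <- year_qtr[c(0, 0, 0) == 1][1]        # NA
```

With a real yearqtr column the same logic applies; yearqtr values are numeric underneath, so they sort chronologically without relying on the string format.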