
Create column based on ordering in another column in R

I have a dataframe which is a much longer version of this:

council_name <- c("Southwark", "Southwark", "Southwark", "Lambeth", "Lambeth", "Lambeth", "Yorkshire", "Yorkshire", "Yorkshire")
quarter <- c("2006 Q1", "2006 Q2", "2006 Q3", "2006 Q1", "2006 Q2", "2006 Q3","2006 Q1", "2006 Q2", "2006 Q3")
treat <- c(1, 0, 1, 0, 0, 1, 0, 0, 0)
df <- data.frame(council_name, quarter, treat)

What I want is a column holding the value of "quarter" at which "treat" is 1 for the first time within each value of "council_name", and "0" if "treat" is never 1 for that council_name.

This would look something like this:

library(zoo)
council_name <- c("Southwark", "Southwark", "Southwark", "Lambeth", "Lambeth", "Lambeth", "Yorkshire", "Yorkshire", "Yorkshire")
quarter <- c("2006 Q1", "2006 Q2", "2006 Q3", "2006 Q1", "2006 Q2", "2006 Q3","2006 Q1", "2006 Q2", "2006 Q3")
treat <- c(1, 0, 1, 0, 0, 1, 0, 0, 0)
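# one value per row: the first treated quarter of each council, "0" if never treated
first.treatment <- rep(c("2006 Q1", "2006 Q3", "0"), each = 3)
df.desired <- data.frame(council_name, quarter = as.yearqtr(quarter), treat, first.treatment)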

I tried different things with group_by and sorting but I never quite get what I am looking for.

An example of what I tried is:

merged2%>%
  group_by(council_name, year_qtr)%>%
  arrange(year_qtr)%>%
  mutate(first.treatment = by(year_qtr, head, 1))

but got:

Error: Problem with `mutate()` input `first.treatment`.
x unique() applies only to vectors
ℹ Input `first.treatment` is `by(year_qtr, head, 1)`.
ℹ The error occured in group 1: council_name = "Adur", year_qtr = 2006 Q2.

Many thanks!

I had to adapt the example data a bit, but I hope this is what you meant. I do not like the idea of returning either a string or 0; a function should always return the same data type. That is why my answer returns either the quarter or NA. If you insist on returning 0, that is easily "fixed" using is.na.

council_name <- c("Southwark", "Southwark", "Southwark", "Lambeth", "Lambeth", "Lambeth", "Yorkshire", "Yorkshire", "Yorkshire")
quarter <- c("2006 Q1", "2006 Q2", "2006 Q3", "2006 Q1", "2006 Q2", "2006 Q3","2006 Q1", "2006 Q2", "2006 Q3")
treat <- c(1, 0, 1, 0, 0, 1, 0, 0, 0)
df <- data.frame(council_name, quarter, treat)

treat.one <- function(d){
  line <- which(d$treat == 1)[1]
  return(d$quarter[line])
}

by(df, council_name, treat.one)

This takes the data frame

  council_name quarter treat
1    Southwark 2006 Q1     1
2    Southwark 2006 Q2     0
3    Southwark 2006 Q3     1
4      Lambeth 2006 Q1     0
5      Lambeth 2006 Q2     0
6      Lambeth 2006 Q3     1
7    Yorkshire 2006 Q1     0
8    Yorkshire 2006 Q2     0
9    Yorkshire 2006 Q3     0

and returns

> by(df, council_name, treat.one)
council_name: Lambeth
[1] "2006 Q3"
----------------------------------------- 
council_name: Southwark
[1] "2006 Q1"
----------------------------------------- 
council_name: Yorkshire
[1] NA
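
To get the result back into the data frame as a column, and to substitute "0" for NA as mentioned at the top of this answer, something along these lines should work. It is only a sketch: first_by_council and first.treatment are names chosen here for illustration, and sapply over split() is used instead of by() so that the result is a plain named vector that is easy to index.

```r
council_name <- c("Southwark", "Southwark", "Southwark", "Lambeth", "Lambeth", "Lambeth", "Yorkshire", "Yorkshire", "Yorkshire")
quarter <- c("2006 Q1", "2006 Q2", "2006 Q3", "2006 Q1", "2006 Q2", "2006 Q3", "2006 Q1", "2006 Q2", "2006 Q3")
treat <- c(1, 0, 1, 0, 0, 1, 0, 0, 0)
df <- data.frame(council_name, quarter, treat, stringsAsFactors = FALSE)

# Same helper as above: quarter of the first row with treat == 1, or NA if there is none
treat.one <- function(d) {
  line <- which(d$treat == 1)[1]
  d$quarter[line]
}

# One value per council, as a character vector named by council
first_by_council <- sapply(split(df, df$council_name), treat.one)

# Look up each row's council in that named vector
df$first.treatment <- unname(first_by_council[df$council_name])

# Optional: replace NA with "0" (a string, so the column stays character)
df$first.treatment[is.na(df$first.treatment)] <- "0"
```

Note that this relies on the rows of each council already being in chronological order, as they are in the example data.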

After group_by , the expressions inside mutate are evaluated separately for each group, so a column name there refers only to the rows of the current group.

Therefore, you can write something like this:

library(dplyr)
library(zoo)  # for as.yearqtr

tibble(council_name, year_qtr=as.yearqtr(quarter), treat) %>% 
  group_by(council_name) %>% 
  arrange(year_qtr) %>% 
  mutate(first_treatment = year_qtr[treat==1][1]) %>% 
  arrange(council_name, year_qtr)

or

tibble(council_name, year_qtr=as.yearqtr(quarter), treat) %>% 
  group_by(council_name) %>% 
  arrange(year_qtr) %>% 
  summarise(first_treatment = year_qtr[treat==1][1])

For each group, this takes the values of the year_qtr column where treat==1 and keeps the first element of the resulting vector. This is why it is important to sort beforehand ( arrange ). If a group has no row with treat==1 , the subset is empty and [1] yields NA .
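
If the "0" requested in the question is really needed instead of NA , one possible variation is sketched below. It rests on assumptions that are not part of the answer above: quarter is kept as a plain character column (so no zoo is needed, and sorting relies on "YYYY Qn" strings ordering correctly as text), and coalesce() from dplyr supplies the fallback value.

```r
library(dplyr)

df <- tibble(
  council_name = rep(c("Southwark", "Lambeth", "Yorkshire"), each = 3),
  quarter = rep(c("2006 Q1", "2006 Q2", "2006 Q3"), times = 3),
  treat = c(1, 0, 1, 0, 0, 1, 0, 0, 0)
)

result <- df %>%
  group_by(council_name) %>%
  # sort by council, then by quarter (text order; fine for "YYYY Qn" strings)
  arrange(quarter, .by_group = TRUE) %>%
  # first treated quarter in the group; NA becomes "0" when the group is never treated
  mutate(first_treatment = coalesce(quarter[treat == 1][1], "0")) %>%
  ungroup()
```

Mixing quarters and "0" in one column only works because everything is a string here; with a yearqtr column, keeping NA is the cleaner choice.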
