Create column based on ordering in another column in R
I have a data frame; this is a shortened version of a longer one:
council_name <- c("Southwark", "Southwark", "Southwark", "Lambeth", "Lambeth", "Lambeth", "Yorkshire", "Yorkshire", "Yorkshire")
quarter <- c("2006 Q1", "2006 Q2", "2006 Q3", "2006 Q1", "2006 Q2", "2006 Q3","2006 Q1", "2006 Q2", "2006 Q3")
treat <- c(1, 0, 1, 0, 0, 1, 0, 0, 0)
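library(zoo)  # provides as.yearqtr()
df <- data.frame(council_name, quarter = as.yearqtr(quarter), treat)
What I want is a column holding the value of quarter at which treat is first 1 for each value of council_name. If treat is never 1 for a particular council name, the value should be 0.
It would look like this: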
library(zoo)
council_name <- c("Southwark", "Southwark", "Southwark", "Lambeth", "Lambeth", "Lambeth", "Yorkshire", "Yorkshire", "Yorkshire")
quarter <- c("2006 Q1", "2006 Q2", "2006 Q3", "2006 Q1", "2006 Q2", "2006 Q3","2006 Q1", "2006 Q2", "2006 Q3")
treat <- c(1, 0, 1, 0, 0, 1, 0, 0, 0)
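# One value per row: each council's first treated quarter, or "0" if never treated
first.treatment <- rep(c("2006 Q1", "2006 Q3", "0"), each = 3)
df.desired <- data.frame(council_name, quarter = as.yearqtr(quarter), treat, first.treatment)
I have tried various things with group_by and sorting, but I never quite get what I want.
One example of what I tried is: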
merged2%>%
group_by(council_name, year_qtr)%>%
arrange(year_qtr)%>%
mutate(first.treatment = by(year_qtr, head, 1))
but got:
Error: Problem with `mutate()` input `first.treatment`. x unique() applies only to vectors ℹ Input `first.treatment` is `by(year_qtr, head, 1)`. ℹ The error occured in group 1: council_name = "Adur", year_qtr = 2006 Q2.
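Many thanks!
I did adjust the example data a little, but I very much hope this is what you meant. I don't like the idea of returning either a string or 0. The same data type should always be returned. That is why my answer returns the quarter or NA. If you insist on returning 0, the NA can easily be "fixed" afterwards using is.na.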
council_name <- c("Southwark", "Southwark", "Southwark", "Lambeth", "Lambeth", "Lambeth", "Yorkshire", "Yorkshire", "Yorkshire")
quarter <- c("2006 Q1", "2006 Q2", "2006 Q3", "2006 Q1", "2006 Q2", "2006 Q3","2006 Q1", "2006 Q2", "2006 Q3")
treat <- c(1, 0, 1, 0, 0, 1, 0, 0, 0)
df <- data.frame(council_name, quarter, treat)
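# Return the quarter of the first row with treat == 1, or NA if there is none
treat.one <- function(d) {
  line <- which(d$treat == 1)[1]  # index of the first treated row; NA when never treated
  return(d$quarter[line])
}
# Apply the function to each council's rows
by(df, council_name, treat.one)
This takes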
council_name quarter treat
1 Southwark 2006 Q1 1
2 Southwark 2006 Q2 0
3 Southwark 2006 Q3 1
4 Lambeth 2006 Q1 0
5 Lambeth 2006 Q2 0
6 Lambeth 2006 Q3 1
7 Yorkshire 2006 Q1 0
8 Yorkshire 2006 Q2 0
9 Yorkshire 2006 Q3 0
and returns
> by(df, council_name, treat.one)
council_name: Lambeth
[1] "2006 Q3"
-----------------------------------------
council_name: Southwark
[1] "2006 Q1"
-----------------------------------------
council_name: Yorkshire
[1] NA
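The by() output above is one value per council, whereas the question asks for a column. A minimal base R sketch of one way to bridge that gap follows. The names first_by_council and first.treatment.chr are illustrative and do not come from the answer, and the "0" replacement is only there because the question asked for it.

```r
council_name <- c("Southwark", "Southwark", "Southwark", "Lambeth", "Lambeth", "Lambeth", "Yorkshire", "Yorkshire", "Yorkshire")
quarter <- c("2006 Q1", "2006 Q2", "2006 Q3", "2006 Q1", "2006 Q2", "2006 Q3", "2006 Q1", "2006 Q2", "2006 Q3")
treat <- c(1, 0, 1, 0, 0, 1, 0, 0, 0)
df <- data.frame(council_name, quarter, treat, stringsAsFactors = FALSE)

# Named character vector: first treated quarter per council, NA if never treated
first_by_council <- sapply(
  split(df, df$council_name),
  function(d) d$quarter[which(d$treat == 1)[1]]
)

# Look up each row's council in the named vector to build the column
df$first.treatment <- unname(first_by_council[df$council_name])

# Optional: replace NA with "0" (as a string, so the column keeps a single type)
df$first.treatment.chr <- ifelse(is.na(df$first.treatment), "0", df$first.treatment)
```

Keeping the NA version alongside the "0" version leaves the choice to the caller; the NA column is usually easier to work with downstream.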
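After group_by, a mutate call evaluates its expressions within each group in turn, so a column name refers only to that group's rows.
So you can write something like this: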
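library(dplyr)  # tibble(), group_by(), arrange(), mutate(), summarise()
library(zoo)    # as.yearqtr()
tibble(council_name, year_qtr = as.yearqtr(quarter), treat) %>%
  group_by(council_name) %>%
  arrange(year_qtr) %>%
  mutate(first_treatment = year_qtr[treat == 1][1]) %>%
  arrange(council_name, year_qtr)
or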
tibble(council_name, year_qtr = as.yearqtr(quarter), treat) %>%
  group_by(council_name) %>%
  arrange(year_qtr) %>%
  summarise(first_treatment = year_qtr[treat == 1][1])
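For each group, this takes the year_qtr values where treat==1 and picks the first element of the resulting vector (NA if the group has no such row). That is why sorting beforehand (arrange) matters.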
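To see why the sort matters, here is a small sketch using deliberately shuffled rows. The data and the helper first_treated are made up for illustration and are not part of the answer. It assumes dplyr 1.0.0 or later (for the .groups argument) and zoo are installed.

```r
library(dplyr)
library(zoo)

# Hypothetical data with rows deliberately out of chronological order
shuffled <- tibble(
  council_name = c("Southwark", "Southwark", "Southwark", "Lambeth", "Lambeth", "Lambeth"),
  year_qtr = as.yearqtr(c("2006 Q3", "2006 Q2", "2006 Q1", "2006 Q3", "2006 Q1", "2006 Q2")),
  treat = c(1, 0, 1, 1, 0, 0)
)

# Illustrative helper: first treated quarter per council, in current row order
first_treated <- function(data) {
  data %>%
    group_by(council_name) %>%
    summarise(first_treatment = year_qtr[treat == 1][1], .groups = "drop")
}

# Without sorting, "first" means first in row order, not earliest in time
unsorted_result <- first_treated(shuffled)

# Sorting by quarter first makes the first match the earliest treated quarter
sorted_result <- first_treated(arrange(shuffled, year_qtr))
```

For Southwark, the unsorted version picks 2006 Q3 because that row happens to come first, while the sorted version picks 2006 Q1, the earliest treated quarter.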