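Use mutate to create a new variable where a column has one value based on a condition within a tidy tibble
I am trying to create a new variable called cpi2000 that uses the year-2000 CPI value for every observation in a series (I have four series, hence the group_by), so that I can calculate an inflation adjustment factor. However, the code below only fills in the value for the year 2000 and leaves every other year as NA. Basically, I want cpi2000 to hold four repeated numbers, one per series.
Here is the head of my data: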
Groups: series_id [1]
year series_id value seasonal_adj series_name cpi2000
<chr> <chr> <dbl> <chr> <chr> <dbl>
1 2000 CPIAUCSL 172. seasonally adjusted US city average, all items, seasonally adjusted 172.
2 2001 CPIAUCSL 177. seasonally adjusted US city average, all items, seasonally adjusted NA
3 2002 CPIAUCSL 180. seasonally adjusted US city average, all items, seasonally adjusted NA
4 2003 CPIAUCSL 184 seasonally adjusted US city average, all items, seasonally adjusted NA
5 2004 CPIAUCSL 189. seasonally adjusted US city average, all items, seasonally adjusted NA
6 2005 CPIAUCSL 195. seasonally adjusted US city average, all items, seasonally adjusted NA
cpi_values_tidy_clean <- cpi_values_tidy %>%
  separate(date,
           into = c("year"),
           sep = "-",
           extra = "drop") %>% # keep only the year part of date, drop the rest
  group_by(series_id) %>%
  mutate(cpi2000 = if_else(year == 2000, value, value[2000])) %>%
  glimpse()
Here is the output:
[1] 172.192 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 172.200 NA NA NA NA NA NA NA NA NA NA NA NA NA
[36] NA NA NA NA NA NA NA 165.717 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 165.725 NA NA NA NA NA NA
[71] NA NA NA NA NA NA NA NA NA NA NA NA NA NA
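I think the best approach is an if_else statement (case_when does not seem to work). This would work if I could figure out how to make the false branch of the if_else statement (value[2000]) take the value from the row where year == 2000, but I don't know how to express that there.
The end goal is to create two variables, cpi2000 and cpi2019, so that I can create a third variable, cpi_adj = (cpi2019/cpi2000), which can be used as an inflation factor.
Any help would be greatly appreciated.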
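I realized that I can specify the year in the false branch with value[year == 2000], instead of subsetting by bracket position as I did with value[2000]. Subsetting with 2000 produces NA because there is no row 2000 within a group; to index by position I would use value[1], since I want the first value. Filtering by year is safer, though, because it lets me name the exact year I want. Here is the code I settled on, with the output below: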
cpi_values_tidy_clean <- cpi_values_tidy %>%
  separate(date,
           into = c("year"),
           sep = "-",
           extra = "drop") %>% # keep only the year part of date, drop the rest
  group_by(series_id) %>%
  mutate(cpi2000 = if_else(year == 2000, value, value[year == 2000])) %>%
  mutate(cpi2019 = if_else(year == 2019, value, value[year == 2019])) %>%
  glimpse()
head(cpi_values_tidy_clean)
year series_id value seasonal_adj series_name cpi2000 cpi2019
<chr> <chr> <dbl> <chr> <chr> <dbl> <dbl>
1 2000 CPIAUCSL 172. seasonally adjusted US city average, all items, seasonally adjusted 172. 256.
2 2001 CPIAUCSL 177. seasonally adjusted US city average, all items, seasonally adjusted 172. 256.
3 2002 CPIAUCSL 180. seasonally adjusted US city average, all items, seasonally adjusted 172. 256.
4 2003 CPIAUCSL 184 seasonally adjusted US city average, all items, seasonally adjusted 172. 256.
5 2004 CPIAUCSL 189. seasonally adjusted US city average, all items, seasonally adjusted 172. 256.
6 2005 CPIAUCSL 195. seasonally adjusted US city average, all items, seasonally adjusted 172. 256.
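To make the difference between the two kinds of subsetting concrete, here is a minimal base R sketch. The vectors are made-up stand-ins for one series' year and value columns, not the actual CPI data.

```r
# Toy stand-ins for one group's columns (numbers are invented for illustration)
year  <- c("2000", "2001", "2002")
value <- c(172, 177, 180)

# Positional subsetting: asks for the 2000th element, which does not exist
by_position_2000 <- value[2000]

# Positional subsetting that happens to work when 2000 is the first row
by_position_1 <- value[1]

# Logical subsetting: picks the element whose year matches, wherever it sits
by_year <- value[year == "2000"]

print(by_position_2000)
print(by_position_1)
print(by_year)
```

The logical version keeps working if the rows are reordered or if the series starts in a different year, which is why it is the safer choice.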
If anyone knows how to do this more elegantly, or with case_when, I would love to see it.
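One possibly simpler variant, offered as a sketch rather than a definitive answer: both branches of the if_else return the same number, so the if_else can probably be dropped and the subset used on its own, since a length-one result is recycled across each group. The tibble cpi_toy and its values below are hypothetical, and the sketch assumes exactly one row per year in each series (with several rows per year the subset would have length greater than one and mutate would raise an error).

```r
library(dplyr)

# Hypothetical toy data: two series with annual values (numbers are made up)
cpi_toy <- tibble(
  year      = rep(c("2000", "2010", "2019"), times = 2),
  series_id = rep(c("A", "B"), each = 3),
  value     = c(100, 120, 150, 200, 210, 260)
)

cpi_adj_toy <- cpi_toy %>%
  group_by(series_id) %>%
  mutate(
    cpi2000 = value[year == "2000"],  # one value per group, recycled to every row
    cpi2019 = value[year == "2019"],
    cpi_adj = cpi2019 / cpi2000       # inflation factor per series
  ) %>%
  ungroup()

print(cpi_adj_toy)
```

Since no row-by-row condition is left, case_when does not seem to be needed here either. The year column is compared against the string "2000" because separate() leaves it as character.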