Extract data from different columns of a data frame based on row values

Question

For each row of the data frame df, I want to extract values from certain columns, as described below, and create a new output data frame.

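When Year equals 2003, I need the values in columns Y_2001 and Y_2002 as Year1 and Year2 in the output data frame. These are the values for the two years before the year given in the Year column. Likewise, if Year equals 2006, I need the values from Y_2004 and Y_2005 in the output data frame. The same applies to every year in the Year column.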

> df
     ID Year Y_2001 Y_2002 Y_2003 Y_2004 Y_2005
[1,]  1 2003      2      4      6      4      3
[2,]  2 2004      5      9      7      1      2
[3,]  3 2006      4      3      5      7      8
[4,]  4 2004      7      6      4      8      9

> output
     ID Year Year1 Year2
[1,]  1 2003     2     4
[2,]  2 2004     9     7
[3,]  3 2006     7     8
[4,]  4 2004     6     4
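To try the answers, df can be rebuilt as a data frame. The `[1,]` row labels above look like a printed matrix, but the answers use `$` and tidyverse verbs, so this sketch assumes a data.frame with integer columns. It also spells out the rule as column names: Year1 comes from Y_(Year - 2) and Year2 from Y_(Year - 1). The names `year1_col` and `year2_col` are illustrative only.

```r
# Reproducible version of df from the question (assumed to be a data.frame)
df <- data.frame(
  ID     = 1:4,
  Year   = c(2003L, 2004L, 2006L, 2004L),
  Y_2001 = c(2L, 5L, 4L, 7L),
  Y_2002 = c(4L, 9L, 3L, 6L),
  Y_2003 = c(6L, 7L, 5L, 4L),
  Y_2004 = c(4L, 1L, 7L, 8L),
  Y_2005 = c(3L, 2L, 8L, 9L)
)

# Source column for Year1 (two years earlier) and Year2 (one year earlier)
year1_col <- paste0("Y_", df$Year - 2)
year2_col <- paste0("Y_", df$Year - 1)
year1_col  # "Y_2001" "Y_2002" "Y_2004" "Y_2002"
year2_col  # "Y_2002" "Y_2003" "Y_2005" "Y_2003"
```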

Could someone help me write code that produces the output above? Any support is much appreciated.

Answer 1

Here is a tidyverse solution:

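This takes the data and puts it into long format with pivot_longer. The values of interest are those where the year from the column name is 1 or 2 years less than the row's Year. You can filter on those differences (here the filter explicitly keeps a 1-year or 2-year difference).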

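mutate then creates an extra column holding the Year1 and Year2 column names. Note that Year1 corresponds to a 2-year difference and Year2 to a 1-year difference, so the difference is subtracted from 3 to reverse the order. Finally, pivot_wider puts the data back into wide format.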

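library(tidyverse)

df %>%
  # Long format: "Y_2001" splits into the value column Y and Year_Sep = 2001.
  # names_transform (tidyr >= 1.1.0) converts Year_Sep to a number; older tidyr used names_ptypes.
  pivot_longer(cols = -c(ID, Year), names_to = c(".value", "Year_Sep"), names_sep = "_",
               names_transform = list(Year_Sep = as.numeric)) %>%
  # Keep only the column years 1 or 2 years before the row's Year
  filter(Year - Year_Sep == 1 | Year - Year_Sep == 2) %>%
  # 2-year gap -> "Year1", 1-year gap -> "Year2"
  mutate(YearCol = paste0("Year", 3 - (Year - Year_Sep))) %>%
  pivot_wider(id_cols = c(ID, Year), names_from = YearCol, values_from = Y)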

Output

# A tibble: 4 x 4
     ID  Year Year1 Year2
  <int> <int> <int> <int>
1     1  2003     2     4
2     2  2004     9     7
3     3  2006     7     8
4     4  2004     6     4
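The long → filter → relabel → wide steps above can also be traced in base R, which may help when tidyverse is unavailable or when you want to inspect the intermediate long table. This is a sketch, not the answer's code: it assumes the year columns are named Y_YYYY, that df is sorted by ID, and that every row has both earlier years available. The names `ycols`, `long`, and `wide` are illustrative.

```r
df <- data.frame(ID = 1:4, Year = c(2003L, 2004L, 2006L, 2004L),
                 Y_2001 = c(2L, 5L, 4L, 7L), Y_2002 = c(4L, 9L, 3L, 6L),
                 Y_2003 = c(6L, 7L, 5L, 4L), Y_2004 = c(4L, 1L, 7L, 8L),
                 Y_2005 = c(3L, 2L, 8L, 9L))

ycols <- grep("^Y_", names(df), value = TRUE)

# Like pivot_longer(): one row per ID and column year
long <- data.frame(
  ID       = rep(df$ID, times = length(ycols)),
  Year     = rep(df$Year, times = length(ycols)),
  Year_Sep = rep(as.numeric(sub("^Y_", "", ycols)), each = nrow(df)),
  Y        = unlist(df[ycols], use.names = FALSE)
)

# Like filter(): keep column years 1 or 2 years before the row's Year
long <- long[(long$Year - long$Year_Sep) %in% 1:2, ]

# Like mutate(): a 2-year gap becomes "Year1", a 1-year gap "Year2"
long$YearCol <- paste0("Year", 3 - (long$Year - long$Year_Sep))
long <- long[order(long$ID, long$YearCol), ]

# Like pivot_wider(): exactly two values per ID, in Year1/Year2 order
wide <- data.frame(df[c("ID", "Year")],
                   matrix(long$Y, ncol = 2, byrow = TRUE,
                          dimnames = list(NULL, c("Year1", "Year2"))))
wide
```

The final step relies on each ID contributing exactly two rows; if some Year had no matching earlier columns, a real reshape (such as pivot_wider itself) would be the safer choice.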

Answer 2

A somewhat clunky solution, but...

i.col <- function(data, n) { # Returns the column index corresponding to the year
  sapply(data$Year-n, function(x) grep(x, names(data)))
}

df$Year1 <- diag(as.matrix(df[, i.col(df, n=2)]))
df$Year2 <- diag(as.matrix(df[, i.col(df, n=1)]))

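Edit: using diag is apparently slow, because it builds an n × n matrix only to keep its diagonal. Indexing the matrix elements directly with cbind is preferred.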

df$Year1 <- df[cbind(seq_len(nrow(df)), i.col(df, n=2))] # one (row, column) pair per row
df$Year2 <- df[cbind(seq_len(nrow(df)), i.col(df, n=1))]

df
  ID Year Y_2001 Y_2002 Y_2003 Y_2004 Y_2005 Year1 Year2
1  1 2003      2      4      6      4      3     2     4
2  2 2004      5      9      7      1      2     9     7
3  3 2006      4      3      5      7      8     7     8
4  4 2004      7      6      4      8      9     6     4
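`grep` treats the year as a regular expression and matches it anywhere in a column name, so `i.col` could in principle return more than one index for a row (and `sapply` would then return a list). The sketch below uses a hypothetical exact-match variant, `i.col2`, built on `match`, which returns exactly one index (or NA) per row. It also checks that two-column matrix indexing picks the same elements as the diag version, without the n × n intermediate.

```r
df <- data.frame(ID = 1:4, Year = c(2003L, 2004L, 2006L, 2004L),
                 Y_2001 = c(2L, 5L, 4L, 7L), Y_2002 = c(4L, 9L, 3L, 6L),
                 Y_2003 = c(6L, 7L, 5L, 4L), Y_2004 = c(4L, 1L, 7L, 8L),
                 Y_2005 = c(3L, 2L, 8L, 9L))

# Hypothetical exact-match variant of i.col: one column index (or NA) per row
i.col2 <- function(data, n) {
  match(paste0("Y_", data$Year - n), names(data))
}

m <- as.matrix(df)
rows <- seq_len(nrow(df))

# Two-column matrix indexing returns element (rows[k], cols[k]) for each k
year1 <- m[cbind(rows, i.col2(df, n = 2))]
year2 <- m[cbind(rows, i.col2(df, n = 1))]

# Same values as the diag() version, which first builds an n x n matrix
stopifnot(identical(year1, diag(m[, i.col2(df, n = 2)])))

df$Year1 <- year1
df$Year2 <- year2
df
```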

Answer 3

Here is a row-wise apply approach, assuming you can determine the starting year (2001).

cbind(df[1:2], t(apply(df[-1], 1, function(x) {
  vals <- x[1] - 2001  # x[1] is Year, x[2] is Y_2001, so Y_(Year - 2) is at x[vals]
  x[vals:(vals + 1)] })))

#  ID Year 1 2
#1  1 2003 2 4
#2  2 2004 9 7
#3  3 2006 7 8
#4  4 2004 6 4
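Instead of hard-coding 2001, the starting year can be read from the column names. This sketch assumes the Y_ columns are named Y_YYYY, are contiguous and sorted, and start immediately after Year (the same layout the apply call above relies on); `start_year` and `res` are illustrative names, and the result columns are renamed to match the requested output.

```r
df <- data.frame(ID = 1:4, Year = c(2003L, 2004L, 2006L, 2004L),
                 Y_2001 = c(2L, 5L, 4L, 7L), Y_2002 = c(4L, 9L, 3L, 6L),
                 Y_2003 = c(6L, 7L, 5L, 4L), Y_2004 = c(4L, 1L, 7L, 8L),
                 Y_2005 = c(3L, 2L, 8L, 9L))

# Earliest year among the Y_YYYY columns (2001 here)
ycols <- grep("^Y_", names(df), value = TRUE)
start_year <- min(as.integer(sub("^Y_", "", ycols)))

res <- cbind(df[1:2], t(apply(df[-1], 1, function(x) {
  pos <- x[1] - start_year  # x[1] is Year; Y_(Year - 2) sits at x[pos]
  x[pos:(pos + 1)]
})))
names(res)[3:4] <- c("Year1", "Year2")
res
```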

Answer 1 (2 votes), posted 2020-03-21 01:18:28
Answer 2 (1 vote), posted 2020-03-21 01:50:45
Answer 3 (0 votes), posted 2020-03-21 05:46:11
