R: Adding columns and rows of data based on multiple conditions on an existing dataframe

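I want to restructure my land-use classification dataframe and add new rows and columns based on conditions in the dataframe. I have been using dplyr to attempt this, but the examples I have found tend to reduce columns or rows rather than add rows based on conditions. I have tried looping through the dataset to add rows, but is there a better way in dplyr? I am also open to using a different library, but this is a very large classification dataset and dplyr seems to work well with dataframes.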

Here is example code for my current dataframe (df_old) and for what I would like it to look like in the end (df_new).

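What I want is for a new row to be created every time the value in Year1990 through Year2015 changes. Example: ID 424 is 51 in 1990, changes to 21 in 2000, and stays 21 to the present day. This means the new dataframe should have two rows for ID 424. The first row has Start_Year 1990, marking the start of the Forest land use (Landuse = 51), which lasts until the change in 2000. Because it is Pavement in 2000, we assume it was still Forest in 1999, so End_Year for the first row of ID 424 is 1999. ID 424 then gets a second row with Start_Year 2000, when it changed to Pavement (Landuse = 21), and it stays 21 until End_Year (present).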
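To make the rule concrete for a single ID, here is a small base R sketch for ID 424 only. It assumes the survey years are exactly the five in the column names and that a period ends the year before the next one starts. The variable names (`years`, `landuse_424`, and so on) are made up for illustration.

```r
# Survey years taken from the column names Year1990 ... Year2015
years <- c(1990, 2000, 2005, 2010, 2015)
# Land-use codes for ID 424 in those years
landuse_424 <- c(51, 21, 21, 21, 21)

# rle() collapses consecutive repeats into runs: 51 (once), 21 (four times)
runs <- rle(landuse_424)

# Position of the first year of each run
first_idx <- cumsum(c(1, head(runs$lengths, -1)))
start_year <- years[first_idx]

# A run ends the year before the next run starts; the last run is open-ended
end_year <- c(as.character(start_year[-1] - 1), "present")

data.frame(Start_Year = start_year, End_Year = end_year, Code = runs$values)
```

This yields two rows for ID 424: 1990 to 1999 with code 51, and 2000 to present with code 21.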

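For context, the dataset shows how areas within a city have changed; the numbers in the Year1990 to Year2015 columns identify the land-use classification (21 = Pavement, 24 = Park, 25 = Residential, 51 = Forest, 41 = Agriculture).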
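The code-to-class mapping above could be kept as a named vector so that codes can be turned into readable labels later. This is only a sketch: the name `landuse_labels` is hypothetical, and the labels are the ones listed above.

```r
# Hypothetical lookup: land-use code -> class label (from the list above)
landuse_labels <- c(
  "21" = "Pavement",
  "24" = "Park",
  "25" = "Residential",
  "41" = "Agriculture",
  "51" = "Forest"
)

# Codes are numeric in the data, so convert them to character
# before indexing the vector by name
codes <- c(51, 21, 21)
labels <- unname(landuse_labels[as.character(codes)])
labels
```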

df_old <- data.frame(ID = c(424,426,427,428),
             Parameter= c(0.01,0.03,0.03,0.01),
             City = c("Abbotsford","Abbotsford","Abbotsford","Abbotsford"),
             Area = c(3.12,7.98,2.01,0.48),
             Year1990 = c(51,51,51,41),
             Year2000 = c(21,51,51,41),
             Year2005 = c(21,51,51,25),
             Year2010 = c(21,51,51,24),
             Year2015 = c(21,51,51,25))

df_new <- data.frame(ID = c(424,424,426,427,428,428,428,428),
             Parameter= c(0.01,0.01,0.03,0.03,0.01,0.01,0.01,0.01),
             City = c("Abbotsford","Abbotsford","Abbotsford","Abbotsford","Abbotsford","Abbotsford","Abbotsford","Abbotsford"),
             Area = c(3.12,3.12,7.98,2.01,0.48,0.48,0.48,0.48),
             Start_Year = c(1990,2000,1990,1990,1990,2005,2010,2015),
             End_Year = c(1999,"present","present","present",2004,2009,2014,"present"),
             Landuse = c("51-51","51-21","51-51","51-51","41-41","41-25","25-24","24-25"))

[Image: original data]

This is what I want as the final product:

[Image: new dataframe structure]

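This solution works for your example data, but it is difficult to determine the "rules" governing the operation you want (and therefore hard to know whether it will work on your real data). If it fails on your real data, please edit your post with more information.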

library(tidyverse)

df_old <- data.frame(ID = c(424,426,427,428),
                     Parameter= c(0.01,0.03,0.03,0.01),
                     City = c("Abbotsford","Abbotsford","Abbotsford","Abbotsford"),
                     Area = c(3.12,7.98,2.01,0.48),
                     Year1990 = c(51,51,51,41),
                     Year2000 = c(21,51,51,41),
                     Year2005 = c(21,51,51,25),
                     Year2010 = c(21,51,51,24),
                     Year2015 = c(21,51,51,25))

df_new <- data.frame(ID = c(424,424,426,427,428,428,428,428),
                     Parameter= c(0.01,0.01,0.03,0.03,0.01,0.01,0.01,0.01),
                     City = c("Abbotsford","Abbotsford","Abbotsford","Abbotsford","Abbotsford","Abbotsford","Abbotsford","Abbotsford"),
                     Area = c(3.12,3.12,7.98,2.01,0.48,0.48,0.48,0.48),
                     Start = c(1990,2000,1990,1990,1990,2005,2010,2015),
                     End = c(1999,"present","present","present",2004,2009,2014,"present"),
                     LU = c("51-51","51-21","51-51","51-51","41-41","41-25","25-24","24-25"))


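df_old %>%
  # One row per ID and year; columns 1:4 (ID, Parameter, City, Area) stay as they are
  pivot_longer(cols = -c(1:4)) %>%
  group_by(ID) %>%
  # Year as a number, e.g. "Year1990" -> 1990
  mutate(Start = as.numeric(str_extract(name, "\\d+")),
         # Previous and current land use; the first year is paired with itself
         `LU-LU` = paste(lag(value, default = first(value)), "-", value, sep = "")) %>%
  # Keep the first year and every year in which the land use changed
  filter(value != lag(value, default = 0)) %>%
  # A period ends the year before the next one starts; the last one is open-ended.
  # End must be character before "present" can be filled in
  mutate(End = replace_na(as.character(lead(Start) - 1), "present")) %>%
  select(ID, Parameter, City, Area, Start, End, `LU-LU`)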
#> # A tibble: 8 × 7
#> # Groups:   ID [4]
#>      ID Parameter City        Area Start End     `LU-LU`
#>   <dbl>     <dbl> <chr>      <dbl> <dbl> <chr>   <chr>  
#> 1   424      0.01 Abbotsford  3.12  1990 1999    51-51  
#> 2   424      0.01 Abbotsford  3.12  2000 present 51-21  
#> 3   426      0.03 Abbotsford  7.98  1990 present 51-51  
#> 4   427      0.03 Abbotsford  2.01  1990 present 51-51  
#> 5   428      0.01 Abbotsford  0.48  1990 2004    41-41  
#> 6   428      0.01 Abbotsford  0.48  2005 2009    41-25  
#> 7   428      0.01 Abbotsford  0.48  2010 2014    25-24  
#> 8   428      0.01 Abbotsford  0.48  2015 present 24-25

Created on 2021-12-03 by the reprex package (v2.0.1)
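For comparison, the same idea can be sketched in base R with `rle()`, which may be useful if the tidyverse is not available or if the same transition can occur more than once for an ID. This is a sketch under assumptions: the year columns all start with `Year` and are in chronological order, and `expand_landuse` is a made-up helper name, not part of any package.

```r
df_old <- data.frame(ID = c(424, 426, 427, 428),
                     Parameter = c(0.01, 0.03, 0.03, 0.01),
                     City = "Abbotsford",
                     Area = c(3.12, 7.98, 2.01, 0.48),
                     Year1990 = c(51, 51, 51, 41),
                     Year2000 = c(21, 51, 51, 41),
                     Year2005 = c(21, 51, 51, 25),
                     Year2010 = c(21, 51, 51, 24),
                     Year2015 = c(21, 51, 51, 25))

# Hypothetical helper: one output row per run of identical land-use codes
expand_landuse <- function(df) {
  year_cols <- grep("^Year", names(df), value = TRUE)
  years <- as.numeric(sub("^Year", "", year_cols))
  id_cols <- setdiff(names(df), year_cols)

  pieces <- lapply(seq_len(nrow(df)), function(i) {
    codes <- unlist(df[i, year_cols], use.names = FALSE)
    runs <- rle(codes)
    # Position of the first year of each run
    first_idx <- cumsum(c(1, head(runs$lengths, -1)))
    start <- years[first_idx]
    # Code before the change; the first run is paired with itself (e.g. "51-51")
    previous <- c(runs$values[1], head(runs$values, -1))
    data.frame(df[rep(i, length(start)), id_cols, drop = FALSE],
               Start = start,
               End = c(as.character(start[-1] - 1), "present"),
               LU = paste(previous, runs$values, sep = "-"),
               row.names = NULL)
  })

  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

result <- expand_landuse(df_old)
result
```

Because End mixes years with the text "present", it has to be a character column. Using NA for open-ended periods instead would keep it numeric.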
