Create new columns based on several conditions in R

Question

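I have a data frame with three columns. The unique values of status are "X", "0", "C", "1", "2", "3", "4", "5". I don't know how to group by each id and create several columns based on conditions, for example a target column that is 1 if status is 2, 3, 4 or 5, and 0 otherwise.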

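month_balance is the record month, counted backwards from the month the data was extracted: 0 is the current month, -1 is the previous month, and so on.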

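status is the payment status for that month (0: 1-29 days past due; 1: 30-59 days past due; 2: 60-89 days past due; 3: 90-119 days past due; 4: 120-149 days past due; 5: overdue or bad debts, write-offs for more than 150 days; C: paid off that month; X: no loan for the month).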

df <- data.frame (id  = c("5008804","5008804","5008804","5008804","5008804","5008804","5008804","5008804","5008804","5008804","5008804","5008804","5008804","5008804","5008804","5008804","5008805","5008805","5008805","5008805","5008805","5008805","5008805","5008805","5008805","5008805","5008805","5008805","5008805","5008805","5008805"),
                  month_balance = c("0","-1","-2","-3","-4","-5","-6","-7","-8","-9","-10","-11","-12","-13","-14","-15","0","-1","-2","-3","-4","-5","-6","-7","-8","-9","-10","-11","-12","-13","-14"),
                  status = c("C","C","C","C","C","C","C","C","C","C","C","C","C","1","0","X","C","C","C","C","C","C","C","C","C","C","C","C","1","0","X")
                  )

In the end, I would like to get the following output:

df1 <- data.frame (id  = c("5008804","5008805"),
                  month_begin = c("16","15"),
                  paid_off = c("13","12"),
                  num_of_pastdues = c("2","2"),
                  no_loan = c("1","1"),
                  target = c("0","0"))

Answer 1

I'm not quite sure how to code target, since the statuses that map to target 0 and target 1 can each occur multiple times for a single id.

Here is how I built the other variables:

library(dplyr)

df %>% 
    group_by(id) %>% 
    summarise(
        month_begin=max(abs(as.numeric(month_balance)))+1, 
        paid_off=sum(status=="C"), 
        num_of_pastdues=sum(status %in% 0:5), 
        no_loan=sum(status=="X"))

# A tibble: 2 x 5
  id      month_begin paid_off num_of_pastdues no_loan
  <chr>         <dbl>    <int>           <int>   <int>
1 5008804          16       13               2       1
2 5008805          15       12               2       1
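One possible way to fill in the missing target column is sketched below. It assumes target is meant as a per-id flag that is 1 if any month of that id has status 2, 3, 4 or 5, and 0 otherwise. That reading is an assumption, although it matches the desired output in the question. The data frame is the question's data, rebuilt compactly with rep().

```r
library(dplyr)

# Same values as the question's df, built compactly
df <- data.frame(
  id = rep(c("5008804", "5008805"), times = c(16, 15)),
  month_balance = as.character(c(0:-15, 0:-14)),
  status = c(rep("C", 13), "1", "0", "X", rep("C", 12), "1", "0", "X")
)

res <- df %>%
  group_by(id) %>%
  summarise(
    month_begin = max(abs(as.numeric(month_balance))) + 1,
    paid_off = sum(status == "C"),
    num_of_pastdues = sum(status %in% as.character(0:5)),
    no_loan = sum(status == "X"),
    # Assumption: target is a flag per id, not a count of months
    target = as.integer(any(status %in% c("2", "3", "4", "5")))
  )

print(res)
```

Using any() collapses repeated occurrences within an id into a single 0/1 value, so it does not matter how many times a qualifying status appears.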

Answer 2

library(tidyverse)

df <- data.frame (id  = c("5008804","5008804","5008804","5008804","5008804","5008804","5008804","5008804","5008804","5008804","5008804","5008804","5008804","5008804","5008804","5008804","5008805","5008805","5008805","5008805","5008805","5008805","5008805","5008805","5008805","5008805","5008805","5008805","5008805","5008805","5008805"),
                  month_balance = c("0","-1","-2","-3","-4","-5","-6","-7","-8","-9","-10","-11","-12","-13","-14","-15","0","-1","-2","-3","-4","-5","-6","-7","-8","-9","-10","-11","-12","-13","-14"),
                  status = c("C","C","C","C","C","C","C","C","C","C","C","C","C","1","0","X","C","C","C","C","C","C","C","C","C","C","C","C","1","0","X")
) %>% 
  as_tibble()

df %>%  
  mutate(target = case_when(status %in% c(2, 3, 4, 5) ~ 1, 
                            TRUE ~ 0), 
         paid_off = case_when(status == "C" ~ 1, 
                              TRUE ~ 0), 
         no_loan = case_when(status == "X" ~ 1,
                             TRUE ~ 0)) %>%  
  
  group_by(id) %>%  
  summarise(month_begin = n(), 
            across(c(paid_off, no_loan, target), sum))
#> # A tibble: 2 x 5
#>   id      month_begin paid_off no_loan target
#>   <chr>         <int>    <dbl>   <dbl>  <dbl>
#> 1 5008804          16       13       1      0
#> 2 5008805          15       12       1      0

Created on 2022-06-29 by the reprex package (v2.0.1)
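Note that summing a 0/1 indicator per id gives the number of qualifying months, which is only the same as a 0/1 target when that count is 0 or 1. The sketch below uses a small hypothetical data frame (toy, not from the question) to show the difference between the count and a flag built with any().

```r
library(dplyr)

# Hypothetical data: id "A" has two months with status 2 or higher
toy <- data.frame(
  id = c("A", "A", "A", "B", "B"),
  status = c("2", "3", "C", "0", "C")
)

out <- toy %>%
  group_by(id) %>%
  summarise(
    # number of months with status 2 to 5
    months_60_plus = sum(status %in% c("2", "3", "4", "5")),
    # 1 if at least one such month exists, otherwise 0
    target = as.integer(any(status %in% c("2", "3", "4", "5")))
  )

print(out)
```

For the question's data both versions give 0, so the difference only shows up once an id has more than one month in status 2 to 5.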

Answer 3

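You can try dplyr. First create indicator variables for the conditions you need, then use summarise to count how many times each condition is met within each group.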

library(dplyr)

df <- df %>%
  # past-due statuses are 0 to 5 (2 to 5 alone is the question's target rule)
  mutate(num_of_pastdues = case_when(
    status %in% c(0,1,2,3,4,5) ~ 1,
    TRUE ~ 0
  )) %>%
  mutate(no_loan  = case_when(
    status == "X" ~ 1,
    TRUE ~ 0
  )) %>%
  mutate(paid_off  = case_when(
    status == "C" ~ 1,
    TRUE ~ 0
  )) %>%
  group_by(id) %>% 
  summarise(num_of_pastdues = sum(num_of_pastdues), no_loan = sum(no_loan), paid_off = sum(paid_off))
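As a quick cross-check of the per-group counts, a base R cross-tabulation of id against status can be used. This is only a sketch for verifying the numbers, not part of the answer above, and it rebuilds the question's data compactly so it can run on its own.

```r
# Same values as the question's df, built compactly
df <- data.frame(
  id = rep(c("5008804", "5008805"), times = c(16, 15)),
  month_balance = as.character(c(0:-15, 0:-14)),
  status = c(rep("C", 13), "1", "0", "X", rep("C", 12), "1", "0", "X")
)

# Rows are ids, columns are the status codes that occur in the data
tab <- table(df$id, df$status)
print(tab)

# Sum the past-due columns (codes 0 to 5) that are actually present
pastdue_cols <- intersect(colnames(tab), as.character(0:5))
num_of_pastdues <- rowSums(tab[, pastdue_cols, drop = FALSE])
print(num_of_pastdues)
```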

Answer 4

A base R solution is to write a custom function and apply it to each group, i.e.

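MyFunction <- function(x){
  # x is the vector of status codes for a single id
  month_begin = length(x)                  # number of monthly records
  paid_off = sum(x == 'C')                 # months paid off
  num_of_pastdues = sum(x %in% 0:5)        # months with any past-due status
  no_loan = sum(x == 'X')                  # months without a loan
  target = ifelse(any(x %in% 2:5), 1, 0)   # 1 if any month has status 2 to 5
  return(c(month_begin=month_begin, paid_off=paid_off, num_of_pastdues=num_of_pastdues, no_loan=no_loan, target=target))
}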

res <- t(sapply(split(df$status, df$id), MyFunction))

#         month_begin paid_off num_of_pastdues no_loan target
# 5008804          16       13               2       1      0
# 5008805          15       12               2       1      0
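To make the split/sapply/t chain above easier to follow, here is a small sketch of what each step returns. It uses a shortened anonymous function with only two statistics (an illustrative stand-in, not the full function from the answer) and rebuilds the question's data compactly.

```r
# Same values as the question's df, built compactly
df <- data.frame(
  id = rep(c("5008804", "5008805"), times = c(16, 15)),
  month_balance = as.character(c(0:-15, 0:-14)),
  status = c(rep("C", 13), "1", "0", "X", rep("C", 12), "1", "0", "X")
)

# split() returns a named list: one character vector of statuses per id
groups <- split(df$status, df$id)
print(lengths(groups))

# sapply() calls the function once per list element and binds the named
# results column-wise, so statistics end up in rows and ids in columns
m <- sapply(groups, function(x) c(n = length(x), paid_off = sum(x == "C")))
print(m)

# t() transposes that matrix so each id becomes a row
print(t(m))
```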

Then turn it into a data frame with an id column:

res_df <- data.frame(res)
res_df$id <- rownames(res_df)
rownames(res_df) <- NULL

res_df

#   month_begin paid_off num_of_pastdues no_loan target      id
# 1          16       13               2       1      0 5008804
# 2          15       12               2       1      0 5008805
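One detail worth checking is that the desired output in the question stores every value as a character string, while the result above is numeric. The sketch below compares the two after converting the question's df1 to numeric. The res_df values are copied from the printed output above, and the code assumes R 4.0 or later, where data.frame() keeps strings as character rather than factor.

```r
# Result values copied from the printed output above
res_df <- data.frame(
  month_begin = c(16, 15), paid_off = c(13, 12), num_of_pastdues = c(2, 2),
  no_loan = c(1, 1), target = c(0, 0), id = c("5008804", "5008805")
)

# Desired output from the question; all columns are character strings
df1 <- data.frame(
  id = c("5008804", "5008805"), month_begin = c("16", "15"),
  paid_off = c("13", "12"), num_of_pastdues = c("2", "2"),
  no_loan = c("1", "1"), target = c("0", "0")
)

# Convert every column except id to numeric (assumes character columns)
value_cols <- setdiff(names(df1), "id")
df1_num <- df1
df1_num[value_cols] <- lapply(df1[value_cols], as.numeric)

# Reorder the result's columns to match df1 before comparing
same <- isTRUE(all.equal(res_df[names(df1)], df1_num))
print(same)
```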

4 answers

Answer 1: 0, 2022-06-29 08:39:19
Answer 2: 0, 2022-06-29 08:42:25
Answer 3: 0, 2022-06-29 08:46:52
Answer 4: 0, accepted, 2022-06-29 09:13:04
