
R: dplyr: How to add a column that divides a value by the first group of values (kind of like a vlookup)

I am trying to analyze my data to compare db_perk by plan. I want to add a column that divides each db_perk by the db_perk of the first plan in the plan column, so that I can see how db_perk differs by plan.

I want to take this data, called SQL_Table:

   plan   gender marital_status accel_type extension_type inflation iss_age   dur    db_perk
   <chr>  <chr>  <chr>          <chr>        <chr>        <chr>     <chr>    <dbl>   <dbl>
 1 BasicF   F    Married        A.24         E.0          AC3.EC3    40       1      0.20
 2 BasicF   F    Married        A.24         E.0          AC3.EC3    40       2      0.25
 3 BasicF   F    Married        A.24         E.0          AC3.EC3    40       3      0.30
 4 BasicF   F    Married        A.24         E.0          AC3.EC3    40       4      0.40
 5 BasicF   M    Single         A.36         E.24         AC3.EC3    40       1      0.15
 6 GradedF  F    Married        A.24         E.0          AC3.EC3    40       1      0.25
 7 GradedF  F    Married        A.24         E.0          AC3.EC3    40       2      0.30
 8 GradedF  F    Married        A.24         E.0          AC3.EC3    40       3      0.50
 9 GradedF  F    Married        A.24         E.0          AC3.EC3    40       4      0.70
10 GradedF  M    Single         A.36         E.24         AC3.EC3    40       1      0.10

And transform it to this:

   plan   gender marital_status accel_type extension_type inflation iss_age   dur    db_perk  db_perk_compare
   <chr>  <chr>  <chr>          <chr>        <chr>        <chr>     <chr>    <dbl>   <dbl>      <dbl>
 1 BasicF   F    Married        A.24         E.0          AC3.EC3    40       1      0.20       1.00
 2 BasicF   F    Married        A.24         E.0          AC3.EC3    40       2      0.25       1.00
 3 BasicF   F    Married        A.24         E.0          AC3.EC3    40       3      0.30       1.00
 4 BasicF   F    Married        A.24         E.0          AC3.EC3    40       4      0.40       1.00
 5 BasicF   M    Single         A.36         E.24         AC3.EC3    40       1      0.15       1.00
 6 GradedF  F    Married        A.24         E.0          AC3.EC3    40       1      0.25       1.25
 7 GradedF  F    Married        A.24         E.0          AC3.EC3    40       2      0.30       1.20
 8 GradedF  F    Married        A.24         E.0          AC3.EC3    40       3      0.50       1.67
 9 GradedF  F    Married        A.24         E.0          AC3.EC3    40       4      0.70       1.75
10 GradedF  M    Single         A.36         E.24         AC3.EC3    40       1      0.10       0.67

As you can see, db_perk_compare equals 1 when the plan is "BasicF", because the formula divides db_perk by BasicF's db_perk for the same combination of the other columns. The other columns can also take multiple different values that affect db_perk.

I've tried something like this:

for (i in nrow(SQL_Table)){
      SQL_Table$db_perk_compare[i] <- SQL_Table$db_perk[i]/SQL_Table$db_perk[which(plan == SQL_Table$plan[1],
                                                                                   gender == SQL_Table$gender[i],
                                                                                   marital_status == SQL_Table$marital_status[i],
                                                                                   accel_type == SQL_Table$accel_type[i],
                                                                                   extension_type  == SQL_Table$extension_type [i],
                                                                                   inflation  == SQL_Table$inflation [i],
                                                                                   iss_age    == SQL_Table$iss_age[i],
                                                                                   dur  == SQL_Table$dur[i])]
  }

but I get this error:

Error in which(plan == SQL_Table$plan[1], gender == SQL_Table$gender[i],  : 
  unused arguments (accel_type == SQL_Table$accel_type[i], extension_type == SQL_Table$extension_type[i], inflation == SQL_Table$inflation[i], iss_age == SQL_Table$iss_age[i], dur == SQL_Table$dur[i])
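
For reference, the error arises because which() takes a single logical vector (plus the optional arr.ind and useNames arguments), so the comparisons have to be combined with & rather than passed as separate arguments. In addition, for (i in nrow(SQL_Table)) visits only the last row, and the bare column names (plan, gender, ...) are not visible outside the data frame. A minimal base R sketch of a corrected loop follows; it assumes a cut-down stand-in for SQL_Table with only gender and dur as matching columns, and the names base_plan and match_idx are illustrative, not from the original post.

```r
# Cut-down stand-in for SQL_Table (assumption: only two matching columns, for brevity)
SQL_Table <- data.frame(
  plan    = c("BasicF", "BasicF", "GradedF", "GradedF"),
  gender  = c("F", "M", "F", "M"),
  dur     = c(1, 1, 1, 1),
  db_perk = c(0.20, 0.15, 0.25, 0.10),
  stringsAsFactors = FALSE
)

base_plan <- SQL_Table$plan[1]           # reference plan: the first one in the table
SQL_Table$db_perk_compare <- NA_real_    # pre-allocate the new column

# seq_len(nrow(...)) iterates over every row, not just the last one
for (i in seq_len(nrow(SQL_Table))) {
  # Combine the conditions with & so which() receives one logical vector
  match_idx <- which(
    SQL_Table$plan == base_plan &
      SQL_Table$gender == SQL_Table$gender[i] &
      SQL_Table$dur == SQL_Table$dur[i]
  )
  # Use the first matching reference row; yields NA if there is none
  SQL_Table$db_perk_compare[i] <-
    SQL_Table$db_perk[i] / SQL_Table$db_perk[match_idx[1]]
}
```

With the full table, the remaining matching columns would be added to the & chain in the same way.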

With tidyverse, we first arrange the rows so that the "BasicF" plan comes first, place all the matching columns in group_by, and then mutate to divide 'db_perk' by the first observation of that column within each group.

library(tidyverse)
SQL_Table %>%
  arrange(plan != "BasicF") %>%
  group_by(gender, marital_status, accel_type,
           extension_type, inflation, iss_age, dur) %>%
  mutate(db_perk_compare = db_perk / first(db_perk))
# A tibble: 10 x 10
# Groups:   gender, marital_status, accel_type, extension_type, inflation, iss_age, dur [5]
#   plan    gender marital_status accel_type extension_type inflation iss_age   dur db_perk db_perk_compare
#   <chr>   <chr>  <chr>          <chr>      <chr>          <chr>       <int> <int>   <dbl>           <dbl>
# 1 BasicF  F      Married        A.24       E.0            AC3.EC3        40     1    0.2            1    
# 2 BasicF  F      Married        A.24       E.0            AC3.EC3        40     2    0.25           1    
# 3 BasicF  F      Married        A.24       E.0            AC3.EC3        40     3    0.3            1    
# 4 BasicF  F      Married        A.24       E.0            AC3.EC3        40     4    0.4            1    
# 5 BasicF  M      Single         A.36       E.24           AC3.EC3        40     1    0.15           1    
# 6 GradedF F      Married        A.24       E.0            AC3.EC3        40     1    0.25           1.25 
# 7 GradedF F      Married        A.24       E.0            AC3.EC3        40     2    0.3            1.2  
# 8 GradedF F      Married        A.24       E.0            AC3.EC3        40     3    0.5            1.67 
# 9 GradedF F      Married        A.24       E.0            AC3.EC3        40     4    0.7            1.75 
#10 GradedF M      Single         A.36       E.24           AC3.EC3        40     1    0.1            0.667
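
If sorting the table is undesirable, the reference value can also be selected by plan name inside each group. The following is a hedged sketch of that variation, not the answerer's code; it assumes a cut-down stand-in for SQL_Table with the GradedF rows deliberately listed first, and the name result is illustrative.

```r
library(dplyr)

# Cut-down stand-in for SQL_Table (assumption), with the reference plan NOT listed first
SQL_Table <- tibble(
  plan    = c("GradedF", "GradedF", "BasicF", "BasicF"),
  gender  = c("F", "M", "F", "M"),
  dur     = c(1, 1, 1, 1),
  db_perk = c(0.25, 0.10, 0.20, 0.15)
)

result <- SQL_Table %>%
  group_by(gender, dur) %>%
  # Pick the BasicF value within the group explicitly instead of relying on row order;
  # [1] yields NA when a group has no BasicF row
  mutate(db_perk_compare = db_perk / db_perk[plan == "BasicF"][1]) %>%
  ungroup()
```

Because the reference row is chosen by name, the original row order is left unchanged.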

The idea is the same as akrun's, but instead of naming every column, we can use group_by_at and exclude plan and db_perk. Note that first(db_perk) picks the BasicF value only because the BasicF rows come first in the data.

library(dplyr)
SQL_Table %>%
  group_by_at(names(SQL_Table)[-grep("plan|db_perk", names(SQL_Table))]) %>%
  mutate(db_perk_compare = db_perk/first(db_perk))

# # A tibble: 10 x 10
# # Groups:   gender, marital_status, accel_type, extension_type, inflation, iss_age, dur [5]
# plan    gender marital_status accel_type extension_type inflation iss_age   dur db_perk db_perk_compare
# <chr>   <chr>  <chr>          <chr>      <chr>          <chr>     <chr>   <dbl>   <dbl>           <dbl>
# 1   BasicF  F      Married        A.24       E.0            AC3.EC3   40          1    0.2            1    
# 2   BasicF  F      Married        A.24       E.0            AC3.EC3   40          2    0.25           1    
# 3   BasicF  F      Married        A.24       E.0            AC3.EC3   40          3    0.3            1    
# 4   BasicF  F      Married        A.24       E.0            AC3.EC3   40          4    0.4            1    
# 5   BasicF  M      Single         A.36       E.24           AC3.EC3   40          1    0.15           1    
# 6   GradedF F      Married        A.24       E.0            AC3.EC3   40          1    0.25           1.25 
# 7   GradedF F      Married        A.24       E.0            AC3.EC3   40          2    0.3            1.2  
# 8   GradedF F      Married        A.24       E.0            AC3.EC3   40          3    0.5            1.67 
# 9   GradedF F      Married        A.24       E.0            AC3.EC3   40          4    0.7            1.75 
# 10  GradedF M      Single         A.36       E.24           AC3.EC3   40          1    0.1            0.667
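
In dplyr 1.0.0 and later, the scoped verbs such as group_by_at are superseded by across(). A minimal sketch of the same grouping in that style is shown below; it assumes dplyr >= 1.0.0 and a cut-down stand-in for SQL_Table, and it keeps the arrange step from the first answer so that the BasicF row leads each group. The name result is illustrative.

```r
library(dplyr)

# Cut-down stand-in for SQL_Table (assumption, for brevity)
SQL_Table <- tibble(
  plan    = c("BasicF", "BasicF", "GradedF", "GradedF"),
  gender  = c("F", "M", "F", "M"),
  dur     = c(1, 1, 1, 1),
  db_perk = c(0.20, 0.15, 0.25, 0.10)
)

result <- SQL_Table %>%
  arrange(plan != "BasicF") %>%          # FALSE sorts before TRUE, so BasicF rows come first
  group_by(across(-c(plan, db_perk))) %>% # group by every column except plan and db_perk
  mutate(db_perk_compare = db_perk / first(db_perk)) %>%
  ungroup()
```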

Data:

dput(SQL_Table)
 structure(list(plan = c("BasicF", "BasicF", "BasicF", "BasicF", 
 "BasicF", "GradedF", "GradedF", "GradedF", "GradedF", "GradedF"
 ), gender = c("F", "F", "F", "F", "M", "F", "F", "F", "F", "M"
 ), marital_status = c("Married", "Married", "Married", "Married", 
 "Single", "Married", "Married", "Married", "Married", "Single"
 ), accel_type = c("A.24", "A.24", "A.24", "A.24", "A.36", "A.24", 
 "A.24", "A.24", "A.24", "A.36"), extension_type = c("E.0", "E.0", 
 "E.0", "E.0", "E.24", "E.0", "E.0", "E.0", "E.0", "E.24"), inflation = c("AC3.EC3", 
 "AC3.EC3", "AC3.EC3", "AC3.EC3", "AC3.EC3", "AC3.EC3", "AC3.EC3", 
 "AC3.EC3", "AC3.EC3", "AC3.EC3"), iss_age = c("40", "40", "40", 
 "40", "40", "40", "40", "40", "40", "40"), dur = c(1, 2, 3, 4, 
 1, 1, 2, 3, 4, 1), db_perk = c(0.2, 0.25, 0.3, 0.4, 0.15, 0.25, 
 0.3, 0.5, 0.7, 0.1)), row.names = c(NA, -10L), class = c("tbl_df", 
 "tbl", "data.frame"))

Created on 2019-06-24 by the reprex package (v0.3.0)
