R: dplyr: How to add a column that divides a value by the first group of values (kind of like a vlookup)
I am trying to analyze my data to compare db_perk by plan. I want to make a column that takes the db_perk divided by the db_perk of the first plan in the plan column. This way I can see the differences of db_perk depending on plan.
I want to take this data called SQL_Table:
plan gender marital_status accel_type extension_type inflation iss_age dur db_perk
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 BasicF F Married A.24 E.0 AC3.EC3 40 1 0.20
2 BasicF F Married A.24 E.0 AC3.EC3 40 2 0.25
3 BasicF F Married A.24 E.0 AC3.EC3 40 3 0.30
4 BasicF F Married A.24 E.0 AC3.EC3 40 4 0.40
5 BasicF M Single A.36 E.24 AC3.EC3 40 1 0.15
6 GradedF F Married A.24 E.0 AC3.EC3 40 1 0.25
7 GradedF F Married A.24 E.0 AC3.EC3 40 2 0.30
8 GradedF F Married A.24 E.0 AC3.EC3 40 3 0.50
9 GradedF F Married A.24 E.0 AC3.EC3 40 4 0.70
10 GradedF M Single A.36 E.24 AC3.EC3 40 1 0.10
And transform it to this:
plan gender marital_status accel_type extension_type inflation iss_age dur db_perk db_perk_compare
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 BasicF F Married A.24 E.0 AC3.EC3 40 1 0.20 1.00
2 BasicF F Married A.24 E.0 AC3.EC3 40 2 0.25 1.00
3 BasicF F Married A.24 E.0 AC3.EC3 40 3 0.30 1.00
4 BasicF F Married A.24 E.0 AC3.EC3 40 4 0.40 1.00
5 BasicF M Single A.36 E.24 AC3.EC3 40 1 0.15 1.00
6 GradedF F Married A.24 E.0 AC3.EC3 40 1 0.25 1.25
7 GradedF F Married A.24 E.0 AC3.EC3 40 2 0.30 1.20
8 GradedF F Married A.24 E.0 AC3.EC3 40 3 0.50 1.67
9 GradedF F Married A.24 E.0 AC3.EC3 40 4 0.70 1.75
10 GradedF M Single A.36 E.24 AC3.EC3 40 1 0.10 0.67
As you can see, the db_perk_compare column is 1 when the plan is "BasicF" because the formula divides the db_perk by BasicF's db_perk. The other columns can also have multiple different values that would affect db_perk.
I've tried something like this:
for (i in nrow(SQL_Table)){
SQL_Table$db_perk_compare[i] <- SQL_Table$db_perk[i]/SQL_Table$db_perk[which(plan == SQL_Table$plan[1],
gender == SQL_Table$gender[i],
marital_status == SQL_Table$marital_status[i],
accel_type == SQL_Table$accel_type[i],
extension_type == SQL_Table$extension_type [i],
inflation == SQL_Table$inflation [i],
iss_age == SQL_Table$iss_age[i],
dur == SQL_Table$dur[i])]
}
but get this error:
Error in which(plan == SQL_Table$plan[1], gender == SQL_Table$gender[i], :
unused arguments (accel_type == SQL_Table$accel_type[i], extension_type == SQL_Table$extension_type[i], inflation == SQL_Table$inflation[i], iss_age == SQL_Table$iss_age[i], dur == SQL_Table$dur[i])
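The error arises because which() expects a single logical vector as its first argument, so the comma-separated comparisons are passed as extra arguments instead of being combined. Two further issues would surface once that is fixed: for (i in nrow(SQL_Table)) runs only once (for the last row), and bare names such as plan and gender are not visible outside the data frame. Below is a minimal base R sketch of the loop with those points addressed. The data frame is a stand-in rebuilt from the values shown in the question, and baseline_plan and baseline_row are names introduced here for illustration only.

```r
# Stand-in for SQL_Table, rebuilt from the values shown in the question.
SQL_Table <- data.frame(
  plan = rep(c("BasicF", "GradedF"), each = 5),
  gender = rep(c("F", "F", "F", "F", "M"), 2),
  marital_status = rep(c("Married", "Married", "Married", "Married", "Single"), 2),
  accel_type = rep(c("A.24", "A.24", "A.24", "A.24", "A.36"), 2),
  extension_type = rep(c("E.0", "E.0", "E.0", "E.0", "E.24"), 2),
  inflation = "AC3.EC3",
  iss_age = "40",
  dur = rep(c(1, 2, 3, 4, 1), 2),
  db_perk = c(0.2, 0.25, 0.3, 0.4, 0.15, 0.25, 0.3, 0.5, 0.7, 0.1),
  stringsAsFactors = FALSE
)

baseline_plan <- SQL_Table$plan[1]      # "BasicF" (illustrative helper name)
SQL_Table$db_perk_compare <- NA_real_   # create the column before filling it

# seq_len() iterates over every row; nrow() alone would yield a single value.
for (i in seq_len(nrow(SQL_Table))) {
  # Combine the conditions with & so which() receives ONE logical vector.
  baseline_row <- which(
    SQL_Table$plan == baseline_plan &
      SQL_Table$gender == SQL_Table$gender[i] &
      SQL_Table$marital_status == SQL_Table$marital_status[i] &
      SQL_Table$accel_type == SQL_Table$accel_type[i] &
      SQL_Table$extension_type == SQL_Table$extension_type[i] &
      SQL_Table$inflation == SQL_Table$inflation[i] &
      SQL_Table$iss_age == SQL_Table$iss_age[i] &
      SQL_Table$dur == SQL_Table$dur[i]
  )[1]  # first matching baseline row; NA if there is none
  SQL_Table$db_perk_compare[i] <-
    SQL_Table$db_perk[i] / SQL_Table$db_perk[baseline_row]
}
```

This keeps the row-by-row lookup from the question, which is fine for small tables; a grouped approach avoids the explicit loop entirely.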
With tidyverse, we place all the grouping columns in group_by and then mutate 'db_perk' by dividing it by the first observation of that column within each group. The arrange(plan != "BasicF") step sorts the BasicF rows to the top, so first picks up BasicF's value.
library(tidyverse)
SQL_Table %>%
arrange(plan != "BasicF")%>%
group_by(gender, marital_status, accel_type,
extension_type, inflation, iss_age, dur) %>%
mutate(db_perk_compare = db_perk/first(db_perk))
# A tibble: 10 x 10
# Groups: gender, marital_status, accel_type, extension_type, inflation, iss_age, dur [5]
# plan gender marital_status accel_type extension_type inflation iss_age dur db_perk db_perk_compare
# <chr> <chr> <chr> <chr> <chr> <chr> <int> <int> <dbl> <dbl>
# 1 BasicF F Married A.24 E.0 AC3.EC3 40 1 0.2 1
# 2 BasicF F Married A.24 E.0 AC3.EC3 40 2 0.25 1
# 3 BasicF F Married A.24 E.0 AC3.EC3 40 3 0.3 1
# 4 BasicF F Married A.24 E.0 AC3.EC3 40 4 0.4 1
# 5 BasicF M Single A.36 E.24 AC3.EC3 40 1 0.15 1
# 6 GradedF F Married A.24 E.0 AC3.EC3 40 1 0.25 1.25
# 7 GradedF F Married A.24 E.0 AC3.EC3 40 2 0.3 1.2
# 8 GradedF F Married A.24 E.0 AC3.EC3 40 3 0.5 1.67
# 9 GradedF F Married A.24 E.0 AC3.EC3 40 4 0.7 1.75
#10 GradedF M Single A.36 E.24 AC3.EC3 40 1 0.1 0.667
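If relying on row order feels fragile, the baseline can be looked up explicitly inside each group. The following is a sketch under the assumption that each group has at most one BasicF row; the data frame is a stand-in rebuilt from the question's values, and its rows are deliberately reversed to show that the result does not depend on order.

```r
library(dplyr)

# Stand-in for SQL_Table, rebuilt from the values shown in the question.
SQL_Table <- data.frame(
  plan = rep(c("BasicF", "GradedF"), each = 5),
  gender = rep(c("F", "F", "F", "F", "M"), 2),
  marital_status = rep(c("Married", "Married", "Married", "Married", "Single"), 2),
  accel_type = rep(c("A.24", "A.24", "A.24", "A.24", "A.36"), 2),
  extension_type = rep(c("E.0", "E.0", "E.0", "E.0", "E.24"), 2),
  inflation = "AC3.EC3",
  iss_age = "40",
  dur = rep(c(1, 2, 3, 4, 1), 2),
  db_perk = c(0.2, 0.25, 0.3, 0.4, 0.15, 0.25, 0.3, 0.5, 0.7, 0.1),
  stringsAsFactors = FALSE
)

# Reverse the rows so that BasicF is no longer first.
reversed <- SQL_Table[rev(seq_len(nrow(SQL_Table))), ]

result <- reversed %>%
  group_by(gender, marital_status, accel_type,
           extension_type, inflation, iss_age, dur) %>%
  # Pick BasicF's db_perk within the group; [1] yields NA if it is missing.
  mutate(db_perk_compare = db_perk / db_perk[plan == "BasicF"][1]) %>%
  ungroup()
```

Groups without a BasicF row get NA in db_perk_compare rather than a ratio against some other plan.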
The idea is the same as akrun's, but instead of naming every column, we can use group_by_at and exclude plan and db_perk.
library(dplyr)
SQL_Table %>%
group_by_at(names(SQL_Table)[-grep("plan|db_perk", names(SQL_Table))]) %>%
mutate(db_perk_compare = db_perk/first(db_perk))
# # A tibble: 10 x 10
# # Groups: gender, marital_status, accel_type, extension_type, inflation, iss_age, dur [5]
# plan gender marital_status accel_type extension_type inflation iss_age dur db_perk db_perk_compare
# <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
# 1 BasicF F Married A.24 E.0 AC3.EC3 40 1 0.2 1
# 2 BasicF F Married A.24 E.0 AC3.EC3 40 2 0.25 1
# 3 BasicF F Married A.24 E.0 AC3.EC3 40 3 0.3 1
# 4 BasicF F Married A.24 E.0 AC3.EC3 40 4 0.4 1
# 5 BasicF M Single A.36 E.24 AC3.EC3 40 1 0.15 1
# 6 GradedF F Married A.24 E.0 AC3.EC3 40 1 0.25 1.25
# 7 GradedF F Married A.24 E.0 AC3.EC3 40 2 0.3 1.2
# 8 GradedF F Married A.24 E.0 AC3.EC3 40 3 0.5 1.67
# 9 GradedF F Married A.24 E.0 AC3.EC3 40 4 0.7 1.75
# 10 GradedF M Single A.36 E.24 AC3.EC3 40 1 0.1 0.667
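In dplyr 1.0.0 and later, the scoped verbs such as group_by_at are superseded, and the same exclusion can be written with across(). The sketch below assumes dplyr >= 1.0.0 and uses a stand-in data frame rebuilt from the question's values. It also adds the arrange step from the first answer, since first(db_perk) depends on the BasicF rows coming first.

```r
library(dplyr)

# Stand-in for SQL_Table, rebuilt from the values shown in the question.
SQL_Table <- data.frame(
  plan = rep(c("BasicF", "GradedF"), each = 5),
  gender = rep(c("F", "F", "F", "F", "M"), 2),
  marital_status = rep(c("Married", "Married", "Married", "Married", "Single"), 2),
  accel_type = rep(c("A.24", "A.24", "A.24", "A.24", "A.36"), 2),
  extension_type = rep(c("E.0", "E.0", "E.0", "E.0", "E.24"), 2),
  inflation = "AC3.EC3",
  iss_age = "40",
  dur = rep(c(1, 2, 3, 4, 1), 2),
  db_perk = c(0.2, 0.25, 0.3, 0.4, 0.15, 0.25, 0.3, 0.5, 0.7, 0.1),
  stringsAsFactors = FALSE
)

result <- SQL_Table %>%
  arrange(plan != "BasicF") %>%                # BasicF rows sort to the top
  group_by(across(-c(plan, db_perk))) %>%      # group by every other column
  mutate(db_perk_compare = db_perk / first(db_perk)) %>%
  ungroup()
```

Selecting by exact column name also avoids a side effect of the grep pattern, which would drop any other column whose name merely contains "plan" or "db_perk".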
Data:
dput(SQL_Table)
structure(list(plan = c("BasicF", "BasicF", "BasicF", "BasicF",
"BasicF", "GradedF", "GradedF", "GradedF", "GradedF", "GradedF"
), gender = c("F", "F", "F", "F", "M", "F", "F", "F", "F", "M"
), marital_status = c("Married", "Married", "Married", "Married",
"Single", "Married", "Married", "Married", "Married", "Single"
), accel_type = c("A.24", "A.24", "A.24", "A.24", "A.36", "A.24",
"A.24", "A.24", "A.24", "A.36"), extension_type = c("E.0", "E.0",
"E.0", "E.0", "E.24", "E.0", "E.0", "E.0", "E.0", "E.24"), inflation = c("AC3.EC3",
"AC3.EC3", "AC3.EC3", "AC3.EC3", "AC3.EC3", "AC3.EC3", "AC3.EC3",
"AC3.EC3", "AC3.EC3", "AC3.EC3"), iss_age = c("40", "40", "40",
"40", "40", "40", "40", "40", "40", "40"), dur = c(1, 2, 3, 4,
1, 1, 2, 3, 4, 1), db_perk = c(0.2, 0.25, 0.3, 0.4, 0.15, 0.25,
0.3, 0.5, 0.7, 0.1)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
Created on 2019-06-24 by the reprex package (v0.3.0)