dplyr: using values subset from matrix to create new column with case_when
I am trying to create a new column in a data frame using mutate and case_when, but I get unexpected results.
Here is a dput of a subset of my data: Pastebin.
The aim is to calculate own- and cross-price elasticities for products in multiple completely separate markets. My idea was to use case_when to apply different expressions for own and cross elasticities, and to use unique product identifiers (IDprod_un_j and IDprod_un_l) to subset some values from another matrix. This is the code I am using:
elast_small %<>%
  mutate(
    eta_jlm_rc = case_when(
      IDprod_j == IDprod_l ~ (-price_j/share_j) * rowMeans(-alpha_i_rc * share_i_small[IDprod_un_j,] * (1-share_i_small[IDprod_un_j,])),
      IDprod_j != IDprod_l ~ (-price_l/share_j) * rowMeans(alpha_i_rc * share_i_small[IDprod_un_j,] * share_i_small[IDprod_un_l,])
    )
  )
This runs without errors, but when I try to verify the results, I get different values:
> -elast_small$price_j[1] / elast_small$share_j[1] * mean(-alpha_i_rc * share_i_small[1,] * (1-share_i_small[1,]))
[1] -10.02669
> elast_small$eta_jlm_rc[1]
[1] -14.83231
What am I missing here?
What I was missing here is that case_when does not apply the RHS row by row. It evaluates each RHS in one go over the whole column, so share_i_small[IDprod_un_j,] returns a matrix with more than one row. Multiplying a vector by a matrix is done columnwise in R (the vector is recycled down the columns), so alpha_i_rc is not matched to the columns of share_i_small and the multiplication is not correct.
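To see the recycling in isolation, here is a minimal sketch on toy values. The names share and alpha are made-up stand-ins for share_i_small and alpha_i_rc, and the numbers are invented for illustration, not taken from the poster's data.

```r
# Toy stand-ins (hypothetical values):
# 2 products (rows) x 3 individuals (columns), one alpha per individual.
share <- matrix(c(0.1, 0.2,
                  0.3, 0.4,
                  0.5, 0.6), nrow = 2)   # matrix() fills column by column
alpha <- c(1, 10, 100)

# One row: share[1, ] is a plain vector of length 3, so alpha[k] multiplies
# individual k. This is what the manual check in the question computes.
one_row <- alpha * share[1, ]

# Several rows: share[c(1, 2), ] is a 2 x 3 matrix. R walks down the columns
# and recycles alpha, so alpha[1] hits [1,1], alpha[2] hits [2,1],
# alpha[3] hits [1,2], and so on.
recycled <- alpha * share[c(1, 2), ]

print(one_row)        # product 1, computed on its own
print(recycled[1, ])  # product 1, computed inside the matrix product
```

The two printed vectors differ even though both describe product 1, which is the same kind of mismatch as between the manual check and eta_jlm_rc[1] in the question.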
This solves the issue:
elast %<>%
  mutate(
    eta_jlm_rc = case_when(
      IDprod_j == IDprod_l ~ (-price_j/share_j) * rowMeans(t(t(share_i[IDprod_ud_j,] * (1-share_i[IDprod_ud_j,])) * -alpha_i_rc)),
      IDprod_j != IDprod_l ~ (-price_l/share_j) * rowMeans(t(t(share_i[IDprod_ud_j,] * share_i[IDprod_ud_l,]) * alpha_i_rc))
    )
  )
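A quick way to convince yourself that the double transpose does what is intended is to compare it with a row-by-row calculation on toy values. The sketch below uses made-up stand-ins (share, alpha, id_j) rather than the real data, and also shows base R's sweep() as one possible alternative spelling of the same operation.

```r
# Toy stand-ins (hypothetical values, not the poster's data)
share <- matrix(c(0.1, 0.2,
                  0.3, 0.4,
                  0.5, 0.6), nrow = 2)   # 2 products x 3 individuals
alpha <- c(1, 10, 100)                   # one coefficient per individual
id_j  <- c(1, 1, 2)                      # row IDs, as IDprod_un_j would supply

picked <- share[id_j, ]                  # 3 x 3: one matrix row per data row

# Fix from the answer: transpose so individuals are in rows, multiply
# (recycling now lines up with individuals), then transpose back.
via_transpose <- rowMeans(t(t(picked * (1 - picked)) * -alpha))

# Alternative: sweep() applies -alpha across the columns (MARGIN = 2).
via_sweep <- rowMeans(sweep(picked * (1 - picked), 2, -alpha, `*`))

# Reference: compute each data row separately, as in the manual check.
by_row <- sapply(id_j, function(j) mean(-alpha * share[j, ] * (1 - share[j, ])))

print(all.equal(via_transpose, by_row))
print(all.equal(via_sweep, by_row))
```

Whether sweep() or t(t(...)) reads better is a matter of taste; both multiply column k of the matrix by element k of the vector.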
It looks like it might work if you group by product j and l and then compute the terms that get multiplied by (-price/share) beforehand in the mutate() statement:
tmp <- elast_small %>%
  group_by(IDprod_un_j, IDprod_un_l) %>%
  mutate(
    newvar1 = mean(-alpha_i_rc * share_i_small[IDprod_un_j, ] * (1-share_i_small[IDprod_un_j, ])),
    newvar2 = mean(alpha_i_rc * share_i_small[IDprod_un_j, ] * share_i_small[IDprod_un_l, ]),
    # named eta_jlm_rc2 so that it matches the column selected below
    eta_jlm_rc2 = case_when(
      IDprod_j == IDprod_l ~ (-price_j/share_j) * newvar1,
      IDprod_j != IDprod_l ~ (-price_l/share_j) * newvar2
    )
  )
tmp %>%
  select(IDprod_un_j, IDprod_un_l, eta_jlm_rc2) %>%
  as.data.frame %>%
  head
#   IDprod_un_j IDprod_un_l   eta_jlm_rc2
# 1           1           1 -10.026692702
# 2           1           2   0.001446025
# 3           1           3   0.005316131
# 4           1           4   0.133027210
# 5           1           5   0.017306581
# 6           1           6   0.063833755
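Note that the grouped version only gives a per-row result when each (IDprod_un_j, IDprod_un_l) group holds exactly one row. If a group had several rows, the subset would be a matrix again and mean() would collapse it to a single number. A sketch of a variant that does not rely on that, using dplyr's rowwise(), is below. The data frame toy and its columns are hypothetical stand-ins built for illustration, and rowwise() can be slow on large data, so treat this as an option to test rather than a recommendation.

```r
library(dplyr)

# Toy stand-ins (hypothetical values, not the poster's data)
share <- matrix(c(0.1, 0.2,
                  0.3, 0.4,
                  0.5, 0.6), nrow = 2)   # 2 products x 3 individuals
alpha <- c(1, 10, 100)

toy <- data.frame(
  id_j    = c(1, 1, 2, 2),
  id_l    = c(1, 2, 1, 2),
  price_j = c(2, 2, 3, 3),
  price_l = c(2, 3, 2, 3),
  share_j = c(0.3, 0.3, 0.4, 0.4)
)

toy <- toy %>%
  rowwise() %>%                          # each row is evaluated on its own
  mutate(
    # id_j and id_l are scalars here, so share[id_j, ] is a plain vector
    own   = mean(-alpha * share[id_j, ] * (1 - share[id_j, ])),
    cross = mean( alpha * share[id_j, ] * share[id_l, ]),
    eta   = case_when(
      id_j == id_l ~ (-price_j / share_j) * own,
      id_j != id_l ~ (-price_l / share_j) * cross
    )
  ) %>%
  ungroup()

print(as.data.frame(toy))
```

Because every expression sees a single row, the result can be checked directly against the kind of one-row calculation used in the question.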