dplyr: create a new column with case_when using subsets of values from a matrix

Question

I am trying to create a new column in a data frame using mutate and case_when, but I get unexpected results.

Here is a dput of a subset of my data: Pastebin.

The aim is to calculate own- and cross-price elasticities for products in multiple, completely separate markets. My idea was to use case_when to apply different expressions for own and cross elasticities, and to use unique product identifiers (IDprod_un_j and IDprod_un_l) to subset values from another matrix. This is the code I am using:

```r
library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe

elast_small %<>% 
  mutate(
    eta_jlm_rc = case_when(
      # own-price elasticity (j == l)
      IDprod_j == IDprod_l ~ (-price_j/share_j) * rowMeans(-alpha_i_rc * share_i_small[IDprod_un_j,] * (1-share_i_small[IDprod_un_j,])),
      # cross-price elasticity (j != l)
      IDprod_j != IDprod_l ~ (-price_l/share_j) * rowMeans(alpha_i_rc * share_i_small[IDprod_un_j,] * share_i_small[IDprod_un_l,])
    )
  )
```
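Read row by row, the two branches are meant to compute the quantities below. This is only a transcription of the code above, assuming the columns of share_i_small index N simulated individuals i, alpha_i_rc holds one coefficient α_i per individual, s_ij denotes share_i_small[IDprod_un_j, i], and p_j, p_l, s_j stand for price_j, price_l, share_j (the symbols are introduced here, not taken from the question):

```latex
\eta_{jj} = -\frac{p_j}{s_j}\,\frac{1}{N}\sum_{i=1}^{N}\bigl(-\alpha_i\, s_{ij}\,(1-s_{ij})\bigr),
\qquad
\eta_{jl} = -\frac{p_l}{s_j}\,\frac{1}{N}\sum_{i=1}^{N}\alpha_i\, s_{ij}\, s_{il}\quad (j \neq l)
```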

This runs without errors, but when I check the result for the first row by hand, I get a different value:

```
> -elast_small$price_j[1] / elast_small$share_j[1] * mean(-alpha_i_rc * share_i_small[1,] * (1-share_i_small[1,]))
[1] -10.02669
> elast_small$eta_jlm_rc[1]
[1] -14.83231
```

What am I missing here?

Answer 1

What I was missing is that case_when does not evaluate the right-hand side row by row. Each RHS is evaluated once on the full columns, so share_i_small[IDprod_un_j,] returns a matrix with one row per row of the data frame. When R multiplies a vector by a matrix, it recycles the vector down the columns (column-major order), so alpha_i_rc, which has one value per column, is no longer matched to the right columns and the products are wrong.
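The recycling problem can be reproduced in base R on a tiny matrix. The values below are hypothetical toy data, not the question's data: share plays the role of the stacked share_i_small rows and alpha the role of alpha_i_rc. It shows that alpha * share misaligns alpha with the columns, while t(t(share) * alpha), or equivalently sweep(), scales column k by alpha[k] and reproduces the single-row result.

```r
# Hypothetical toy data: 3 products (rows) x 4 simulated individuals (columns)
share <- matrix(c(0.10, 0.20, 0.30, 0.40,
                  0.50, 0.60, 0.70, 0.80,
                  0.15, 0.25, 0.35, 0.45),
                nrow = 3, byrow = TRUE)
alpha <- c(-1, -2, -3, -4)  # one coefficient per column

# One-go evaluation over all rows: alpha is recycled down the columns
# in column-major order, so it is NOT matched to the columns
recycled <- alpha * share

# Intended: scale column k by alpha[k]
by_column  <- t(t(share) * alpha)
same_thing <- sweep(share, 2, alpha, `*`)

# Row 1 on its own, as in the manual check: share[1, ] is a plain vector
row1_alone <- mean(alpha * share[1, ])

row1_alone              # -0.75
rowMeans(by_column)[1]  # -0.75, matches the manual check
rowMeans(recycled)[1]   # -0.65, the misaligned result
```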

Transposing twice, so that alpha_i_rc is multiplied across the columns, solves the issue:

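```r
# t(M) * alpha scales row k of t(M) (i.e. column k of M) by alpha[k];
# the outer t() restores the original orientation before rowMeans()
elast %<>%
  mutate(
    eta_jlm_rc = case_when(
      IDprod_j == IDprod_l ~ (-price_j/share_j) * rowMeans(t(t(share_i[IDprod_ud_j,] * (1-share_i[IDprod_ud_j,])) * -alpha_i_rc)),
      IDprod_j != IDprod_l ~ (-price_l/share_j) * rowMeans(t(t(share_i[IDprod_ud_j,] * share_i[IDprod_ud_l,]) * alpha_i_rc))
    )
  )
```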
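To check the vectorised fix against a row-by-row computation like the manual check in the question, here is a base-R sketch on hypothetical toy data (one market, three products, four individuals; all values and the pairs data frame are made up for illustration). Both branches are computed over all rows up front and ifelse() then picks per row, which mirrors how case_when() evaluates its right-hand sides; ifelse() is used only so the sketch runs without dplyr.

```r
# Hypothetical toy inputs: 3 products x 4 simulated individuals
share_i <- matrix(c(0.10, 0.20, 0.30, 0.40,
                    0.50, 0.60, 0.70, 0.80,
                    0.15, 0.25, 0.35, 0.45),
                  nrow = 3, byrow = TRUE)
alpha_i_rc <- c(-1, -2, -3, -4)
price <- c(1.0, 2.0, 1.5)
share <- rowMeans(share_i)

# All (j, l) pairs in one market; IDprod_* and IDprod_un_* coincide here
pairs <- expand.grid(IDprod_un_j = 1:3, IDprod_un_l = 1:3)
pairs$IDprod_j <- pairs$IDprod_un_j
pairs$IDprod_l <- pairs$IDprod_un_l
pairs$price_j  <- price[pairs$IDprod_un_j]
pairs$price_l  <- price[pairs$IDprod_un_l]
pairs$share_j  <- share[pairs$IDprod_un_j]

# Vectorised: both branches for all rows at once, alpha applied column-wise
sj <- share_i[pairs$IDprod_un_j, , drop = FALSE]
sl <- share_i[pairs$IDprod_un_l, , drop = FALSE]
own   <- (-pairs$price_j / pairs$share_j) * rowMeans(t(t(sj * (1 - sj)) * -alpha_i_rc))
cross <- (-pairs$price_l / pairs$share_j) * rowMeans(t(t(sj * sl) * alpha_i_rc))
pairs$eta <- ifelse(pairs$IDprod_j == pairs$IDprod_l, own, cross)

# Row-by-row reference, mirroring the manual check in the question
eta_loop <- vapply(seq_len(nrow(pairs)), function(r) {
  j <- pairs$IDprod_un_j[r]
  l <- pairs$IDprod_un_l[r]
  if (pairs$IDprod_j[r] == pairs$IDprod_l[r]) {
    -pairs$price_j[r] / pairs$share_j[r] *
      mean(-alpha_i_rc * share_i[j, ] * (1 - share_i[j, ]))
  } else {
    -pairs$price_l[r] / pairs$share_j[r] *
      mean(alpha_i_rc * share_i[j, ] * share_i[l, ])
  }
}, numeric(1))

all.equal(pairs$eta, eta_loop)  # TRUE
```

all.equal() is used rather than identical() because the two versions multiply in a different order, so the last bits of the floating-point results may differ.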

Answer 2

It looks like it might work if you group by the product identifiers j and l and compute the variables you multiply by (-price/share) beforehand, inside the mutate() statement:

```r
library(dplyr)

tmp <- elast_small %>% 
  group_by(IDprod_un_j, IDprod_un_l) %>% 
  mutate(
    # if each (j, l) pair appears once, each group is a single row,
    # so the matrix subsets below are plain vectors
    newvar1 = mean(-alpha_i_rc * share_i_small[IDprod_un_j, ] * (1-share_i_small[IDprod_un_j, ])), 
    newvar2 = mean(alpha_i_rc * share_i_small[IDprod_un_j, ] * share_i_small[IDprod_un_l, ]), 
    eta_jlm_rc2 = case_when(
      IDprod_j == IDprod_l ~ (-price_j/share_j) * newvar1,
      IDprod_j != IDprod_l ~ (-price_l/share_j) * newvar2
    )
  )

tmp %>% 
  select(IDprod_un_j, IDprod_un_l, eta_jlm_rc2) %>% 
  as.data.frame %>% 
  head
# IDprod_un_j IDprod_un_l   eta_jlm_rc2
# 1           1           1 -10.026692702
# 2           1           2   0.001446025
# 3           1           3   0.005316131
# 4           1           4   0.133027210
# 5           1           5   0.017306581
# 6           1           6   0.063833755
```
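Grouping helps because, assuming each (IDprod_un_j, IDprod_un_l) pair occurs only once in elast_small, every group holds a single row, so share_i_small[IDprod_un_j, ] drops to a plain vector and alpha_i_rc lines up with the columns exactly as in the manual check. A base-R sketch on hypothetical toy values; it also shows that a group containing the same pair twice would bring the misalignment back.

```r
# Hypothetical toy matrix: 3 products (rows) x 4 simulated individuals (columns)
share_i <- matrix(c(0.10, 0.20, 0.30, 0.40,
                    0.50, 0.60, 0.70, 0.80,
                    0.15, 0.25, 0.35, 0.45),
                  nrow = 3, byrow = TRUE)
alpha_i_rc <- c(-1, -2, -3, -4)

# A group with a single row: the index has length 1, so the subset
# drops to a plain vector with one entry per individual
one_row <- share_i[2, ]
is.matrix(one_row)           # FALSE
mean(alpha_i_rc * one_row)   # -1.75, alpha_i_rc lines up with the columns

# A group containing the same pair twice: the subset stays a 2 x 4
# matrix and alpha_i_rc is recycled down its columns again
two_rows <- share_i[c(2, 2), ]
dim(two_rows)                # 2 4
mean(alpha_i_rc * two_rows)  # -1.675, no longer the single-row value
```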

Answer 1 (score 0), 2020-09-16 12:28:12

Answer 2 (score 0), 2020-09-16 12:42:37
