R - Simplified Markov chain

Question

I have two data sets: annual transition probabilities and initial values. The goal is to use these to build a picture of what the company will look like in five years.

Initial values are in the form:

|     Age       |      Gender    |     Initial     |
----------------------------------------------------
|  18           | F              |  30             |
|  19           | M              |  35             |
|  20           | F              |  40             |
|  ...          |  ...           |  ...            |
|  Out          |                |  400            |

where the Initial value of the Out row holds data on future hiring. This figure can be changed to suit the solution, but at present it represents the annual number of hires.

Transition probabilities are of the form:

|   Age        |    Gender    |   Hire       |    Terminate    |
----------------------------------------------------------------
|   18         |    F         |   0.025      |    0.3          |     
|   18         |    M         |   0.03       |    0.1          |
|   19         |    F         |   0.01       |    0.4          |
...

That is, 2.5% of all hires will be 18-year-old women, and 30% of all 18-year-old women will leave the company.
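As a quick check of the arithmetic for the 18F row, taking the 400 annual hires from the Out row and the 30 18-year-old women from the initial values: 0.025 × 400 = 10 new hires, 0.3 × 30 = 9 leavers, and 0.7 × 30 = 21 who stay and turn 19. The variable names below are illustrative only.

```r
hires_total <- 400   # Initial value of the Out row (annual hires)
hire_18F    <- 0.025 # Hire probability for 18F
term_18F    <- 0.3   # Terminate probability for 18F
initial_18F <- 30    # Initial headcount for 18F

new_18F     <- hire_18F * hires_total        # new 18-year-old women hired
leavers_18F <- term_18F * initial_18F        # 18-year-old women who leave
to_19F      <- (1 - term_18F) * initial_18F  # stay and become 19-year-olds
c(new = new_18F, leave = leavers_18F, to_19F = to_19F)
```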

Using Markov transition probabilities, we have

p(Out, 18F) = 0.025
p(18F,Out) = 0.3
p(18F,19F) = 0.7 #The complement action to leaving the company is staying and getting a year older

Assuming no gender changes or time machines, all other transition probabilities would be 0.
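To see why the full-matrix approach is wasteful, the sketch below builds the transition matrix described above for a tiny hypothetical table (ages 18 to 20, two genders). Only the 18F, 18M and 19F probabilities come from the question; the others are invented. It also assumes that everyone at the oldest age leaves after one year and that the unused share of the Out row stays in Out. Even at this size about two thirds of the entries are zero, and the share of zeros grows as more ages are added.

```r
# Hypothetical transition table (18F, 18M, 19F values from the question)
trans <- data.frame(
  Age       = c(18, 19, 20, 18, 19, 20),
  Gender    = c("F", "F", "F", "M", "M", "M"),
  Hire      = c(0.025, 0.01, 0.02, 0.03, 0.02, 0.015),
  Terminate = c(0.3, 0.4, 0.2, 0.1, 0.25, 0.15)
)

states <- c("Out", paste0(trans$Age, trans$Gender))
P <- matrix(0, nrow = length(states), ncol = length(states),
            dimnames = list(states, states))

for (i in seq_len(nrow(trans))) {
  from <- paste0(trans$Age[i], trans$Gender[i])
  P["Out", from] <- trans$Hire[i]         # e.g. p(Out, 18F)
  P[from, "Out"] <- trans$Terminate[i]    # e.g. p(18F, Out)
  to <- paste0(trans$Age[i] + 1, trans$Gender[i])
  if (to %in% states) {
    P[from, to] <- 1 - trans$Terminate[i] # stay and get a year older
  } else {
    P[from, "Out"] <- 1                   # assumption: oldest age always leaves
  }
}
P["Out", "Out"] <- 1 - sum(trans$Hire)    # assumption: rest of the pool stays Out

mean(P == 0)  # share of zero entries in the matrix
```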

Is there a way of simplifying the forecasting process so that I don't have to generate transition matrices that are mostly full of zeroes? How would you go about it? (Using or not using the "markovchain" package)

PS: As I write this, I realise that it's one step more efficient to have two tables, one for men and another for women, and to calculate them separately, but that's still not quite where I want it.

Answer 1

Worked it out later: the simplest approach is to have a Markov chain for each age/gender group, which can be reduced to a data frame.

The initial values can be joined onto the transition probabilities with left_join() to give a data frame d.
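For readers not using dplyr, base R's merge() with all.x = TRUE performs the same left join. A minimal sketch with hypothetical rows (only the 18F and 19M headcounts and the 18F, 18M and 19F probabilities come from the question); it also sets missing headcounts to 0 and sorts by gender and age so each gender's ages are contiguous:

```r
trans <- data.frame(
  Age       = c(18, 18, 19, 19),
  Gender    = c("F", "M", "F", "M"),
  Hire      = c(0.025, 0.03, 0.01, 0.02),
  Terminate = c(0.3, 0.1, 0.4, 0.25)
)
init <- data.frame(
  Age     = c(18, 18, 19, 19),
  Gender  = c("F", "M", "F", "M"),
  Initial = c(30, 25, 20, 35)
)

# Left join: keep every row of the transition table, add its Initial headcount
d <- merge(trans, init, by = c("Age", "Gender"), all.x = TRUE)
d$Initial[is.na(d$Initial)] <- 0  # groups with no current staff start at 0
d <- d[order(d$Gender, d$Age), ]  # each gender's ages now sit together, in order
d
```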

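library(dplyr)
# Survivors (1 - Terminate) age one year within each gender; new hires are added
d <- d %>%
  arrange(Gender, Age) %>%
  group_by(Gender) %>%
  mutate(temp = lag(Initial * (1 - Terminate), default = 0) + Hire * TotHires[1]) %>%
  ungroup()
#where TotHires[1] represents the number hired in year 1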
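The same one-year update can be written in base R. The sketch below uses hypothetical numbers (400 hires a year, ages 18 to 20). It shifts Initial × (1 − Terminate) up one age within each gender using ave() and a helper, shift_older(), that is made up for this example. Then it adds Hire × TotHires[1]. Finally, it checks the result against the full transition-matrix product over the employee states, assuming everyone at the oldest age leaves.

```r
ages <- 18:20
d <- data.frame(
  Age       = rep(ages, times = 2),
  Gender    = rep(c("F", "M"), each = length(ages)),
  Hire      = c(0.025, 0.01, 0.02, 0.03, 0.02, 0.015),
  Terminate = c(0.3, 0.4, 0.2, 0.1, 0.25, 0.15),
  Initial   = c(30, 20, 40, 25, 35, 15)
)
TotHires <- rep(400, 5)  # hypothetical: 400 hires in each of five years

# Move everyone one age older: youngest age gets 0, oldest age drops off
shift_older <- function(x) c(0, head(x, -1))

survivors <- d$Initial * (1 - d$Terminate)
temp <- ave(survivors, d$Gender, FUN = shift_older) + d$Hire * TotHires[1]

# Check against the full, mostly-zero matrix restricted to employee states
n <- nrow(d)
P <- matrix(0, n, n)
for (i in seq_len(n)) {
  j <- which(d$Age == d$Age[i] + 1 & d$Gender == d$Gender[i])
  if (length(j) == 1) P[i, j] <- 1 - d$Terminate[i]
}
all.equal(temp, as.vector(d$Initial %*% P) + d$Hire * TotHires[1])
```

Only the shift within each gender is needed, so the zero entries of the matrix are never stored.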

This gives the results after one year. For n years, we have

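library(dplyr)
d <- arrange(d, Gender, Age)  # ages must be consecutive within each gender
d$temp <- d$Initial
for (y in 1:n) {
  d <- d %>%
    group_by(Gender) %>%
    mutate(temp = lag(temp * (1 - Terminate), default = 0) + Hire * TotHires[y]) %>%
    ungroup()
  #where TotHires[y] represents the number hired in year y
}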
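A base-R sketch of the same n-year loop, keeping every year's headcount so the five-year picture can be read off directly. The table and the flat 400 hires per year are hypothetical, and shift_older() is an illustrative helper that assumes rows are sorted by age within each gender:

```r
ages <- 18:20
d <- data.frame(
  Age       = rep(ages, times = 2),
  Gender    = rep(c("F", "M"), each = length(ages)),
  Hire      = c(0.025, 0.01, 0.02, 0.03, 0.02, 0.015),
  Terminate = c(0.3, 0.4, 0.2, 0.1, 0.25, 0.15),
  Initial   = c(30, 20, 40, 25, 35, 15)
)
n <- 5                    # forecast horizon in years
TotHires <- rep(400, n)   # hypothetical: 400 hires every year

shift_older <- function(x) c(0, head(x, -1))

# One column per year; column "0" holds the starting headcount
proj <- matrix(NA_real_, nrow = nrow(d), ncol = n + 1,
               dimnames = list(paste0(d$Age, d$Gender), 0:n))
proj[, "0"] <- d$Initial
for (y in 1:n) {
  survivors <- proj[, y] * (1 - d$Terminate)  # column y holds year y - 1
  proj[, y + 1] <- ave(survivors, d$Gender, FUN = shift_older) +
    d$Hire * TotHires[y]                      # hires made in year y
}
round(proj, 2)
colSums(proj)  # total headcount per year
```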

Answer 1 (accepted, score 0, 2018-02-14 00:58:28)