
R - Streamlined Markov Chain

I have two data sets, annual transition probabilities and initial values. The goal is to use these to develop an idea of what a company will look like in five years.

Initial values are in the form:

|     Age       |      Gender    |     Initial     |
----------------------------------------------------
|  18           | F              |  30             |
|  19           | M              |  35             |
|  20           | F              |  40             |
|  ...          |                |                 |
|  Out          |                |  400            |

where the Initial value in the Out row holds data on future hiring. This figure can be changed to suit the solution, but at present it represents the annual number of hires.

Transition probabilities are of the form

|   Age        |    Gender    |   Hire       |    Terminate    |
----------------------------------------------------------------
|   18         |    F         |   0.025      |    0.3          |     
|   18         |    M         |   0.03       |    0.1          |
|   19         |    F         |   0.01       |    0.4          |
...

That is, 2.5% of all hires are 18-year-old women, and 30% of all 18-year-old women leave the company each year.

Using Markov transition probabilities we have

p(Out, 18F) = 0.025
p(18F,Out) = 0.3
p(18F,19F) = 0.7 #The complement action to leaving the company is staying and getting a year older

Assuming no gender changes or time machines, all other transition probabilities would be 0.
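
As a quick check of these probabilities against the tables above, one year of movement into the 18F and 19F groups can be worked out by hand. This is only a sketch in R; the variable names are illustrative and not part of the original data.

```r
hires_per_year <- 400    # Initial value of the "Out" row
n_18F          <- 30     # initial number of 18-year-old women

p_out_18F <- 0.025       # share of hires who are 18F
p_out_19F <- 0.01        # share of hires who are 19F
p_18F_out <- 0.3         # share of 18F who leave

# Next year's 18F group consists only of new hires.
next_18F <- hires_per_year * p_out_18F

# Next year's 19F group: 18F who stayed (and aged a year) plus new 19F hires.
next_19F <- n_18F * (1 - p_18F_out) + hires_per_year * p_out_19F
```

With these figures, next_18F works out to 10 and next_19F to 25 (21 who stayed plus 4 hires).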

Is there a way of simplifying the forecasting process so that I don't have to generate transition matrices that are mostly full of zeroes? How would you go about it (with or without the "markovchain" package)?

PS: As I write this, I realise that it's one step more efficient to have two tables, one for the men and another for the women, and to calculate them separately, but that's still not quite where I want it.

Worked it out later: the simplest approach is to treat each age/gender group as its own Markov chain, which reduces the whole calculation to operations on a data frame.

The initial values can be left_joined onto the transition probabilities into a data structure d.
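
A minimal sketch of that join, using base R's merge() in place of left_join so that it runs without extra packages. The rows are the ones shown in the question; filling unmatched groups with 0 and sorting by gender and then age are assumptions made here, not something the original states.

```r
# Transition probabilities (the three rows shown in the question).
trans <- data.frame(
  Age       = c(18, 18, 19),
  Gender    = c("F", "M", "F"),
  Hire      = c(0.025, 0.03, 0.01),
  Terminate = c(0.3, 0.1, 0.4)
)

# Initial headcounts (the rows shown in the question, without the "Out" row).
init <- data.frame(
  Age     = c(18, 19, 20),
  Gender  = c("F", "M", "F"),
  Initial = c(30, 35, 40)
)

# Left join: keep every row of trans, attach Initial where Age and Gender match.
d <- merge(trans, init, by = c("Age", "Gender"), all.x = TRUE)

# Assumption: a group with no initial headcount starts empty.
d$Initial[is.na(d$Initial)] <- 0

# Assumption: order by gender, then age, so that within a gender each row's
# predecessor is the next-younger group.
d <- d[order(d$Gender, d$Age), ]
```

Because d still mixes both genders, any shift by one row should be applied within each gender separately, as the PS in the question suggests.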

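# Assumes d holds a single gender, sorted by Age; lag() here is dplyr's
d$temp <- dplyr::lag(d$Initial * (1 - d$Terminate)) # those who stay move up one age group
d$temp[1] <- 0 # Gets rid of NA
d$temp <- d$temp + d$Hire * TotHires[1]
# where TotHires[1] represents the number hired in year 1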

This gives the results after one year. For n years, we have

d$temp <- d$Initial
for (y in 1:n) {
  # those who stay (1 - Terminate) move up one age group; lag() here is dplyr's
  d$temp <- dplyr::lag(d$temp * (1 - d$Terminate))
  d$temp[1] <- 0 # Gets rid of NA
  d$temp <- d$temp + d$Hire * TotHires[y]
  # where TotHires[y] represents the number hired in year y
}
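
For reference, the same loop can be written as a small self-contained function in base R. This is a sketch under stated assumptions: project_headcount and its arguments are hypothetical names, the input figures are made up for illustration, the vectors cover a single gender ordered from youngest to oldest, and the share who stay is taken as 1 - Terminate, as in the question.

```r
# Hypothetical helper: project one gender's headcount by age group.
# initial, hire and terminate are vectors ordered by age (youngest first);
# tot_hires[y] is the total number of hires in year y.
project_headcount <- function(initial, hire, terminate, tot_hires) {
  n_groups <- length(initial)
  current <- initial
  for (y in seq_along(tot_hires)) {
    # People who do not leave this year.
    survivors <- current * (1 - terminate)
    # Shift survivors up one age group; the oldest group drops off the end.
    aged <- c(0, survivors[-n_groups])
    # Add this year's hires, split across age groups by the Hire shares.
    current <- aged + hire * tot_hires[y]
  }
  current
}

# Made-up inputs for ages 18, 19 and 20 of one gender.
initial   <- c(30, 20, 40)
hire      <- c(0.025, 0.01, 0.02)
terminate <- c(0.3, 0.4, 0.2)

after_one_year  <- project_headcount(initial, hire, terminate, tot_hires = 400)
after_two_years <- project_headcount(initial, hire, terminate, tot_hires = c(400, 400))
```

As with the lag-based version, survivors of the oldest age group are dropped rather than carried forward, so a real forecast would need a rule for what happens to them.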
