The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Markov Model Question

I've been developing a specialized control system for one of my clients, using a Markov model to drive the state-transition logic.

I came across an interesting problem this afternoon, though, and wanted to see if anyone else has encountered it.

Like any other Markov model, it's a directed graph whose nodes represent the actual states of the system. The edges represent state transitions, each with a probability indicating the likelihood of traversing that edge to arrive at a new target node. The sum of the probabilities for all outbound links from any node should always equal 1.0.

Here's the tricky part...

I'd like to analyze the whole model (all possible nodes and all possible transitions) and produce another set of probabilities, indicating the probability of the state machine being in ANY STATE at any given point. In this case, the aggregate of all probabilities in the WHOLE model should equal 1.0.

My first instinct was to just run a Monte Carlo simulation for a few million iterations, and then estimate the probabilities from the empirical observations. This methodology won't produce exact results, but if I run a simulation with enough iterations, it'll be accurate enough for all practical purposes.

Of course, such a simulation would be expensive to calculate, so if possible, I'd rather derive the exact result. But I don't know how to do it.

Here's a really simple case, for the sake of illustration:

  A --> A (0.10)
  A --> B (0.85)
  A --> C (0.05)

  B --> A (0.80)
  B --> B (0.15)
  B --> C (0.05)

  C --> A (0.01)
  C --> B (0.01)
  C --> C (0.98)

My intuition tells me that the aggregate probability for each item is the sum of all the inbound probabilities for the item, divided by the sum of all probabilities in the whole graph. So...

  P(A) = (0.10 + 0.80 + 0.01) / 3.0 = 0.3033
  P(B) = (0.85 + 0.15 + 0.01) / 3.0 = 0.3367
  P(C) = (0.05 + 0.05 + 0.98) / 3.0 = 0.3600

According to these numbers, you'd think that node C would only be slightly more common than either A or B. But looking back at the rules above, I can see that node C is MUCH easier to enter than to exit, so I'd expect the results to be much more skewed than the simple sum would imply.

Sure enough, when I run a Monte Carlo simulation (with ten million iterations), I get this set of probabilities:

  P(A) = 0.1380099
  P(B) = 0.1463159
  P(C) = 0.7156742

Obviously, my simple intuitive approach fails to account for the actual hierarchy of conditional probabilities that control navigating through the model.

Does anyone know how to actually calculate these probabilities, short of running the simulation?
BenjiSmith Send private email
Tuesday, November 27, 2007

The problem with my simple case above is that I just added the transition probabilities, as though all of them should have equal weight.

But each transition probability should first be multiplied by the probability of being at the source node. So, the probability calculations should actually be written like this:

P(A) = (
    ( P(A) * 0.10 ) +
    ( P(B) * 0.80 ) +
    ( P(C) * 0.01 )
  ( P(A) + P(B) + P(C) )

P(B) = (
    ( P(A) * 0.85 ) +
    ( P(B) * 0.15 ) +
    ( P(C) * 0.01 )
  ( P(A) + P(B) + P(C) )

P(C) = (
    ( P(A) * 0.01 ) +
    ( P(B) * 0.01 ) +
    ( P(C) * 0.98 )
  ( P(A) + P(B) + P(C) )

It's a big, messy set of mutually-recursive functions. I wonder if there's any way of eliminating the recursion. But converting the recursion into iteration, it doesn't seem like it'll be much more accurate (or less computationally intensive) than the Monte Carlo method.

BenjiSmith Send private email
Tuesday, November 27, 2007
Markov models are also called "Markov chains" because the Markov process has to do with an entire chain of events. You need to make a list of all possible traversals through the network (or all possible traversals from the start states to all reachable nodes) and sum the probabilities of each chain to get the probability of being in each node. It sounds like a lot of work because it is. That Monte Carlo simulation is probably sounding better and better.
Jeffrey Dutky Send private email
Tuesday, November 27, 2007
Good news: I know how to do it.
Bad news: It's not always possible to do it.

So there are certain pathological cases:
A -> A 0
A -> B 1
B -> A 1
B -> B 0
Initial state = A

You will never reach a steady state solution.

But, ignoring such problems, here's what to do:
First, let's write your current probabilities as a vector:
P = (Pa, Pb, Pc)'.  (I'm using ' to denote transpose, i.e., this should be a column vector.)  I'll assume your starting state is (1, 0, 0), but you may choose whatever you want.

Second, write your transition probabilities as a matrix:
        A    B    C
  A  .10  .80  .01
To B  .85  .15  .01
  C  .05  .05  .98

For simplicity, I'll call the matrix M.  Let P(t) be your probability vector at time t.  Then:
    P(t + 1) = M * P(t)
where * is the usual matrix multiplication.

You should probably convince yourself at this point that what we've done so far is correct.

So it's not 100% clear to me if you want a specific time point t, or a steady state time point sst.  I think the latter, but I'll put it out there.

Method 1 (the former):
    P(t) = M^(t) * P(0)
You could do a bunch of repeated matrix multiplications (e.g., if t = 8, you'd do M2 = M*M, M4 = M2*M2, M8 = M4*M4, if t = 10, you'd use M2*M8, etc.)  In general you probably have written code to do fast exponentiation without matrices; the matrix version is similar.

Method 2 (the latter):
You want P(steady state), such that
    P(steady state time + 1) = P(steady state time)
Plugging in:
    P(sst + 1) = M * P(sst)
    P(sst) = M * P(sst)
    0 = (M - I)*P(sst)
where I is the identity matrix, and 0 is the 0 vector.

So you need a new matrix N which is M - I (i.e., subtract 1 from all the diagonal entries, all other entries are the same).  You need to find a non-zero eigenvector of N with eigenvalue 0.  (0 must be an eigenvalue because the sum of transition probabilities out is 1, and 1-1 = 0.)  Getting the eigenvector is hard to explain in general terms, so I'll work it out with your data, and hope the pattern is clear.

For now, let the first element of the eigenvector be 1.  Using your data, we need to solve the matrix equation:
 -.90  .80  .01    1    0
  .85 -.85  .01 *  x  =  0
  .05  .05 -.02    y  =  0
I'm going to drop the last row, since the matrix is singular (it has eigenvalue 0 as discussed above) and multiply the 1 column and move it to the other side:
      .80  .01    x    .90
      -.85  .01 *  y  =  -.85
Let's call this matrix on the left N2, and the vector on the right V.  To get x and y, you do inverse(N2)*V.  (Or you could do Gaussian elimination, which might be faster than calculating the inverse).  This works out to (x, y) = (1.06, 5.15)

So the eigenvector you want is (1, 1.06, 5.15).  Now we just normalize that by calculating the sum and dividing each term by it.  This gives (.139, .14, .714), which matches your simulation.

Please let me know if I can clarify parts of this; it's kind of tricky to do and even more so to explain.
Michael Gibson Send private email
Tuesday, November 27, 2007
P.S. I assume you're using a discrete time Markov chain (it sounds like you are).  If you're doing continous time, it's slightly different, but the ideas are similar.
Michael Gibson Send private email
Tuesday, November 27, 2007
Thanks, Michael!! That's exactly the kind of thing I'm looking for.

I haven't tried following along at home with my own scratch paper, but at this point, I think I get the basic ideas behind the technique. Sometime tomorrow, I'll break out the pencil and the slide rule, to make sure I can work it out on my own, from start to finish.

If I get stuck, I'll let you know.

Thanks again! I really appreciate the detailed explanation!
BenjiSmith Send private email
Tuesday, November 27, 2007
And, yeah, you were correct in assuming that I wanted to get a probability at a steady-state time.

In cases where the steady-state is impossible to calculate (due to oscillations, like the one you pointed out), I'd be content with a moving average over a handful of cycles. Knowing how to calculate an arbitrary P(t) will let me easily calculate that moving average, if necessary.

Of course, now I'm trying to think of an algorithm for identifying which cases stabilize and which ones oscillate...

Tons of fun!
BenjiSmith Send private email
Tuesday, November 27, 2007
Giving it just a few minutes of thought, it seems like any markov model lacking dead ends or cycles (which are really just a different kind of dead end) will always reach a steady-state eventually.

BenjiSmith Send private email
Tuesday, November 27, 2007
I think the condition is the equation P(S) = M * P(S) has a unique solution or not. This is a linear system and is very well studied. Google for "Gaussian elimination" or "linear algebra".
Wednesday, November 28, 2007
I lost some sleep last night thinking about this question.

First off, P(s) = M * P(s) is itself an eigenvalue problem, namely P(s) is an eigenvector of M with eigenvalue of 1.  I previously went through some hoopla to get an eigenvalue of 0.  (Not that it matters, but in continuous time Markov processes, the key eigenvalue is 0, in discrete it's 1, hence my confusion.)  In terms of how to solve it, nothing has changed.

Further, you can do an eigenvalue decomposition of M, so that M = Sinv * D * S, where S and Sinv are the eigenvector matrix and its inverse (which I think may just be its transpose or something like that), and D is a diagonal matrix, i.e., 0 everywhere except the diagonal.

When you try to calculate P(n) = M^n * P(0), it's easier than I thought:
M^2 = M * M
    = Sinv * D * S * Sinv * D * S
    = Sinv * D * D * S
    = Sinv * D^2 * S
In general, M^n = Sinv * D^n * S
In particular, since D is diagonal, D^n is calculated by just raising each non-zero diagonal entry to the nth power.  In other words, you're doing powers of numbers, not of matrices.

However, note that S, Sinv, and D may contain complex values.  So you may just want to do it the other way I suggested for simplicity.

But at least conceptually, you could do the eigenvalue decomposition and handle it that way.

So let's suppose your eigenvalues are 1, 1/2 and 1/3 and you raise them to the nth power.  You get (1)^n, (1/2)^n and (1/3)^n.  Basically, the 1 sticks around, and the other two vanish to 0 as n -> infinity.

In the cycle example I gave before, the eigenvalues are 1 and -1.  So the second eigenvalue doesn't vanish.  In particular, it goes -1, 1, -1, 1, ... thus giving you a cycle of length 2.

In general, if || eigenvalue || < 1, it will vanish as n goes to infinity.  If you have a cycle of length 3, you'll get the 3 complex roots of 1, if you have a cycle of length 4, you'll get 1, -1, i, -i, etc.
Michael Gibson Send private email
Wednesday, November 28, 2007
> Giving it just a few minutes of thought, it seems like any markov model lacking dead ends or cycles (which are really just a different kind of dead end) will always reach a steady-state eventually

Erm, what does it mean for a finite number of state transitions to not have dead ends or cycles?  Unless I misunderstand what you mean by those, it's not possible.

I wouldn't bother trying to give a qualitative description of the kinds of models that have a steady state solution (or a vector space of them).  As Michael has shown, the easy way to tell is by taking the determinant of M-I; there is a steady state iff det(M-I) = 0.
Brian McKeever Send private email
Wednesday, November 28, 2007
Well, my intuitive understanding could be completely off base. I really need to brush up on my linear algebra.

What I meant is that it seems like the non-steady-state markov models seem to arise from getting stuck in a cycle from which there's no escape. Like...

A -> B -> A


A -> B -> C -> A


A -> A

(where those arrows indicate a 1.0 transition probability)

Models without those kinds of cycles (even if the cycles are hidden among a much more complex set of transitions) seem intuitively like they'd always reach a steady state.

I'm more than willing to be wrong, though :)
BenjiSmith Send private email
Wednesday, November 28, 2007
I must admit, my intuition here is not great.

There are, of course, complicated cycle cases like:
A -> B
A -> C
B -> D
C -> D
D -> A

I suspect that if you used some modification the Floyd/Warshall algorithm to figure out the cycle length from A to A (for all A), and there are only paths with cycle length divisible by X, then you have one of these odd cases.  But I don't know for sure how that would work.  In other words, it seems like if you can get from A to A with length 5, 6, 7, 8, 9, etc. then you're probably not going to have cycle problems.  Presumably if you found two paths that were relatively prime, that would be enough.

But again, this is all relying on my poor intuition.  YMMV.
Michael Gibson Send private email
Thursday, November 29, 2007
Well, I should say that I am more familiar with the linear algebra more than I am with markov models, so I may be off-base here.  But if you take the transition matrix ((0, 1), (1, 0)) - which models what you would call a cycle - it has a steady state solution (.5, .5).  I can't tell if this has meaning in your case; I'm guessing that you are more interested in integral solutions?
Brian McKeever Send private email
Thursday, November 29, 2007

The point before was that even though it has the steady state solution you mentioned, if you start from P(0) = (1, 0), you'll never get to that steady state.  If you start from P(0) = (.5, .5), then you will.  Actually, if you start from anything other than P(0) = (.5, .5), you won't ever get to steady state.  For many Markov chains, you'll converge to steady state regardless of initial conditions.
Michael Gibson Send private email
Thursday, November 29, 2007
Yes, I realized that after I posted.  I'll have to think about that.
Brian McKeever Send private email
Thursday, November 29, 2007
Don't ask me, ask Sheldon Ross. Or at least read his books.

You may want to be careful exactly what you are asking.

Suppose there are two states (say s0 and s1), and the the process jumps between them with P=1.

So it will alwasy be at s0 on step t and at s1 on step t+1. The probability of being at s0 is 0 or 1 if you know the step, but .5 if you don't know the step.
Nutmeg programmer
Wednesday, December 05, 2007

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
Powered by FogBugz