Ticket #52 (closed enhancement: fixed)
Automatic sample-size step for simulations, manual and sources
| Reported by: | ivan.zapreev | Owned by: | ivan.zapreev |
|---|---|---|---|
| Priority: | minor | Milestone: | MRMC v.1.4.1 |
| Component: | Simulation Engine | Version: | 1.4 |
| Keywords: | simulations sample size step | Cc: |
Description
It seems to me that you do not really postpone recalculation of the
conf.int. as written in the manual but rather increase the sample size step. Moreover, the formula in the manual and the one implemented are different, here is some analysis.
From the source code I see that if (simulation_ctmc.c lines 899 --910, in the SVN version (trunk) ):
S - the number of states visited during the BSCC simulation so far
B - the number of simulated BSCCs (non-trivial ones in the impl.)
Z - the number of regeneration cycles done in simulating one BSCC so far
N - the number of states in simulated BSCCs
then you compute the next sample-size step M as:
(*) M = (N / ( S / ( B * Z ) ) ) + Z
here ( S / ( B * Z ) ) is basically the average length of the
regeneration cycle over all BSCCs and all regeneration cycles
we've done. Therefore, (N / ( S / ( B * Z ) ) ) is the number of
regeneration cycles needed on overage to go through all of the BSCC
states (in all BSCCs). Then we have (after some trivial transformation)
M_{n-1} = N * B * Z_{n-1} / S_{n-1} + Z_{n-1} )
with that indexes {n} and {n-1} are added to represent the n'th and
n-1'th epoch in computations. Clearly N and B are constant.
Notice that
Z_{n} = Z_{n-1} + M_{n-1} == 2 * Z_{n-1} + N * B * Z_{n-1} / S_{n-1}
and for all n >= MAX1 for a sufficiently large MAX1
Z_{n-1} / S_{n-1}
goes to some value which is constant value < 1.0. Because Z_{n-1} <=
S_{n-1} and since we do not consider trivial BSCCs we will mostly have Z_{n-1} < S_{n-1}. Also the value should stabilize around some constant value because we use the same regeneration points and etc.
This means that for larger n => MAX1 we have
Z_{n} = 2 * Z_{n-1} + K
Where K <= N * B - and is constant. So the sample size grows in
progression. Sure enough, eventually after some n >= MAX2, the addition of K will be negligible (because Z_{n} will be much larger), so you will have something like ( MAX3 := maximum(MAX1,MAX2) )
Z_{n} = power(2, n-MAX3) * Z_{ MAX3 }
So, in essence, you will have to do more and more simulations and, from some point on, the time between conf. int. checks will grow
exponentially.
The Figure C.2 does not show what happens after x > 7 and I
would really prefer to see the number of visited states, along with the run-time results. Of course, at some Markov chains the heuristic decreases the runtime, but the sample sizes will grow exponentially.
By the way, the formula in the manual is different, i.e. it is
M_{n} = (N / B) / ( ( S_{n-1} / ( B * Z_{n-1} ) ) / B ) + M_{n-1}
when transformed it is equivalent to
M_{n} = M_{n-1} + N * B * Z_{n-1} / S_{n-1}
from which we get
Z_{n} = Z_{n-1} + M_{n-1} + N * B * Z_{n-1} / S_{n-1}
as before, after some sufficiently large n:
N * B * Z_{n-1} / S_{n-1}
will be more or less constant and then the sample-size will
Z_{n} = Z_{n-1} + M_{n}
M_{n} = M_{n-1} + K
So M_{n} and therefore Z_{n} will grow linearly in n.
We will have a linear growth, roughly like:
Z_{n} = Z_{n-1} + n * K
Which is "safer" than the one that is implemented (the exponential one).

