Stochastic Processes: Continuous Time Markov Chains

Math 606, Spring 2024

Luc Rey-Bellet

University of Massachusetts Amherst

2025-04-18

1 Poisson processes

We turn now to continuous Markov processes X_t where t \in [0,\infty). The simplest such example of such process is the ubiquitous Poisson process.

1.1 Definition of the Poisson process

Counting processes N_t are continuous time stochastic process which counts the number of events occuring up to time t: N_t = \textrm{total number of events occuring up to time } t e.g. the number of accidents reported to an insurance company, the number of earthquakes in California, and so on.
The Poisson process is the simplest counting process, intuitvely characterized by the following properties:
1. The number of events occuring in disjoint time intervals are independent.
2. The rate at which event occur is constant over time.
3. The events occur one at a time.
More formally a Poisson process N_t with rate paratmeter \lambda is a continuous time stochastic process such that
- Independent increments: Given times s_1\le t_1 \le s_2 \le t_2 \cdots \le s_n \le t_n the random variables N_{t_i}-N_{s_i} (that is the number of events occurring in the disjoint time intervals [s_i,t_i]) are independent.
- The rate at which events occur is denoted by \lambda>0 and for any (small) time interval [t, t + \Delta t] we have
  \begin{aligned} P \{ N_{t + \Delta t } = N_t \} &= 1- \lambda \Delta t +o(\Delta t) & \\ P\{N_{t + \Delta t}= N_t +1\} &= \lambda \Delta t +o(\Delta t) & \textrm{ with } \lim_{\Delta t \to 0}\frac{o(\Delta t)}{\Delta t}=0 \\ P\{N_{t + \Delta t}\ge N_t +2\} & = o(\Delta t) & \end{aligned} \tag{1.1}

1.2 Distribution of the Poisson process

Theorem 1.1 (Distribution of the Poisson process) The Poisson process N_t with N_0=0 has the distribution P\{N_t=k\} = e^{-\lambda t}\frac{(\lambda t)^k}{k!} i.e. N_t has Poisson distribution with parameter \lambda t. Morever for s, M_t=N_{t+s}-N_{s} is a Poisson process.

Proof (version 1 using Poisson limit). Pick a large number n and divide the interval [0,t] into n intervals of size \frac{t}{n}. Write
N_t = \sum_{j=1}^n\left( N_{j \frac{t}{n}}- N_{(j-1) \frac{t}{n}} \right) as a sum of n independent random variables. If n is large the probability that any of these random variables is at least 2 is small. Indeed, by a union bound, we have \begin{aligned} P\left\{ N_{j \frac{t}{n}}- N_{(j-1) \frac{t}{n}} \ge 2 \textrm { for some } j \right\} &\le \sum_{j=1}^n P\left\{ N_{j \frac{t}{n}}- N_{(j-1) \frac{t}{n}} \ge 2 \right\} \le n P \left\{ N_{\frac{t}{n}} \ge 2\right\} = t \frac{o( \frac{t}{n})}{\frac{t}{n}} \end{aligned} which goes to 0 as n \to \infty.

Therefore N_t is, approximately, a binomial random variables with success probability \lambda \frac{t}{n}: P(N_t=k) \approx {n \choose k} \left(\frac{\lambda t}{n} \right)^k \left(1-\frac{\lambda t}{n} \right)^{n-k} and as n \to \infty this converges to a Poisson distribution with parameter \lambda t. \quad \blacksquare.

Proof (version 2 using ODEs). Let us derive a system of ODEs for the family P_t(k)=P\left\{ N_t=k \right\}. We have \frac{d}{dt}P_t(k) = \lim_{\Delta t \to 0} \frac{P\left\{ N_{t+\Delta t}=k \right\} - P\left\{ N_t=k \right\} }{\Delta t} Conditioning we find \begin{aligned} P\left\{ N_{t+\Delta t}=k \right\}=& P\left\{ N_{t+\Delta t}=k | N_t=k \right\} P\left\{ X_{t}=k \right\} \\ & + P\left\{ N_{t+\Delta t}=k | N_t=k-1 \right\}P\left\{ X_{t}={k-1} \right\} \\ & + P\left\{ N_{t+\Delta t}=k | N_t\le k-2\right\}P\left\{X_t \le k-2\right\} \\ =& P_t(k) (1 - \lambda \Delta t) + P_t(k-1) \Delta t + o(\Delta t) \end{aligned} and this gives the system of equations \begin{aligned} \frac{d}{dt}P_t(0)&= - \lambda P_t(0) & \quad P_0(0)=1 \\ \frac{d}{dt}P_t(k)&= \lambda P_t(k-1) - \lambda P_t(k) &\quad P_0(k)=0 \end{aligned}

We find P_0(t)= e^{-\lambda t} for k=0. We use an integrating factor and set f_t(k)=e^{\lambda t} P_t(k). We have then f_t(0)=1 and for k >0 \frac{d}{dt}f_t(k)=\lambda f_t(k-1), \quad f_0(k)=0
which we can solve iteratively to find f_t(k) = \frac{(\lambda t)^k}{k!} and thus N_t has distribution P_t(k)= e^{-\lambda t}\frac{(\lambda t)^k}{k!}\,. \blacksquare.

1.3 Poisson process and exponential random variables

Poisson processes and exponential (and gamma) random variables are intimately related. Given a Poisson process we consider the interarrival times T_1, T_2, \cdots: T_1 is time of the occurence of the first event, T_2 is the time elapsed between the first and second event, and so on.

Theorem 1.2 If N_t is a Poisson process with parameter \lambda the interarrival times are independent exponential random variables with paramter \lambda.

Proof. If T_1 >t it means no event has occured up time t and so N_t=0. Therefore P\left\{ T_1 >t \right\} = P\left\{ N_t =0\right\} = e^{-\lambda t} and thus T_1 has an exponential distribution. For T_2 we condition on T_1 P\left\{ T_2 >t \right\} = \int P(T_2>t |T_1 =s) f_{T_1}(s) ds and, using the independence of the increments, P\left\{ T_2>t |T_1 =s\right\} = P\left\{ 0 \textrm{ events in } (s, s+t] | T_1=s \right\} = P\left\{ 0 \textrm{ events in } (s, s+t] \right\} = e^{-\lambda t} from which we conclude that T_2 has exponential distribution and is independent of T_1. This argument can be repeated for T_3 by conditioning on the time of the second event, T_1+T_2, and so on. \quad \blacksquare

Another set of closely related quantities are the arrival times of the n^{th} event S_1, S_2, \cdots which are related to the interarrival times by S_1=T_1, \quad S_2=T_1+T_2, \quad S_3=T_1+T_2+T_3, \cdots By Theorem 1.2 S_n is the sum of n independent exponential RVs and thus S_n a Gamma RV with parameter (n,\lambda) with density f_{S_n}(t) = \lambda e^{-\lambda t}\frac{(\lambda t)^{n-1}}{(n-1)!}\, \quad \textrm{ for }t \ge 0\,. We can actually prove this fact using the Poisson process by noting that, by definition,
N_t \ge n \iff S_n \le t \,, that is if n or more events have occured by time t if and only the n^{th} event has occured prior to or at time t.

So the CDF of S_n is F_{S_n}(t)=P\{ S_n \le t\} = P \{ N_t \ge n\} = \sum_{j=n}^\infty e^{-\lambda t} \frac{(\lambda t)^{j}}{j!} and upon differentiating we find f_{S_n}(t)= - \lambda e^{-\lambda t} \sum_{j=n}^\infty \frac{(\lambda t)^{j}}{j!} + \lambda e^{-\lambda t}\sum_{j=n}^\infty \frac{(\lambda t)^{j-1}}{(j-1)!} = \lambda e^{-\lambda t} \frac{(\lambda t)^{n-1}}{(n-1)!}. \quad \blacksquare

1.4 Poisson process and uniform distribution

Let us start with a special case and assume assume that N_t=1, that is exactly one event has occured in [0,t]. Since the Poisson process has independent increments it seems reasonable the event may have occur with equal probability at any time on [0,t]. Indeed for s\le t we have, using the independence of increments. \begin{aligned} P\{T_1 \le s | N_t=1\} & = \frac{P\{T_1 \le s , N_t=1\}}{P\{N_t=1\}} = \frac{P\{\textrm{1 event in }(0,s] \textrm{ and no event in (s,t]} \}}{P\{N_t=1\}} \\ &= \frac{ \lambda s e^{-\lambda s} e^{-\lambda(t-s)} }{\lambda t e^{-\lambda t}} = \frac{s}{t} \end{aligned} and thus the density of the arrival time T_1 conditioned on N_t=1 is uniform on [0,t]

We study further the properties of the arrival times S_1< S_2 <\cdots of a Poisson process. The following result tells us that they follow a uniform distribution on [0,t].

Theorem 1.3 Given the event \{N_t=n\}, the n arrival times S_1, S_2, \cdots have the same distribution as the order statistics for n independent random variables uniformly distributed on [0,t].

Proof. The conditional density of (S_1, \cdots, S_n) given that N_t=n can be obtained as follows. If S_1=s_1, S_2=s_2, \cdots, S_n=s_n and N_t=n then the intearrival times must satisfy T_1=s_1, T_2=s_2-s_1, \cdots, T_n= s_n-s_{n-1}, T_{n+1}> t-s_n.

By the independence of the interarrival times proved in Theorem 1.2 the conditional density is given by \begin{aligned} f(s_1, s_2, \cdots, s_n|n) &= \frac{f(s_1, s_2, \cdots, s_n, n)}{P\{N_t=n\}} \\ &= \frac{\lambda e^{-\lambda s_1} \lambda e^{-\lambda(s_2-s_2)} \cdots \lambda e^{-\lambda(s_n-s_{n-1})} e^{-\lambda(t-s_n)} }{e^{-\lambda t} \frac{(\lambda t)^n}{n!} } = \frac{n!}{t^n} \end{aligned} which is the joint density of the order statistic of n uniform. \blacksquare

Recall if X_1, \cdots, X_n are IID random variable with joint density f(x_1)\cdots f(x_n) and X^{(1)} \le X^{(2)} \le \cdots X^{(n)} the order statistics, then the joint pdf of X^{(1)}, \cdots ,X^{(n)} is given by g(x_1,\cdots, x_n)) = \left\{ \begin{array}{cl} n! f(x_1) \cdots f(x_n) &\textrm{ if } x_1 \le x_2 \le \cdots x_n \\ 0 & \textrm{ else } \end{array} \right.

1.5 Simulation of Poisson process

The characterization of the Poisson process in terms of exponential random variables suggest immediately a very simple algorithm to simulate N_t.

Simulate independent exponential RVs T_1, T_2, \cdots with parameter \lambda and set N_t=0 for 0\le t <T_1, N_t=1 for T_1\le t <T_1+T_2, and so on.

Code

#| code-fold: true
#| code-summary: "Show the code"


#| label: fig-Poisson
#| fig-cap: "A sample of the Poisson process"

import numpy as np
import matplotlib.pyplot as plt

def poisson_process(rate, T):
    # generate a Poisson process with rate `rate` up to time T
    t = 0
    events = [0]
    while t < T:
        # draw time until next event from an exponential distribution
        dt = np.random.exponential(1/rate)
        t += dt
        if t < T:
            # record time of event
            events.append(t)
    return events

rate=10
T=200
events=np.asarray(poisson_process(rate,T))

plt.figure() 
plt.step(events,np.arange(0,len(events)), label=r'sample of $N_t$' )
plt.plot(events,rate*events, label=r'$E[N_t]=\lambda t$')
plt.legend()
plt.xlabel('t')
plt.ylabel(r'$ N_t $')
plt.show()

1.6 Simulation of a Poisson random variable

It is not easy to simulate directly a Poisson random variable X from its pdf/cdf but we can do it elegantly using its relation with exponential random variable. To do this generate independent exponential random variable until they sum up to 1 (so as to generate X=N_1) and use the relation between exponential and uniform.
\begin{aligned} X =n & \iff \sum_{k=1}^n T_k \le 1 < \sum_{k=1}^{n+1} T_k \iff \sum_{k=1}^n - \frac{1}{\lambda} \ln(U_k) \le 1 < \sum_{k=1}^{n+1} - \frac{1}{\lambda} \ln(U_k) \\ & \iff \ln( \prod_{k=1}^n U_k) \ge - \lambda > \ln( \prod_{k=1}^{n+1} U_k) \iff \prod_{k=1}^n U_k \ge e^{-\lambda} > \prod_{k=1}^{n+1} U_k \end{aligned}
Algorithm to simulate a Poisson random variable with parameter \lambda: Generate random numbers until their product is smaller than e^{-\lambda}.
- Generate random number U_1, U_2, \cdots.
- Set X=n if n+1 = \inf \left\{ j \,:\, \prod_{k=1}^{j} U_j < e^{-\lambda} \right\}

Code

#| code-fold: true
#| code-summary: "Show the code"

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson

def poisson_product_method(lmbda):
    """
    Generate a Poisson-distributed random variable using the product method.

    Parameters:
    lmbda (float): The rate parameter (λ) of the Poisson distribution.

    Returns:
    int: A Poisson-distributed random variable.
    """
    L = np.exp(-lmbda)  # Compute e^(-lambda)
    product = 1.0
    k = 0

    while product > L:
        product *= np.random.uniform(0, 1)
        k += 1

    return k - 1  # Since k is incremented once extra

# Parameters
lmbda = 5  # Mean number of events
num_samples = 5000

# Generate Poisson samples
samples = [poisson_product_method(lmbda) for _ in range(num_samples)]

# Compute theoretical Poisson PMF values
x = np.arange(0, max(samples) + 1)
pmf_values = poisson.pmf(x, lmbda)  # Theoretical Poisson PMF
pmf_scaled = pmf_values * num_samples  # Scale to match histogram counts

# Plot histogram of simulated data
plt.figure(figsize=(8, 5))
plt.hist(samples, bins=x - 0.5, alpha=0.6, color='b', edgecolor='black', label="Simulated Data", density=False)

# Plot theoretical Poisson PMF as a histogram with narrower bars
plt.bar(x, pmf_scaled, color='r', alpha=0.6, edgecolor='black', label="Theoretical Poisson PMF", width=0.4)

# Labels and title
plt.xlabel("Poisson Random Variable")
plt.ylabel("Frequency")
plt.title(f"Histogram of Poisson({lmbda}) Samples vs. Theoretical PMF")
plt.legend()
plt.grid()
plt.show()

1.7 Long time behavior of the Poisson process

We investigate the behavior of N_t for large time. We prove a CLT type result, namely that
\lim_{t \to \infty} \frac{N_t-\lambda t}{\sqrt{\lambda t}} = Z \textrm{ in distribution} where Z is a standard normal RV.
Recall that the characteristic function of a Poisson RV Y with parameter \mu is E[e^{i\alpha Y}]= e^ {\mu (e^{i\alpha} -1)}. Therefore E\left[ e^{i \alpha \frac{N_t - \lambda t}{\sqrt{\lambda t}}} \right] = e^{\lambda t \left( e^{i \frac{\alpha}{\sqrt{\lambda t}}} -i \frac{\alpha}{\sqrt{\lambda t}} -1\right)} Expanding the exponential we have \lambda t \left(e^{i \frac{\alpha}{\sqrt{\lambda t}}} -i \frac{\alpha}{\sqrt{\lambda t}} -1\right) = - \frac{\alpha^2}{2} + O( \frac{1}{\sqrt{\lambda t}}) and thus \lim_{t \to \infty} E\left[ e^{i \alpha \frac{N_t - \lambda t}{\sqrt{\lambda t}}} \right]=e^{-\frac{\alpha^2}{2}}.
The same computation shows also that, for any fixed t, \lim_{\lambda \to \infty} \frac{N_t-\lambda t}{\sqrt{\lambda t}} = Z \textrm{ in distribution} since rescaling the parameter is equivalent to rescaling time.

1.8 Sampling a Poisson process

We can sample or split a Poisson process. Suppose that every event of a Poisson process (independently of the other events) comes into two different types, say type 1 with probability p and type 2 with probability q=1-p.

Theorem 1.4 Suppose N_t is a Poisson process with parameter \lambda and that every event (independently) is either of type 1 with probability 1 or type 2 with probability q=1-p. Then N^{(1)}_t, the number of events of type 1 up to time t, and N^{(2)}_t, the number of events of type 2 up to time t, are independent Poisson process with rate \lambda p and \lambda(1-p).

Proof. We check that N^{(1)}_t satisfy the definition of a Poisson process and the use Theorem 1.1

N^{(1)}_0=0 and N^{(1)}_t has independent increments since N_t has independent increments and events are classified of type 1 and 2 independently of each other.

We have
\begin{aligned} P \left\{ N^{(1)}_{t + \Delta t}= N^{(1)}_t +1 \right\} & = P\{ N_{t + \Delta t}= N_t +1 \textrm{ and the event is of type } 1 \} \\ & + P\{ N_{t + \Delta t} \ge N_t +2 \textrm{ and exactly one event is of type } 1 \} \\ &= \lambda \Delta t \times p + o(\Delta t) \end{aligned}

\begin{aligned} P \left\{ N^{(1)}_{t + \Delta t}= N^{(1)}_t \right\} & = P\{ N_{t + \Delta t}= N_t \} + P\{ N_{t + \Delta t} = N_t + 1 \textrm{ and the event is of type 2} \} \\ & + P\{ N_{t + \Delta t} \ge N_t + 2 \textrm{ and no event of type $1$} \} \\ &= (1- \lambda \Delta t) + \lambda \Delta t (1-p) + o(\Delta t) =1-\lambda \Delta t p + o(\Delta t) \end{aligned} P \left\{ N^{(1)}_{t + \Delta t}\ge N^{(1)}_t +2 \right\} =o(\Delta t)

Finally to show that N^{(1)}_t and N^{(2)}_t are independent we compute their joint PDF by conditioning on the value of N_t and find \begin{aligned} P\left\{N^{(1)}_t =n , N^{(2)}_t =m \right\} &= P\left\{N^{(1)}_t =n , N^{(2)}_t =m | N_t={n+m}\right\} P\left\{N_t={n+m}\right\} \\ &= {n+m \choose n} p^n (1-p)^m e^{-\lambda t} \frac{(\lambda t)^{n+m}}{(n+m)!} \\ &= e^{-\lambda p t} \frac{ (\lambda p t )^n}{n!} e^{-\lambda (1-p) t} \frac{(\lambda(1-p) t)^{m}}{m!} \end{aligned} \blacksquare

1.9 The coupon collecting problem

We revisit the couplon collector but we relax the assumption that all the toys are equally probable. We assume that any box contains toy i with probability p_i. How do we compute now the expected number of boxes needed to collect all the M toys? The argument used earlier does not generalize easily.
We use the following trick or radomizing the time between boxes. Instead of collecting boxes at fixed time interval, we collect them at times which are exponentially distributed with parameter 1. Then the number of boxes collected up to time t a Poisson process N_t with rate \lambda=1 (on average it takes the same time to get a new box). We have now M types of events (getting a box with toy i) and we split the poisson process accordingly. Then by Theorem 1.4 the number of toys of type i collected up to time t, N^{(i)}_t is a Poisson process with rate \lambda p_i=p_i and the Poisson processes N^{(i)}_t are independent.
We now consider the times T^{(i)} = \textrm{ time of the first event for the process } N^{(i)}_t that is the time where the first toy of type i is collected. The times T^{(i)} are independent since the underlying Poisson processes are independent, and are exponential with parameter p_i. Furthermore S = \max_{i} T^{(i)} = \textrm{ time until one toy of each type has been collected}.

By independence we have P\left\{ S \le t \right\} = P\left\{ \max_{i} T^{(i)} \le t\right\} = \prod_{i=1}^M P\left\{ T^{(i)} \le t \right\} = \prod_{i=1}^M (1- e^{-p_it}) Thus E[S]= \int_0^\infty P\{S \ge t\} dt = \int_0^\infty ( 1- \prod_{i=1}^M (1- e^{-p_it}) )\, dt
Finally we relate S to the original question. If X is the number of box needed to collect all the toys then we have S = \sum_{k=1}^X S_k where S_k aree IID exponential with parameter 1. But conditioning it is easy to see that E[S]= E[N]E[S_1] = E[N] and we are done.

1.10 Poisson process with variable rate

We can generalize the Poisson process by making the rate \lambda(t) at which event occur depend on time: a nonhomogeneous Poisson process N_t with rate paratmeter \lambda(t) is a continuous time stochastic process such that
- Independent increments: Given times s_1\le t_1 \le s_2 \le t_2 \cdots \le s_n \le t_n the random variables N_{t_i}-N_{s_i} (that is the number of events occurring in the disjoint time intervals [s_i,t_i]) are independent.
- We have \begin{aligned} P \{ N_{t + \Delta t } = N_t \} &= 1- \lambda(t) \Delta t +o(\Delta t) & \\ P\{N_{t + \Delta t}= N_t +1\} &= \lambda(t) \Delta t +o(\Delta t) & \textrm{ with } \lim_{\Delta t \to 0}\frac{o(\Delta t)}{\Delta t}=0 \\ P\{N_{t + \Delta t}\ge N_t +2\} & = o(\Delta t) & \end{aligned} \tag{1.2}
One way to construct a nonhomgeneous Poisson process is by sampling it in a time-dependent manner. Suppose \lambda(t) is bounded (locally in t), then we pick \lambda > \lambda(t). We consider a Poisson process M_t with constant rate \lambda, and if an event occurs at time t then we decide to keep this event with probability p(t)=\frac{\lambda(t)}{\lambda} and we discard the event with probability 1-p(t). By the same argument we used in the section Sampling a Poisson process we see the number of kept events satisfies the definition of a non-homogeneous Poisson process in Equation 1.2

Let us consider an event for the process M_t which occured in the interval [0,t]. By our analysis of arrival time we know that this event occured a time which is uniformly distributed on the interval [0,t]. Therefore the probability that this event was accepted and contribute to N_t is therefore p_t = \frac{1}{t}\int_0^t \frac{\lambda(s)}{\lambda} \, ds
By repeating then the second part of the arguement in the section Sampling a Poisson process we see that M_t has a Poisson distribution with parameter \lambda t p_t = \int_0^t \lambda(s) \, ds and in particular E[N_t] = \int_0^t \lambda(s) \, ds

1.11 Queueing model with infinitely many servers

Assume that the the flow of customers entering an onine store follows a Poisson process N_t with rate \lambda. The time S spent in the store for a single customer (browsing around, checking out, etc..) is given by tis CDF G(t)=P\{ S \le t\} and we assume that the customers are independent of each other.
To figure out how to allocate ressources one wants to figure out what is number of customers, M_t, which are still in the sytem at time t.
To find the distribution of M_t let us consider one of the customer by time t. If he arrived at time s\le t then he will have left the system at time t with probability G(t-s) and will still be in the system by time t with probability 1-G(t-s). Since the arrival time of that customer is uniform on [0,t] the distribution of M_t is Poisson with mean E[M_t] = \int_0^t (1- G(t-s))ds = \lambda \int_0^t (1 - G(s))ds,.
For large t, we see that E[M_t] \approx \lambda E[S].

1.12 Compound Poisson process

Example: Suppose that the number of claims receieved by an insurance follows a Poisson process. The size of each claim will be different and it natural to assume that claims are independent from each other. If we look at the total claims incurred by the insurance company this leads to a stochastic process called a compound Poisson process.
A stochastic process X_t is called a compound Poisson process if it has the form X_t = \sum_{k=1}^{N_t} Y_k where N_t is a Poisson process and Y_1, Y_2,\cdots are IID random variables which are also independent of N_t.
The process X_t has stationary independent increments. Using that N_t-N_s is a Poisson process X_t-X_s = \sum_{k=N_s}^{N_t} Y_k \textrm{ has the same distribution as } X_{t-s}= \sum_{0}^{N_{t-s}} Y_k

We can compute the MGF of X_t (or its charactersitic function) by conditining on N_t. Suppose m_Y(\alpha)=E[e^{\alpha Y}] is the moment generating function of Y and using the MGF for the Poisson RV we find \begin{aligned} m_{X_t}(\alpha)& =E\left[ e^{\alpha X_t}\right] = E\left[ e^{\alpha \sum_{k=1}^{N_t} Y_k} \right] = \sum_{n=0}^\infty E\left[ e^{\alpha \sum_{k=1}^{N_t} Y_k} | N_t =n \right] P\{n_t=n\} \\ & = \sum_{n=0}^\infty m(\alpha)^n P\{n_t=n\} = e^{\lambda t ( m(\alpha) -1 )} \end{aligned}
We can compute then the mean and variance \begin{aligned} m_{X_t}'(\alpha)&= e^{\lambda t ( m(\alpha) -1 )} \lambda t m'(\alpha) \\ m_{X_t}''(\alpha)&= e^{\lambda t ( m(\alpha) -1 )} (\lambda t m''(\alpha) + \lambda t)^2() m'(\alpha)^2 ) \end{aligned} and thus E[X_t]= \lambda t E[Y] \quad \textrm{ and } Var[X_t]= \lambda t ({\rm Var}(Y) + E[Y]^2]) With a bit more work we could prove a central limit theorem.

1.13 Exercises

Exercise 1.1 Let N_t be a Poisson process with rate \lambda and let 0 < s < t. Compute

P(N_t =n+k|N_s =k)
P(N_s =k|N_t =n+k)
E[N_tN_s]

Exercise 1.2 Robins and Blackbirds make independent visit to my birdfeeder and they are described by independent Poisson processes R_t and B_t with rate \rho and \beta (per minute) respectively.

What is the probability I see four birds within the 5th and the 10th minutes.
What is the expected number of Robbins I will see between the third and fifth minutes given that I saw 3 Robbins in the first two minutes.
What is the probability that the first two birds I see are Robins?
I have seen ten birds in the last hour. What is the probbaility that three of them were balckbirds?
What is the probability that I see exactly three Robins while I am waiting for to see my first blackbird?
Let T denotes the arrival time of the first blackbird. Find the distribution of R_T (i.e. compute P(R_T=k))

Exercise 1.3 (Estimating the number of asymptomatic using Poisson process with variable rates) Suppose people get infected with a disease at a certain rate, a process described by a Poisson process I_t with rate \lambda which is unknown but constant.

Upon being infected there is an incubation time I until the infected individual exhibits symptoms and we have P( I \le t) =G(t) for some known distribution function G(t).

Suppose S_t is the total number of infected individual exhibiting symptoms by time t and A_t is number of infected individual which do not exhibit symptoms. What are the rates for the processes S_t and A_t?
If t is reasonably large one can argue that a poisson processes N_t with variable rate \lambda(t) satisfies N_t \approx E[N_t]= \int_0^t \lambda(s) ds with high probability. (This follows from the fact that a Poisson RV with large parameter concentrates around its mean, see the CLT argument).
Use this fact to estimate the E[A_t] even if the infection rate \lambda is unknown.
Suppose P(I \ge t)=e^{-t/\beta} with \beta=10, and that after 16 years 220 thousand people are infected. What is the estimate for the number of asymptomatic individuals?

Exercise 1.4 (Bulk arrivals) At Spoke on Thursday night groups of customers arrive according to a Poisson process with rate \lambda. Each of the groups, independently of all other groups and of the Poisson process, has a random size decribe a random variable N taking value in the positive integers. Upon arriving every individual goes order drink by herslef and spent a random amount of time T at Spoke with a distribution G(t)=P(T\le t). In preparation for a big night Spoke has infinitely many servers. After this individuals exit Spoke.

Find the the expected amount of customer Y_t at Spoke at time t.
Describe is the distribution of Y_t?
If the night is infinitely long, does the system reach an equilibrium?

*Hint: Revisit the M/G/\infty queue and the compound Poisson process.

2 Continuous time Markov chains

In this section that we build a continuous time Markov process X_t with t\ge 0. The Markov property can be expressed as P\{ X_t=j| \{X_{r}\}, 0\le r \le s \} = P\{X_t=j| X_s \} \,. for any 0 < s < t.

2.1 Exponential random variables

To construct a Markov we will need to use exponential random variables. Recall that an exponential random variable T with parameter \lambda has the pdf f_T(t)=\lambda e^{-\lambda t}, for t \ge 0, the cdf F_T(t) = 1- e^{-\lambda t} and mean E[T]=\frac{1}{\lambda}.

A simple and important fact is the memoryless property of exponential random variables.

P((T > t+s | T> s) =P(T>t)

If you think of T as a waiting time then the memoryless property tells you that if you have waited a time s then the probability that you have to wait an extra time t is exactly the same as waiting for a time t at the beginning. In that sense the process of waiting starts anew at anytime, and so you have forgotten the past. This property is the key to construct Markov process in continuous time.

For general Markov process we will need exponential random variables with various parameters and we will use the following simple fact repeatedly.

Proposition 2.1 (Properties of exponential randomm variables) Let T_1, T_2, T_3, \cdots be independent exponential random variables with parameter \lambda_1, \lambda_2, \cdots. Then

T= \min\{ T_1, \cdots, T_n\} is an exponential random variables with parameter \lambda_1+ \cdots + \lambda_n. Note that n=\infty is allowed if we assume that \sum_{n}\lambda_n is finite.
\displaystyle P\{ T_i = \min \{ T_1, \cdots, T_n\} \} = \frac{\lambda_i}{\lambda_1+ \cdots + \lambda_n}

Proof. For 1. we have, using independence, P\left\{ T > t \right\} = P\left\{ T_1 > t, \cdots, T_n > t \right\} = P\left\{ T_1 > t \right\} \cdots P\left\{ T_m > t \right\} = e^{-(\lambda_1+ \cdots \lambda_n)t} and thus T is an exponential random variable.
For 2. we have, by conditioning, P\left\{ T_1 = T \right\} = \int_0^\infty P\{ T_2>t, \cdots, T_n > t \} f_{T_1}(t) \, dt = \int_0^\infty e^{-(\lambda_2 + \cdots+ \lambda_n)t} \lambda_1 e^{-\lambda_1 t}\, dt = \frac{\lambda_1}{\lambda_1+ \cdots \lambda_n} \blacksquare

2.2 Definition of a continuous time Markov chain

As for the Poisson process we will give two equivalent definition of the process, the first one describe infinitesimal rates of change of the probability disttribution and leads to a system of ODEs describing the evolution of the pdf of X_t which are called the Kolmogorov equation. The second definition use exponential random variables and waiting times and will lead naturally to an algorithm to simulate a continuous time Markov chain, often called the stochastic simulation algorithm.
To define a Markov process on the state space S we assign a number \alpha(i,j) for any pair of states i,j with i\not=j. You should think these numbers \alpha(i,j) = \textrm{ rate at which the chain changes from state } i \textrm{ to state } j \,. We denote \alpha(i) = \sum_{j\not= i} \alpha(i,j) = \textrm{ rate at which the chain changes from state } i \,.
Formally a continuous Markov chain with rates \alpha(i,j) is a stochastic process X_t such that \begin{aligned} P\{ X_{t+\Delta t}=i| X_t=i\} &= 1 - \alpha(i)\Delta t + o(\Delta t) \\ P\{ X_{t+\Delta t}=j| X_t=i\} &= \alpha(i,j)\Delta t + o(\Delta t) \end{aligned}

Proceeding as for the Poisson process we can derive a differential equation for p_t(i)=P\{X_t =i\}. By conditioning we have P\{ X_{t+\Delta t}=i\} = (1 - \alpha(i)\Delta t )P\{ X_t=i\} + \sum_{j \not =i} \alpha(j,i)\Delta t P\{ X_t=j\} + o(\Delta t) which leads to the system of linear ODE’s \frac{d}{dt}p_t(i) = -\alpha(i) p_t(i) + \sum_{j \not= i} \alpha(j,i) p_t(j) \tag{2.1} called the Kolmogorov backward equations.
The infinitesimal generator of a continuous-time Markov chain is given by the matrix A = \begin{matrix} 1 \\ 2 \\ 3 \\ \vdots \end{matrix} \begin{pmatrix} -\alpha(1) & \alpha(1,2) & \alpha(1,3) & \cdots \\ \alpha(2,1) & -\alpha(2) & \alpha(2,3) & \cdots\\ \alpha(3,1) & \alpha(3,2) & -\alpha(3) & \cdots\\ \vdots & \vdots & \vdots & \end{pmatrix} and the entries of A satifies A(i,j)\ge 0 \textrm{ for } i \not= j \quad \textrm{ and } \sum_{i} A(i,i) = 0

If we use a row vector p_t = (p_t(1), p_t(2), \cdots) then we can rewrite Equation 2.1 as the system \frac{d}{dt}p_t = p_t A \tag{2.2}
If S is finite then one can write the solution in terms of the matrix exponential e^{tA} := \sum_{k=0}^\infty \frac{t^k A^k}{k!} p_t = p_0 e^{tA} \,. where p_0(i)=P\{X_0=i\} is the initial distribution.
We can also write equation for the transition probabilities (take X_0=i) P_t(i,j) = P \{ X_t =j| X_0=i\} and we obtain the matrix equation \frac{d}{dt}P_t = P_t A, \quad \textrm{ with } P_0 =I \quad \implies \quad P_t = e^{tA} which we can solve using linear algebra techniques.

2.3 Solving the Komogorov equation

For finite state space S finding the transition probability reduces to a linear algebra problem. For example if the matrix A is diagonalizable (e.g. if the matrix symmetric or if all the eigenvalues are distinct). let us denote the eigenvalues of A are \lambda_1, \lambda_2, \lambda_N of A with corresponding the eigenvectors f_1, \cdots, f_N. Note that 0 is always an eigenvalue with eigenvector \begin{pmatrix} 1 & 1 & \cdots &1 \end{pmatrix}^T (the sum of the rows is equal to 0).

Then we have D = Q^{-1} A Q = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & 0 & \cdots \\ \cdots &\cdots & \ddots & \cdots \\ \cdots & \cdots & 0 & \lambda_N \end{pmatrix} We find then that P_t = e^{At} = Q E^{Dt} Q^{-1} = Q \begin{pmatrix} e^{\lambda_1 t} & 0 & \cdots & 0 \\ 0 & e^{\lambda_2 t} & 0 & \cdots \\ \cdots &\cdots & \ddots & \cdots \\ \cdots & \cdots & 0 & e^{\lambda_N t} \end{pmatrix} Q^{-1}

For state space os smalll size N this is easy to compute using numerical or symbolic (for very small N) computation package. For example if A = \begin{matrix} 1 \\ 2 \\ 3 \\ 4 \end{matrix} \begin{pmatrix} -1 & 1 & 0 & 0 \\ 1 & -3 & 1 & 1 \\ 0 & 1 & -2 & 1 \\ 0 & 1 & 1 & -2 \end{pmatrix} we find

P_t=\begin{pmatrix}\frac{1}{4} + \frac{2 e^{- t}}{3} + \frac{e^{- 4 t}}{12} & \frac{1}{4} - \frac{e^{- 4 t}}{4} & \frac{1}{4} - \frac{e^{- t}}{3} + \frac{e^{- 4 t}}{12} & \frac{1}{4} - \frac{e^{- t}}{3} + \frac{e^{- 4 t}}{12}\\\frac{1}{4} - \frac{e^{- 4 t}}{4} & \frac{1}{4} + \frac{3 e^{- 4 t}}{4} & \frac{1}{4} - \frac{e^{- 4 t}}{4} & \frac{1}{4} - \frac{e^{- 4 t}}{4}\\\frac{1}{4} - \frac{e^{- t}}{3} + \frac{e^{- 4 t}}{12} & \frac{1}{4} - \frac{e^{- 4 t}}{4} & \frac{1}{4} + \frac{e^{- t}}{6} + \frac{e^{- 3 t}}{2} + \frac{e^{- 4 t}}{12} & \frac{1}{4} + \frac{e^{- t}}{6} - \frac{e^{- 3 t}}{2} + \frac{e^{- 4 t}}{12}\\\frac{1}{4} - \frac{e^{- t}}{3} + \frac{e^{- 4 t}}{12} & \frac{1}{4} - \frac{e^{- 4 t}}{4} & \frac{1}{4} + \frac{e^{- t}}{6} - \frac{e^{- 3 t}}{2} + \frac{e^{- 4 t}}{12} & \frac{1}{4} + \frac{e^{- t}}{6} + \frac{e^{- 3 t}}{2} + \frac{e^{- 4 t}}{12}\end{pmatrix}

Code

#| code-fold: true


import sympy as sp

# Define a 4x4 symbolic matrix
A = sp.Matrix([
    [-1, 1, 0, 0],
    [1, -3, 1, 1],
    [0, 1, -2, 1],
    [0, 1, 1, -2]
])


# Define symbolic time variable

t = sp.symbols('t')

# Compute eigenvalues and eigenvectors
eigenvalues = A.eigenvals()
eigenvectors = A.eigenvects()

# Diagonalization (if possible)
P, D = A.diagonalize()

P_inv=P.inv()

# Compute the matrix exponential e^(At)
exp_At = (A * t).exp()


# Print results
print("Matrix A:")
sp.pprint(A)


print("\nEigenvalues of A:")
sp.pprint(eigenvalues)

print("\nEigenvectors of A:")
for eig in eigenvectors:
    sp.pprint(eig)

print("\nDiagonalization:")
print("Matrix P (Eigenvectors):")
sp.pprint(P)


print("Matrix P (Eigenvectors) inverser:")
sp.pprint(P_inv)


print("\nMatrix D (Diagonal Matrix of Eigenvalues):")
sp.pprint(D)

print("\nSemigroup e^(At):")
sp.pprint(exp_At)

#sp.latex(exp_At)

Matrix A:
⎡-1  1   0   0 ⎤
⎢              ⎥
⎢1   -3  1   1 ⎥
⎢              ⎥
⎢0   1   -2  1 ⎥
⎢              ⎥
⎣0   1   1   -2⎦

Eigenvalues of A:
{-4: 1, -3: 1, -1: 1, 0: 1}

Eigenvectors of A:
⎛       ⎡⎡1 ⎤⎤⎞
⎜       ⎢⎢  ⎥⎥⎟
⎜       ⎢⎢-3⎥⎥⎟
⎜-4, 1, ⎢⎢  ⎥⎥⎟
⎜       ⎢⎢1 ⎥⎥⎟
⎜       ⎢⎢  ⎥⎥⎟
⎝       ⎣⎣1 ⎦⎦⎠
⎛       ⎡⎡0 ⎤⎤⎞
⎜       ⎢⎢  ⎥⎥⎟
⎜       ⎢⎢0 ⎥⎥⎟
⎜-3, 1, ⎢⎢  ⎥⎥⎟
⎜       ⎢⎢-1⎥⎥⎟
⎜       ⎢⎢  ⎥⎥⎟
⎝       ⎣⎣1 ⎦⎦⎠
⎛       ⎡⎡-2⎤⎤⎞
⎜       ⎢⎢  ⎥⎥⎟
⎜       ⎢⎢0 ⎥⎥⎟
⎜-1, 1, ⎢⎢  ⎥⎥⎟
⎜       ⎢⎢1 ⎥⎥⎟
⎜       ⎢⎢  ⎥⎥⎟
⎝       ⎣⎣1 ⎦⎦⎠
⎛      ⎡⎡1⎤⎤⎞
⎜      ⎢⎢ ⎥⎥⎟
⎜      ⎢⎢1⎥⎥⎟
⎜0, 1, ⎢⎢ ⎥⎥⎟
⎜      ⎢⎢1⎥⎥⎟
⎜      ⎢⎢ ⎥⎥⎟
⎝      ⎣⎣1⎦⎦⎠

Diagonalization:
Matrix P (Eigenvectors):
⎡1   0   -2  1⎤
⎢             ⎥
⎢-3  0   0   1⎥
⎢             ⎥
⎢1   -1  1   1⎥
⎢             ⎥
⎣1   1   1   1⎦
Matrix P (Eigenvectors) inverser:
⎡1/12  -1/4  1/12  1/12⎤
⎢                      ⎥
⎢ 0     0    -1/2  1/2 ⎥
⎢                      ⎥
⎢-1/3   0    1/6   1/6 ⎥
⎢                      ⎥
⎣1/4   1/4   1/4   1/4 ⎦

Matrix D (Diagonal Matrix of Eigenvalues):
⎡-4  0   0   0⎤
⎢             ⎥
⎢0   -3  0   0⎥
⎢             ⎥
⎢0   0   -1  0⎥
⎢             ⎥
⎣0   0   0   0⎦

Semigroup e^(At):
⎡       -t    -4⋅t        -4⋅t            -t    -4⋅t               -t    -4⋅t  ↪
⎢1   2⋅ℯ     ℯ       1   ℯ           1   ℯ     ℯ              1   ℯ     ℯ      ↪
⎢─ + ───── + ─────   ─ - ─────       ─ - ─── + ─────          ─ - ─── + ─────  ↪
⎢4     3      12     4     4         4    3     12            4    3     12    ↪
⎢                                                                              ↪
⎢         -4⋅t             -4⋅t              -4⋅t                     -4⋅t     ↪
⎢    1   ℯ          1   3⋅ℯ             1   ℯ                    1   ℯ         ↪
⎢    ─ - ─────      ─ + ───────         ─ - ─────                ─ - ─────     ↪
⎢    4     4        4      4            4     4                  4     4       ↪
⎢                                                                              ↪
⎢      -t    -4⋅t         -4⋅t        -t    -3⋅t    -4⋅t       -t    -3⋅t    - ↪
⎢ 1   ℯ     ℯ        1   ℯ       1   ℯ     ℯ       ℯ      1   ℯ     ℯ       ℯ  ↪
⎢ ─ - ─── + ─────    ─ - ─────   ─ + ─── + ───── + ─────  ─ + ─── - ───── + ── ↪
⎢ 4    3     12      4     4     4    6      2      12    4    6      2      1 ↪
⎢                                                                              ↪
⎢      -t    -4⋅t         -4⋅t        -t    -3⋅t    -4⋅t       -t    -3⋅t    - ↪
⎢ 1   ℯ     ℯ        1   ℯ       1   ℯ     ℯ       ℯ      1   ℯ     ℯ       ℯ  ↪
⎢ ─ - ─── + ─────    ─ - ─────   ─ + ─── - ───── + ─────  ─ + ─── + ───── + ── ↪
⎣ 4    3     12      4     4     4    6      2      12    4    6      2      1 ↪

↪    ⎤
↪    ⎥
↪    ⎥
↪    ⎥
↪    ⎥
↪    ⎥
↪    ⎥
↪    ⎥
↪    ⎥
↪    ⎥
↪ 4⋅t⎥
↪    ⎥
↪ ───⎥
↪ 2  ⎥
↪    ⎥
↪ 4⋅t⎥
↪    ⎥
↪ ───⎥
↪ 2  ⎦

2.4 Stochastic simulation algorithm

In this section we propose an alternative descritpion which is more probabilistic in nature and allows us to construct the paths of the Markov chains.
To any pair of states (i,j) we associate a “clock” T(i,j) which is an exponential random variable with paramter (rate) \alpha(i,j). All the random variables used are assumed to be independent.

If X_t=i, the Markov chain moves to another state after the first clocks T(i,j) rings, this happens at time T=\min_{k}\{ T(i,k) \} which is exponential with parameter \alpha(i)=\sum_{k\not= i} \alpha(i,k). So we have X_{t+s}= i \quad \text{ for } 0 \le s < T.
If the clocks that rings first is the clock T(i,j) that is if T(i,j) = T= \min_{k} \{ T(i,k) \} then the Markov chain moves to state j at time. That is we set T_{t+T}= j.
The probability that the Markov chain jumps to k is Q(i,j) = P\left(T(i,j) = T= \min_{k} \{ T(i,k) \}\right) = \frac{\alpha(i,j)}{\sum_k \alpha(i,k)} which defines a transition matrix.
Take a set of brand new clocks T(j,k) with rates \alpha(j,k) and repeat.

The Markov property follows form the memoryless property for an exponential distribution T: P(T\ge t +s| T \ge s) = P(T>t).
If X_t=i then by construction the position after the next jump after time t clearly depends only i and not the states that the Markov chain visited before time t. Moreover if we consider the time of the last jump before time t which occured, say at time s<t, then the memoryless property of the exponential random variable implies that the time at which the jump occur after time t does not depend on s at all. Putting these together this implies the Markov property P\{X_{t+u} =j | X_{s}, 0 \le s \le t\} = P\{X_{t+u} =j | X_t \}
To connect this to the previous description we derive an integral equation for P_t by conditioning on the first jump P_t(i,j) = P(X_t=j|X_0=i) = \underbrace{\delta(i,j) e^{-\alpha(i) t}}_{\text{ no jump in }[0,t]} + \int_0^t \underbrace{\alpha(i) e^{-\alpha(i)s}}_{\text{density of jump time }} \underbrace{\sum_{k} {Q(i,k)}}_{\text{choice of jump} } P_{t-s}(k,j) ds Iterating this equation and setting t=\Delta t we find
\begin{aligned} P(X_{\Delta t}=j|X_0=i) &= \delta(i,j) e^{-\alpha(i) \Delta t} + \int_0^{\Delta t} \alpha(i) e^{-\alpha(i)s} \sum_{k} \frac{\alpha(i,k)}{\alpha(i)} \delta(k,j) e^{-\alpha(k)(\Delta t-s)} ds + \cdots \\ &= \delta(i,j) e^{-\alpha(i) \Delta t} + \alpha(i,j) \Delta t \times \underbrace{ e^{-\alpha(j) \Delta t} \frac{1}{\Delta t} \int_0^{\Delta t} e^{-(\alpha(i)-\alpha(j)) s} ds }_{\to 1 \textrm{ as }\Delta t \to 0} + \cdots \\ &= \delta(i,j) (1 - \Delta t \alpha(i)) + \alpha(i,j) \Delta t + o(\Delta t) \end{aligned} The higher order terms are negligible since the probability to have two jumps in the time interval [0,\Delta t] is o(\Delta t).

2.5 Stochastic Simulation algorithm (or Gillepsie algorithm)

The description of Markov chain using the rates is essentially a pseudo-code to generat trajectories of X_t.

Code

#| code-fold: true
#| code-summary: "Show the code"

import numpy as np
import matplotlib.pyplot as plt

# SSA function with A matrix
def stochastic_simulation_algorithm(A, initial_state, max_time):
    num_states = A.shape[0]
    current_state = initial_state
    current_time = 0.0
    times = [current_time]
    states = [current_state]

    while current_time < max_time:
        rates = A[current_state, :].copy()
        rates[current_state] = 0
        total_rate = rates.sum()

        if total_rate <= 0:
            break

        delta_t = np.random.exponential(1 / total_rate)
        current_time += delta_t
        if current_time > max_time:
            break

        probabilities = rates / total_rate
        next_state = np.random.choice(num_states, p=probabilities)

        times.append(current_time)
        states.append(next_state)
        current_state = next_state

    return times, states

# Plot function
def plot_trajectory(times, states):
    plt.step(times, states, where='post', linewidth=2)
    plt.xlabel('Time')
    plt.ylabel('State')
    plt.title('Markov Chain Trajectory (SSA)')
    plt.grid(True)
    plt.tight_layout()
    plt.show()

# Example rate matrix A
A=np.array([
    [-3, 3, 0, 0],
    [1, -4, 3, 0],
    [0, 2, -5, 3],
    [0, 0, 2, -2]
])

initial_state = 0
max_time = 30.0

# Run simulation
times, states = stochastic_simulation_algorithm(A, initial_state, max_time)

# Plot the trajectory
plot_trajectory(times, states)

2.6 Example: Uniformizable chain

Consider a Markov chain Y_n with transition matrix Q and we assume that Q(i,i)=0. Then we pick rate \alpha(i)=1 for all states i. The times at which the Markov chain has transition is thus a sum of IID exponential, that at time t is then described by a Poisson process N_t. In other terms we have X_t = Y_{N_t} where N_t is a Poisson process with rate 1.
In this case we can compute the transition matrix quite explicitly: \begin{aligned} P\{X_t=j|X_0=i\}& = P\{ Y_{N_t}=j| X_0=i\} \\ &= \sum_{n=0}^\infty P\{ Y_{N_t}=j, N_t=n| X_0=i\} \\ &= \sum_{n=0}^\infty P\{ Y_n=j| X_0=i\} P\{N_t=n\} \\ &= \sum_{n=0}^\infty \frac{e^{- t} t^n}{n!} Q^n(i,j) = e^{t(Q-I)} \end{aligned}

and the generator is given by A = (Q-I).

2.7 Exercises

Exercise 2.1 Machine 1 is currently working and machine 2 will be put in use at a time T from now. If the lifetimes of the machines 1 and 2 are exponential random variables with parameters \lambda_1 and \lambda_2, what is the probability that machine 1 is the first machine to fail?

Exercise 2.2 Consider a two-server system in which a customer is first served by server 1, then by server 2 and then departs. The service times at server i are exponential random variables with parameter \mu_i with i = 1, 2. When you enter the system you find server 1 free and two customers at server 2, customer A in service and customer B waiting in line.

Find the probability P_A that A is still in service when you move over to server 2.
Find the probability P_B that B is still in service when you move over to server 2.
Compute E[T], where T is the total time you spend in the system. Hint: Write T = S_1 + S_2 + W_A + W_b where S_i is your service time at server i, W_A os the amount of time you wait in queue when while A is being served, and W_B the amount of time you wait in queue when while B is being served.

Exercise 2.3 (Hyperexponential and hypoexponential random variables) Suppose T_1 and T_2 are independent exponential random variables with parameters \lambda_1 and \lambda_2.

Suppose N is a RV with P(N=1)=p(1) and P(N=2)=p(2)=1-p(1). The random variable T_N is called an hyperexponential RV. It describe the service time of an agent sent to one of two service station with suitable probabilities. What is the probability distribution function of T_N.
The random variable T_1+T_2 is called an hypoexponential RV and describe the service time of an agent going through 2 successive service station. What is the probability distribution of X_1+X_2?

Exercise 2.4 (The flip-flop process) Let N_t be a poisson process and consider the process X_t = X_0 (-1)^{N_t} where X_0 is a random variable taking value in \{-1,1\} and which is independent of N_t. Note that X_t oscillates between 0 and 1. Show that P_t = \frac{1}{2} \begin{pmatrix} 1+ e^{-2 \lambda t} & 1 e^{-2 \lambda t} \\ 1- e^{-2 \lambda t} & 1+ e^{-2 \lambda t} \end{pmatrix}

3 Long time behavior of continous-time Markov chains

3.1 Stationary distributions and detailed balance

A probability vector \pi is a stationary distribution for the Markov chain with generator A if
\pi P_t = \pi \quad \textrm{ for all } t > 0 0 = \frac{d}{dt}\pi P_t = \pi A P_t \implies \pi A =0.
In terms of the rate \alpha(i,j) we see that \pi is stationary if and only if \sum_{i \not= j} \pi(i) \alpha(i,j) = \pi(j)\alpha(j) = \sum_{i \not=j} \pi(j) \alpha(j,i) which we can interpret as balance equation. The quantity \pi(i)\alpha(i,j) is the rate at which the chain in a state \pi changes from i to j and the stationarity equation means that \text{ flow of probability away from state } i = \text{ flow of probability into state } i \quad \quad \text{ holds for all states} i
As for discrete time we say that a Markov chain satisfies detailed balance if \pi(i) \alpha(i,j) = \pi(j) \alpha(j,i) \textrm{ for all } i \not= j and clearly detailed balance implies stationarity.

The Markov chain with generator A is irreducible if for any pair of states i,j we can find states i_1, \cdots, i_{N-1} such that \alpha(i,i_1) \alpha(i_2, i_3) \cdots \alpha(i_{N-1},j) >0 If there exists a stationary distribution for an irreducible chain then \pi(i)>0 for all i. Indeed if \pi(i)>0 and \alpha(i,j)>0 then \pi A(j) =0 implies that \pi(j)\alpha(j) = \sum_{k}\pi(k)\alpha(k,j) \ge \pi(i)\alpha(i,j)>0 and thus \alpha(j)>0.
The issue of periodicity cannot occur for continuous time Markov chains

Lemma 3.1 For an irreducible Markov chain with generator A, P_t(i,j)>0 for all i,j and all t>0.

Proof. Using the Markov property and a sequence of states i_0=i, i_1, \cdots, i_N=j with positive transition rates \begin{aligned} P\{X_t=j|X_0=i\} &\ge P \{ X_{t/N}=i_i, X_{2t/N}=i_2, \cdots X_{t}=j |X_0=i\} \\ &= P \{ X_{t/N}=i_1| X_0=i\} P\{ X_{2t/N}=i_2| X_{t/N}=i_i\} \cdots P\{ X_{t}=j |X_{t\frac{N-1}{N}}=i_{n-1}\} \end{aligned} and for example, using that \alpha(i,i_1)>0.
P \{ X_{t/N}=i_1| X_0=i\} \ge \int_0^{t/N} \alpha(i) e^{-\alpha(i)s} Q(i,i_1) e^{-\alpha(i_1)(t-s)}>0. \quad\blacksquare

3.2 Convergence to the stationary distribution

If the state space is finite and the chain is irreducible then we have convergence to equilibrium.

Theorem 3.1 If the state space S is finite and the Markov chain with generator A is irreducible then for any initial distribution \mu we have \lim_{t \to \infty}\mu P_t = \pi

Proof. We use a spectral argument and the result for discrete time Markov chain. Pick a number a such that a> \max{i} \alpha(i) (this is possible since A is finite). Consider the matrix R= \frac{1}{a}A + I. Then R is a stochastic matrix since 0 \le R(i,j)=\frac{\alpha(i,j)}{a}\le \frac{\alpha(i)}{a} \le 1 and R(i,i)= -\frac{\alpha(i)}{a}+ 1 is in (0,1). Clearly \sum_j R(i,j)=1 since \sum_{j} A(i,j)=0. Let us denote Y_n the Markov chain with transition matrix R, it is often called the resolvent chain for the continuous-time Markov chain X_t. The Markov chain is aperiodic since R(i,i)>0 and it is irreducible since X_t is irreducible.

Note also that \pi is a stationary distribution for Y_n if and only if it is a stationary dsitribution for X_t since \pi R = \frac{1}{a}\pi A + \pi I = \frac{1}{a}\pi A + \pi so \pi R =\pi \iff \pi A=0.

From the convergence theorem for discrete time R^n(i,j) \to \pi as n \to \infty and we have seen (see the exercises) that R has a simple eigenvalue 1 and all other eigenvalues \lambda satisfy |\lambda|<1. Now Rf = \lambda f \iff Af =a(\lambda -1) f and so 0 is a simple eigenvalue for A and the other eiegenvalue are of the form a(\textrm{Re}(\lambda) -1) + i a \textrm{ Im }(\lambda) and so the real part is strictly negative.

The vector 1=(1,1, \cdots,1 )^T is a right eigenvector for A and \pi is a left eigenvector for A. If we define the matrix \Pi as the matrix whose rows are equal to \pi, we have then (P^t - \Pi) \begin{pmatrix} 1\\ \vdots \\1 \end{pmatrix}= 0 \textrm{ and } \pi(P^t - \Pi) =0\,. Moreover if f the right eigenvector and g the left eigenvector for A for the eigenvalue \mu \not=0 then we have \pi (Af) = \mu \pi f = (\pi A) f =0 \quad \textrm{ and } (g A) 1 = g (A1) = \mu g 1 =0 and thus we must have \pi f=0 and g 1 =0. Therefore
P^t - \Pi)f = P^t f = e^{\mu t} f \textrm{ and } g(P^t - \Pi) = gP^t = e^{\mu t} g. This implies that P^t - \Pi has the simple eigenvalue 0 and the same eigenvalues e^{\mu t} as P^t and \mu has strictly negative real part. Therefore P^t - \Pi converges to 0 as t \to \infty, or \lim_{t \to \infty} P_t(i,j)= \pi(j) .

3.3 Transient behavior

To study the transient behavior in continuous time we can use similar ideas as in discrete time.

Absorption probabilities: the absorption probabilities do not depend on the time spent in every state so they can be computed using the transition matrix Q(i,j) for the embedded chain Y_n and the formula in Section [Absorption probabilities]
Expected hittiing time: For example we have the following result

Theorem 3.2 Supose X_t is an irreducible Markov chain with generator A and for j let \Sigma(j) = \inf\{t \ge 0 ; X_t=j\} be the first hitting time to state j. Let \tilde{A} the matrix obtained by deleting the j^{th} row and j^{th} column from the generator A. Then we have, for i \not= j, E[\Sigma(j)|X_0=i] = \sum_{l} B(i,l) \quad \textrm{ where } B= \tilde{A}^{-1} The matrix \tilde{A} has rowsums which are non-positive and at least one of the row must be strictly negative.

Proof. By conditioning on the first jump which happens at time T we have E[\Sigma(j)|X_0=i] = \underbrace{E[T|X_0=i]}_{\textrm{expected time until the first jump }} + \sum_{k\in S, k\not=j} P\{ X_T = k|X_0=i\} \underbrace{E[ \Sigma(j)|X_0=k]}_{\textrm{expected hitting time} \atop \textrm{ from the state after the first jump}} If we set b(i)=E[\Sigma(j)|X_0=i] (for i \not= j) we find the equation b(i)= \frac{1}{\alpha(i)} + \sum_{k \not= j} \frac{\alpha(i,k)}{\alpha(i)} b(k) \implies 1 = \alpha(i) b(i) - \sum_{k\not= j}\alpha(i,k) b(k) which reads, in matrix form as 1 = -\tilde{A} b \implies b= (-\tilde{A})^{-1} 1. To show that -\tilde{A} is invertible we consider the matrix R= \frac{1}{a}\tilde{A} + I where a is chosen larger than all the entries. Then the entries of R are non-negative and the rowsums do not exceed one with at least one strictly less than 1. By the results for discrete time Markov chains we know that I-R=(-\frac{1}{a} A) is invertible.

3.4 Explosion

For a continuous time Markov chain X_t let us consider the time of the successive jumps S_1=T_1, S_2=T_1+T_2, S_3=T_1+T_2+T_3, \cdots Here the T_i are independent exponential but, in general, no identically distributed with parameters \alpha_i which depends on the state being visited. We have then E[S_n]= \sum_{i=1}^n \frac{1}{\lambda_i}
Explosion: To see what can happen consider a Markov chain with rate \alpha(i,i+1)=(n+1)^2 and all other rates equal to 0. Then the Markov chain moves up by 1 at every jump like a Poisson process but at accelerated pace. We have then E[S_\infty]= \sum_{n=1}\frac{1}{n^2} < \infty so S_\infty < \infty with probability 1. So there are infinitely many jumps in finite time and X_t=+\infty after a finite time. This is called explosion.

This is an issue familiar in ODE: the equation \frac{d}{dt} x_t = x_t^2 has solution has solution x_t= \frac{x_0}{x_0-t} which blows up at time t=x_0.
It is not easy to determine if an explosion really occurs. Indeed for no explosion to occur we must have, with probabilty 1, \sum_{n} \frac{1}{\alpha(Y_n)} = \infty where Y_n is the embedded chain.
A sufficient condition for non-explosion is a suitable upper bound on the rates \alpha(i), say \alpha(i) \le \alpha which is true for finite state spaces but this is by no means necessary.

3.5 Transience, recurrence, and positive recurrence.

The recurrence and transience for a continuous time Markov chain can be defined as for discrete time. A an irreducbile Markov chain X_t is recurrent if the chain starting from a state i returns to i with probability 1 and transient if X_t never return to i with some non-zero probability.
Transience/recurrence has nothing to do with the actual time spent in every state and so we have immediately X_t \textrm { with rates } \alpha(i,j) \textrm{ is } \begin{array}{l} \textrm{transient} \\ \textrm{recurrent}\end{array} \iff Y_n \textrm{ with transition } Q(i,j)=\frac{\alpha(i,j)}{\alpha(i)} \textrm{ is } \begin{array}{l} \textrm{transient} \\ \textrm{recurrent}\end{array}
For positive recurrence we need to define the first return to a state i. This is most easily defined in terms of the return for the embeded Markov chain Y_n and its first return time \tau(i). The time until X_0 returns to i will the sum of the time spent in each state. If S_n denotes the time of the jumps and \tau(i)=n after visiting the state j_0, j_1, \cdots j_{n-1}, j_n=i then the return time is S_n= \sum_{k=0}^{n-1}(S_{k+1}- S_k) \quad \quad \text{ with }S_0=0 where S_{k+1}-S_k is exponential with parameter \alpha(j_k). The first return time to state i for the Markov chain X_t is then given by
\Sigma(i) = \sum_{k=0}^{\tau(i)-1}(S_{k+1}- S_k)

3.6 Stationary distribution for recurrent chains

Theorem 3.3 If the Markov chain with rates \alpha(i,j) is irreducible and recurrent, then there exists a unique solution (up to a multiplicative constant) \eta=(\eta(1), \eta(2), \cdots) to the equation \eta A=0 with 0 < \eta(i) < \infty \,. If it holds that \sum_{i}\eta(i) < \infty then \eta can be normalized to a stationary distribution and X_t is positive recurrent.

Proof. The stationarity equation \eta A can be written as \sum_{j \not= k} \eta(j) \alpha(j,k) = \alpha(k) \eta(k) \iff \sum_{j \not= k} \eta(j)\alpha(j) Q(j,k) = \eta(j)\alpha(j) That is the row vector \mu with entries \mu(k)=\alpha(k) \eta(k) must satisfy \mu Q = \mu.

If X_t is recurrent then the embedded Markov chain Y_n with transition Q is recurrent and so by the discrete time theory we know that there exists a solution to \mu Q=\mu. Therefore we have proved the existence of a solution for \eta A=0.

Moreover, we have a the representation \mu(j)=\alpha(j) \eta(j) = E\left[ \sum_{k=0}^{\tau(i)-1} {\bf 1}_{\{Y_k=j\}} |X_0=i\right] where i is some fixed but arbitrary reference state (this counts the number of visits to the state j between two consecutive visits to the reference state i)

If we denote by S_n the time of the n^{th} jump for X_t we have \begin{aligned} \eta(j) &= \sum_{k=0}^\infty E\left[ \frac{1}{\alpha(j)} {\bf 1}_{\{Y_k=j\}} {\bf 1}_{\{\tau(i) >k\}} | X_0=i \right] \\ & = \sum_{k=0}^\infty E\left[ (S_{k+1} -S_k) {\bf 1}_{\{Y_k=j\}} {\bf 1}_{\{\tau(i) >k\}} |X_0=i \right] \\ & = E\left[ \sum_{k=0}^{\tau(i)-1} (S_{k+1} -S_k) {\bf 1}_{\{Y_k=j\}} | X_0=i \right] \end{aligned} which is nothing but the time spent (by X_t) in the state j between successive visits to i.

If \sum_{j} \eta(j) <\infty then we have E[\Sigma(i)]=E\left[ \sum_{k=0}^{\tau(i)-1} (S_{k+1} -S_k)|X_0=i\right] <\infty which is the expected return time to state i. That is the chain X_t is positive recurrent.

3.7 Ergodic theorem for positive recurrent Markov chains

We have the following theorem which is the exact counterpart of the discrete time case (and is proved very similarly so we will omit the proof).

Theorem 3.4 Suppose X_t is irreducible and positive recurrent. Then X_t has a unique stationary distribution, and with probability 1, the time spent in state j, converges to \pi(j) \lim_{t\to \infty}\int_0^t \mathbf{1}_{\{X_s=j\}} \, ds = \pi(j). Moreover we have Kac’s formula: \pi(j) is also equal to the average time between consecutive visits to state j: \pi(j) = \frac{1}{E[\Sigma(j)|X_0=j]}. Conversely if X_t has a stationary distribution then X_t is positive recurrent. \quad \blacksquare

3.8 Exercises

Exercise 3.1 (Formula for the stationary distribution) Show that if A is irreducible then the stationary distribution solve the equation \pi = (1, \cdot, 1) (A + M)^{-1} where M(i,j)=1 for all i,j.

Hint: One option is to reduce it to the corresponding discrete case like in the proof of convergence in Theorem 3.1

Exercise 3.2 Consider the Markov chain X_t with generator A = \begin{matrix} 1 \\ 2 \\ 3 \\ 4 \end{matrix} \begin{pmatrix} -2 & 1 & 1 & 0 \\ 0 & -1 & 1 & 0 \\ 1 & 1 & -3 & 1 \\ 0 & 0 & 1 & -1 \end{pmatrix}

Find the stationary distribution.
If X_0=1 what is the expected time until the Markov chain visit state 4 for the first time.
If X_0=2 what is the probability that the Markov visits state 3 before state 4.
Compute (numerically) the transition probabilties P_t(i,j).
Modify the SSA simulation algorithm to extract the stationary distribution from it.

4 Birth and death process and queueing models

4.1 Birth and death process

A general birth and death process is a continuous time Markov chain with state space \{0,1,2,\cdots\} and whose only non-zero transition rates are \begin{aligned} &\lambda(n) = \alpha(n,n+1) & = \textrm{ birth rate for a population of size } n & (\textrm{ for } n\ge 0) \\ &\mu(n)=\alpha(n,n-1) & = \textrm{ death rate for a population of size } n & (\textrm{ for } n\ge 1) \end{aligned}
The Kolmogorov equations for the distribution of X_t are, for n\ge 1 \frac{d}{dt}p_t(n) = \underbrace{ \mu(n+1) p_t(n+1)}_{\textrm{ increase due do death} \atop \textrm{in a population of size } n+1} + \underbrace{\lambda_{n-1} p_t(n-1)}_{\textrm{ increase due do birth} \atop \textrm{in a population of size } n-1} - \underbrace{(\lambda(n) + \mu(n))p_t(n)}_{\textrm{ decrease due do birth/death} \atop \textrm{in a population of size} n}. For n=0, the eqation reads \displaystyle \frac{d}{dt}p_t(0)= \mu(1) p_t(1) - \lambda(0) p_t(0).
The generator has the form \begin{matrix} 0 \\ 1 \\ 2\\ \vdots \end{matrix} \begin{pmatrix} -\lambda(0) & \lambda(0) & 0 & 0 & \dots \\ \mu(1) &-\lambda(1) -\mu(1) & \lambda(1) & 0 & \dots \\ 0 & \mu(2) &-\lambda(2) -\mu(2) & \lambda(2) & \dots \\ \vdots & \ddots & \ddots & \ddots & \\ \end{pmatrix}

The transition matrix for the embedded process Y_n is \begin{matrix} 0 \\ 1 \\ 2\\ \vdots \end{matrix} \begin{pmatrix} 0 & 1 & 0 & 0 & \dots \\ \frac{\mu(1)}{\mu(1)+ \lambda(1)} & 0 & \frac{\lambda(1)}{\mu(1)+ \lambda(1)} & 0 & \dots \\ 0 & \frac{\mu(2)}{\mu(2)+ \lambda(2)} & 0 & \frac{\lambda(2)}{\mu(2)+ \lambda(2)} & \dots \\ \vdots & \ddots & \ddots & \ddots & \\ \end{pmatrix} which is the transition matrix of a general random walk on the non-negative integers.

4.2 Examples

Poisson process: There is no death and the birth rate is constant \mu(n)=0 \quad \textrm{ and } \quad \lambda(n)=\lambda
Queueing models: Imagine a Poisson process (with rate \lambda) describing the arrival of customers. In the systems there are k service station at which the service time is exponential with rate \mu. If one service station is empty, upon arrival a customer immediately enter a service station and stay in the system until being served. If all stations are currently busy serving other customers, the customer wait until one stay becomes free. These models are called M/M/k queues where M statnds for Markovian and k is the number of queues, the first M describing the arrival process and the second M the service time (which is exponential).
M/M/1 queue: If there is only one service station \lambda(n)=\lambda \quad \textrm{ and } \quad \mu(n) =\mu
M/M/k queue: If there is k service station then the rates are \lambda(n)=\lambda \quad \textrm{ and } \quad \mu(n) = \left\{ \begin{array}{cl} n \mu & n \le k \\ k \mu & n > k \end{array} \right.
M/M/\infty queue: If there is k service station then the rates are \lambda(n)=\lambda \quad \textrm{ and } \quad \mu(n) =n\mu

Population models: If X_t describes the size of a population birth rate will be naturally proportional to the size of the population if we assume that all individuals give birth or die with a certina rate.
- Pure birth model: no death occur and so \lambda(n)=n \lambda \quad \textrm{ and } \quad \mu(n) = 0
- Population model: the rates \lambda(n)=n \lambda \quad \textrm{ and } \quad \mu(n) = n \mu
- Population model with immigration: if immigrants arrive acccording to a Poisson process with rate \nu the rates are \lambda(n)=n \lambda +\nu \quad \textrm{ and } \quad \mu(n) = n \mu

4.3 Transience of birth/death chains

To study transience we use the embedded Markov chain and ?@thm-criteriontransience. Choosing the reference state 0 we look for a solution a(n) with a(0)=1 and 0 < a(n)< 1 for n\ge 1 of the system of equations \lambda(n) a(n+1) + \mu(n) a(n-1) = (\lambda(n) + \mu(n)) a(n) This leads to a(n+1) - a(n) = \frac{\mu(n)}{\lambda(n)} (a(n) - a(n-1)) = \cdots = \prod_{j=1}^n\frac{\mu(j)}{\lambda(j)} (a(1) - 1) and thus, by telescoping, a(n+1)= 1 + \sum_{k=0}^{n} (a(k+1) -a(k)) = 1 + \left[1+ \sum_{k=1}^{n} \prod_{j=1}^k\frac{\mu(j)}{\lambda(j)}\right] (a(1) - 1) Taking n \to \infty we must have a(n)\to 0 and \sum_{k=1}^\infty \prod_{j=1}^k\frac{\mu(j)}{\lambda(j)} < \infty and we find the solution a(n)=\frac{\sum_{k=n}^\infty \prod_{j=1}^k\frac{\mu(j)}{\lambda(j)}}{1+ \sum_{k=1}^\infty \prod_{j=1}^k\frac{\mu(j)}{\lambda(j)}}

4.4 Positive recurrence of birth/death chains

To study postive recurrence we simply solve for the stationary distribution \pi A=0. Since the embedded Markov chain satisfies detailed balance it is natural tro try to solve the detailed balance equations which amounts \pi(n)\lambda(n) = \pi(n+1)\mu(n+1) which is eaisly solved to find \pi(n) = \prod_{j=1}^n \frac{\lambda(j-1)}{\mu(j)} \pi(0) and thus we have a stationary distribution if and only if \sum_{k=1}^\infty \prod_{j=1}^k \frac{\lambda(j-1)}{\mu(j)} <\infty\,.

The Poisson process are bure birth models are not irreducible and converge to +\infty almost surely.
M/M/1-queue: \prod_{j=1}^n \frac{\mu(j)}{\lambda(j)}=\left(\frac{\lambda}{\mu}\right)^n
- Transient if \mu < \lambda.
- Recurrent if \mu =\lambda
- Positive recurrent if \lambda < \mu and with (geometric) stationary distribution: \pi(n)= \left(\frac{\lambda}{\mu}\right)^n\left(1 -\frac{\lambda}{\mu} \right)
M/M/k-queue: \prod_{j=1}^n \frac{\mu(j)}{\lambda(j)} = \left\{ \begin{array}{cl} n! \left( \frac{\mu}{\lambda} \right)^n & n < k \\ \frac{k!}{k^k} \left(\frac{k\mu}{\lambda}\right)^n & n \ge k \end{array} \right. .
- Transient if k \mu < \lambda.
- Recurrent if k \mu =\lambda
- Positive recurrent if \lambda < k \mu, the stationary \pi is a bit messy to write.
M/M/\infty-queue: \prod_{j=1}^n \frac{\mu(j)}{\lambda(j)}= n! \left( \frac{\mu}{\lambda} \right)^n.

Always positive recurrent with Poisson stationary distribution \pi(n) = e^{- \frac{\lambda}{\mu}} \frac{ \left(\frac{\lambda}{\mu}\right)^n}{n!}

4.5 Speed of convergence via coupling M/M/\infty queue

We can analyze the M/M/\infty queue in more details by the following argument. Suppose X_0=j is the size of the queue at time o. Then at time t the people in queue are of two different types: either they were in queue at time 0 and are still in queue at time t or they arrive after time between times 0 and time t and are still in queue.

The probability that a person in queue at time 0 is still in queue at time t is e^{-\mu t}.

Since the arrival of a Poisson process condiioned on N_t are uniformly distributed on [0,t], the probability that someone not present at time 0 is still present at time t is q(t)=\frac{1}{t}\int_0^t (1- e^{-\mu(t-s)})= \frac{1- e^{-\mu t}}{\mu t} ,. By combining this we see that X_t = N_t + Y_t where N_t \text{ has a binomial distribution with parameters } j \text{ and } e^{-\mu t} Y_t \text{ has a Poisson distribution with parameters } \lambda q(t)t = \frac{\lambda}{\mu}(1 - e^{-\mu t})

Clearly we see, again, from this computation that the distribution of X_t converges to a Poisson distribution with parameter \frac{\lambda}{\mu}.

We can also use it to control the speed of convergence since the previous calculation suggest a coupling. Indeed let k> j and set B_t \text{ has a binomial distribution with parameters } k-j \text{ and } e^{-\mu t} Then with N_t and Y_t as above we set X_t = Y_t + N_t \quad {\tilde X}_t = Y_t + N_t + M_t The X_t and {\tilde X}_t are M/M/\infty queues starting at j and j respectively and they form a coupling.

We have then \|P_t(j, \cdot) - P_t(k,\cdot)\|_{TV} \le P( X_t \not= Y_t) =P(M_t \ge 1) \le E[|M_t|]=(k-j)e^{-\mu t}

If we start with two arbitrary initial distribution \nu and {\tilde \nu} we find then the bound \|\nu P_t - {\tilde \nu} P_t\|_{TV} \le \sum_{i} \nu(i) {\tilde \nu}(j) \|\mu P_t(i,\cdot) - \nu P_t(j, \cdot)\|_{TV} \le \sum_{i} \nu(i) {\tilde \nu}(j) |j-i| e^{-\mu t} = E[|X_0 - {\tilde X}_0|] e^{-\mu t} where X_0 has distribution \nu and {\tilde X}_0 has distribution {\tilde \nu}

4.6 Queueing networks

A queueing network is an interconnected netwrok of service facilities called nodes. Each node has its own queueing rules and there are probabilistic rules to move between nodes and exit the system. Queueing netwroks can be either closed or open.
A closed network has a fixed number K of nodes and a fixed number M of customers. No agents enters or exit the systems. Imagine for example a company with M trucks which can be either in a repair shop or assigned to a variety of tasks.
A open network has a fixed number K of nodes but a variables number of customers. Agents can enters or exit the system at some of the nodes in the systems. Imagine for example data packets being routed in a computer network.
To describe the process in a network with M nodes we consider the queue length process {\bf Q_t} = \left(Q_{1,t}, \cdots, Q_{K,t}\right) \quad \text{ where } Q_{i,t} = \text{ number of customers at node } i \text{ at time} t.
The state space of the process for the queue length is process is given by \begin{aligned} S_M & = \left\{ {\bf n} = \left(n_1, \cdots, n_K\right) \,:\, n_i \ge 0\,,\, \sum_{i=1}^K n_i=M \right\} \quad \quad \text{ closed networks} \\ S & = \left\{ {\bf n} = \left(n_1, \cdots, n_K\right) \,:\, n_i \ge 0\,\right\}\quad \quad \quad \quad \quad \quad\quad\quad\quad \text{ open networks} \\ \end{aligned}

:::

4.7 Closed network of queues

To describe a closed queuing network we need to describing how the M agents move along the K nodes of the networks. This is described in terms of routing matrix P with entries P_{il} = \text{ Probability that an agent exiting node $i$ moves to node $l$ } It is possible that P_{ii}>0 describing the case where an agent may need two services. The matrix P is a stochastic matrix with P_{il}\ge 0 \quad \text{ and } \sum_{l=1}^K P_{il} =1 We assume that P is irreducible and we will denote by \pi the corresponding stationary distribution, i.e. \pi P=\pi.

We assume that at node i there is a single service station and that when at node i an agent is served with a rate b_i so that service times at nodes i are IID exponentials with parameters b_i.

Traffic equation: If the system is in equilbrium then the rate r_j at which customers leave a node j should be the to the rate at which customer leaves a node j. This leads to the equation \underbrace{r_j}_{\text{exit rate}} = \underbrace{\sum_{i} r_i P_{ij}}_{\text{entrance rate}} and therefore the stationary distribution should depend on the stationary distribution \pi for P.

Waiting time: suppose that the system is in the state {\bf n}= (n_1,\cdots, n_K) then the rate \alpha({\bf n}) is determined by the minimum service time at all the nodes (which are not empty of agents) and thus we have \alpha({\bf n}) = \sum_{i: n_i>0} b_i = \sum_{i: n_i>0} b_i (1 -\delta_{n_i,0})

Transition rates At the time a service is completed the agent will move from one node (say i) to another node (say j), it will be useful to introduce the notation T_{ij}{\bf n}=(n_1,\cdots, n_i-1, \cdots, n_j+1, \cdots, n_K) \quad \text{ provided } n_i>0 which describes the possible transition transition. Note that we have T_{ji} T_{ij}{\bf n} \quad \text{ if } n_i >0 and {\bf m} = T_{ij} {\bf n} \iff {\bf n} = T_{ji} {\bf m} \quad \text{ provided } n_i>0 \text{ and } m_j >0 The non-zero transition rates (when one agent moves from node i to node j) are given by \alpha({\bf n}, T_{ij} {\bf n}) = b_i P_{ij} \quad \text{ if } n_i>0

We have the following

Theorem 4.1 (stationary distribution for closed queueing networks) Consider a closed queuing networks with M agents and K nodes with a single service station at each node. The waiting time at node i is exponential with rate b_i>0 and the motion between nodes is given by an irreducible stochastic matrix P_{i,j} and stationary distribution \pi_i. The stationary distribution is given by \eta({\bf n}) = C \prod_{l=1}^K \left( \frac{\pi_l}{b_l}\right)^{n_l} \quad \text{ with } C = \sum_{{\bf n} \in S_M}\prod_{l=1}^K \left( \frac{\pi_l}{b_l}\right)^{n_l} is a stationary distribution for the process Q_t with rates \alpha({\bf n}, T_{ij} {\bf n}) = b_iP_{ij} (for n_i>0)

The form of the stationary distribution is suggested by the form of the stationary distribution for the M/M/1 queue. But note that here the total number of customer is fixed.

The constant C is in general difficult to compute.

Proof. We need to show that \sum_{{\bf n}\not= {\bf m}} \eta({\bf n}) \alpha({\bf n}, {\bf m}) = \alpha({\bf m}) \eta({\bf m}).

We will make use of the traffic equation \pi P =\pi as well as from the fact that for m_j>0 we have \eta( T_{ji}{\bf m} ) = \left(\frac{\pi_1}{b_1} \right)^{n_1} \cdots \left( \frac{\pi_j}{b_j}\right)^{n_{j}-1} \cdots \left(\frac{\pi_i}{b_i}\right)^{n_{i}+ 1} \cdots \left(\frac{\pi_K}{b_K}\right)^{n_K} = \eta({\bf m}) \frac{\pi_i/b_i}{\pi_j/b_j}

Note that we have \begin{aligned} \sum_{{\bf n}} \eta({\bf n}) \alpha({\bf n}, {\bf m}) & = \sum_{{\bf n}: {\bf m}= T_{ij}{\bf n} \text{ for some } i,j } \eta({\bf n}) \alpha({\bf n}, {\bf m}) =\sum_{{\bf n}: {\bf n}= T_{ji}{\bf m} \text{ for some } i,j } \eta({\bf n}) \alpha({\bf n}, {\bf m}) \\ &= \sum_{i,j: m_j> 0} \eta(T_{ji}{\bf m})\alpha(T_{ji}{\bf m}, {\bf m}) \\ &= \sum_{i,j: m_j> 0} \eta({\bf m}) \frac{\pi_i/b_i}{\pi_j/b_j} b_i P_{ij} \\ &= \sum_{j: m_j> 0} \eta({\bf m}) \frac{ \sum_{i} \pi_i P_{ij}}{\pi_j/b_j} \\ &= \eta({\bf m}) \sum_{j: m_j> 0} b_j \quad \quad \text{ since } \pi_j= \sum_{i} \pi_i P_{ij} \\ &= \eta({\bf m}) \alpha({\bf m}) \end{aligned} Since the state space if finite we can normalize \eta({\bf n}) to make it a statonary distribution.

4.8 Open queueing (Jackson) networks

In an open queuing network the number of customers in the system can be arbitrary and customers can both enter and leave the systems (both must occur to obtain a stable system.) We need the following ingredients

Arrival rates a_i \ge 0 at each node which describing customers which enter the system at node i according to a Poisson process with rate a_i.
Waiting times at each node with rate b_i>0 which describe the service time at node i.
Routing matrix Q which describe the transition between nodes and the exits from the system. We assume Q is sub-stochastic, i.e. Q_{ij} \ge 0 and \sum_{j} Q_{i,j}\le 1 with at least one row have a sum \sum_{j} Q_{i,j} <1. We interpet Q_{i,j} = P( \text{ go to node } j \text{ after service at node } i ) q_i = 1-\sum_{j}Q(i,j) = P( \text{ exit the system } \text{ after service at node } i ) We can think of Q as describing the transition probablities of a set of transient states in a Markov chain with an absorbing state which corresponds to being out of the system. In particular we know that I-Q is invertible.

Traffic equations If we are in a stationary state the rate r_j at which customers leave the node j should balance the rate at which customer enter node j. This leadf to the equation \underbrace{r_j}_{\text{rate of leaving} j} = \underbrace{a_{j}}_{\text{rate of entering } j \text{ from outisde}} + \sum_{i} \underbrace{ r_{i} Q_{ij}}_{\text{rate of entering } j \text{ from node }i} If we use a row vector r=(r_1, \cdots, r_j) we obtain r = a + rQ \implies r = a(I-Q)^{-1}
Transition rates As for closed networks we will use the function T_{ij} which removes an agent at noe i and adds it at node j. But we will also need the transition maps S_i^+ (n_1, \cdots, n_K) = (n_1, \cdots, n_i+1, \cdots, n_K )\,, S_i^- (n_1, \cdots, n_K) = (n_1, \cdots, n_i-1, \cdots, n_K ) \quad (\text{if } n_i>0) which add/remove agents to/from the system at node i. The transition rates and jump rates are then given by \begin{aligned} &\alpha({\bf n}, S^+_j {\bf n}) = a_j \\ &\alpha({\bf n}, S^{-}_j {\bf n}) = b_j q_j \quad \text{ if } n_j > 0 \\ &\alpha({\bf n}, T_{ij} {\bf n}) = b_i Q_{ij} \quad \text{ if } n_i > 0 \end{aligned} \alpha({\bf n}) = \sum_{j} a_j + \sum_{j: m_j>0} b_j

Theorem 4.2 (stationary distribution for open queueing networks) Consider an open queueing network with arrival rate a_i, waiting times rate b_i and routing matrix Q_{ik} (irreducible and substochastic). Let r denote the solution of the traffic equation r = a + rQ. Then stationary distribution exists if and only r_i< b_i for all nodes and is then given by a product of geometric distribution \eta({\bf n}) = C \prod_{l=1}^K \left( \frac{r_l}{b_l} \right)^{n_l}

We prove \sum_{{\bf n}\not= {\bf m}} \eta({\bf n}) \alpha({\bf n}, {\bf m}) = \alpha({\bf m}) \eta({\bf m}). We decompose that sum over {\bf n} into three different pieces corresponding to the different transition (arrivals, exit, swap of nodes). We have

\begin{aligned} \sum_{j: m_j>0} \eta( S_j^{-} {\bf m }) \alpha( S_j^{-} {\bf m}, {\bf m}) = \sum_{j: m_j>0} \eta({\bf m}) \frac{b_j}{r_j} a_j = \eta({\bf m}) \sum_{j: m_j>0} \frac{b_j}{r_j} a_j \end{aligned}

\begin{aligned} \sum_{j} \eta( S_j^{+} {\bf m }) \alpha( S_j^{+} {\bf m}, {\bf m}) = \sum_{j} \eta({\bf m}) \frac{r_j}{b_j} b_j q_j= \eta({\bf m}) \sum_{j} r_j q_j \end{aligned}

Finally using the traffic equations we have \begin{aligned} &\sum_{i,j: m_j> 0} \eta(T_{ji}{\bf m})\alpha(T_{ji}{\bf m}, {\bf m}) = \sum_{i,j: m_j> 0} \eta({\bf m}) \frac{r_i/b_i}{r_j/b_j} b_i Q_{ij} \\ &= \eta({\bf m}) \sum_{j: m_j> 0} \frac{b_j}{r_j} \sum_{i} r_i Q_{ij} = \eta({\bf m}) \sum_{j: m_j> 0} \frac{b_j}{r_j} (r_j -a_j)=\eta({\bf m}) \sum_{j: m_j> 0} b_j \left(1 -\frac{a_j}{r_j}\right) \\ \end{aligned}

From the traffic equation and multipling by the row vector of 1’s’ and using that Q1=1-q we find r 1 = a 1 + rQ1 = a 1 + r (1-q) \implies a 1 = r q \quad \text{ or } \quad \sum_{j}a_j = \sum_{j} r_j q_j

Summing up the three terms we find that \sum_{{\bf n}\not= {\bf m}} \eta({\bf n}) \alpha({\bf n}, {\bf m}) = \eta({\bf m}) \left(\sum_{j} a_j + \sum_{j: m_j>0} b_j \right) = \alpha({\bf m}) \eta({\bf m}) as desired. \quad\blacksquare

4.9 Exercises

Exercise 4.1 (Yule process) The Yule process is is a pure birth process describing the the growth of a population: if there are n individual in the population then each individual will give birth with a rate \lambda and so the birth rate is \lambda(n)=n \lambda. The goal here is to compute explicitly the transition probbaility P(X_t=n|X_0=k).

Assume first X(0)=1 and let T_i be the time it takes for the population to go from size i to size i+1, that is T_i is exponential with rate \lambda i.

Show that we have the relation T_1 + T_2 + \cdots T_n = \max( S_1, S_2, \cdots S_n) where the S_i are exponential with rate 1.
Using 1. compute P(T_1 + T_2 + \cdots T_n < t) and show that P(X_t=n|X_0=1) = e^{-\lambda t}(1- e^{-\lambda t})^{n-1} which is a geometric distribution with success probability p=e^{-\lambda t}.
Using 2. show that P(X_t=n|X_0=k) = {n-1 \choose k-1} e^{-\lambda t k}(1- e^{-\lambda t})^{n-k} which is negative binomial distribution.

Exercise 4.2 (Non-Explosion for birth/death models) Show that a birth and death model with birth and death rates which satisfiy \lambda(n) + \mu(n) \le a n +b does not undergo explosion.

Exercise 4.3 (Population model with immigartion) Consider a birth death model with birth rate \lambda(n)= n \lambda + \nu and death rate \mu(n)=n\mu.

For which value of \lambda \mu and \nu is the process positive recurrent, null recurrent, transient.
Suppose X_0=i. Show that the mean m(t)=E[X_t] and the variance v(t)= E[X_t^2] - m(t)^2 satisfy differential equations. Solve them.

Exercise 4.4 (Geometric sum of exponential) For use in Exercise 4.3 prove the following fact. If Q is geometric that P(Q=k)=(1-q)^{k-1}q for k=1,2,3,\cdots and T_i are IID exponential with parameter \lambda then T_1 + T_2 + \cdots T_Q is exponential with parameter \lambda q.

Hint: You can use either the MGF (like in the compound Poisson process or like for branching processes) or the CDF and the PDF..

Exercise 4.5 (More on the M/M/1 queue) Consider an M/M/1 queue with arrival rate \lambda and service rate \mu. Recall that when \lambda < \mu its stationary distribution is geometric with parameter \lambda/\mu.

Show that if the queue is stationary then rate at which customer leaves must be equal to \lambda (and is independent of \mu!)
Suppose again that the queue is stationary, compute the distribution and the expectation of the time W a customer spends in the queue until they reach service station (this is often an important quantity when designing a queueing model!).
Hint: Note the distribution of W as a continuous and discrete part. Use the result in Exercise 4.4.
Suppose that upon entering the system the customers look at the length of the queue and may decide to leave the system depending on the length of the queue. For example assume that if there are n customers in the system, upon entering customers will stay with probability p(n)=\frac{1}{n+1}. Find the stationary distribution in this case.
Suppose that agents are actually difficult customers: upon exiting the service station with probability q they exit the system for good but with probaility 1-q they re-enter the sytem and go back in line Show that this process is equivalent to another M/M/1 queue with new rates.
Hint: Use Exercise 4.4 again.

Exercise 4.6 (M/M/1 queues in series) Consider the following queueing system. Agents arrive into the system according to a Poisson process with rate \lambda and then pass successviely through a sequence of K service stations where the service time in station i is exponential with parameters \mu_i.

We describe the number of customers by the vector {\bf X_t} =(X_{1t}, \cdots X_{Kt}) where X_{it} is the number of customers at the i^{th} station and which which takes values in the state space S = \left\{ {\bf n} = \left(n_1, \cdots, n_K\right) \,:\, n_i \ge 0 \text{ integer} \right\}\,.

Make a list of all possible transitions and compute the corresponding rate.
From Exercise 4.3 (part 1.) we learned that for a single M/M/1 queue in equilibrium the rate at which customers enter and leave the system is equal to \lambda. Since the customers are then fed into the next queue this suggests that for the queue in series the rate of customer entering and exiting all the queues will all be equal to \lambda and that can be achieved by the product of geometric distributions \pi({\bf n})= \pi\left(n_1, \cdots, n_K\right) = \prod_{i=1}^K \left( 1- \frac{\lambda}{\mu_i} \right) \left( \frac{\lambda}{\mu_i} \right)^{n_i} Prove that \pi is indeed stationary.
Hint: it is easiest to prove this using detailed balance.

Exercise 4.7 (Jackson networks) Consider an open Jackson network such that at node i the service rate is n_ib_i if the Q_t={\bf n}. This correspond to the situation with infnityl many servers at each node. Show that such network has a stationary distribution given by \eta({\bf n}) = \prod_{l=1}^K \frac{\left(\frac{r_k}{b_k}\right)^{n_l}e^{-\left(\frac{r_k}{b_k}}}{n_l!} where r_k satisfy the traffic equation r = a + rQ.