Stochastic Processes, III, Continuous-time Markov chains

Math 606, Spring 2023

Luc Rey-Bellet

University of Massachusetts Amherst

2023-06-25

1 Poisson processes

Definition of the Poisson process

  • A counting process N_t is a continuous time stochastic process which counts the number of events occurring up to time t: N_t = \textrm{total number of events occurring up to time } t, e.g. the number of accidents reported to an insurance company, the number of earthquakes in California, and so on.

  • The Poisson process is the simplest counting process, intuitively characterized by the following properties:

    1. The numbers of events occurring in disjoint time intervals are independent.
    2. The rate at which events occur is constant over time.
    3. The events occur one at a time.
  • More formally a Poisson process N_t with rate parameter \lambda is a continuous time stochastic process such that

    • Independent increments: Given times s_1\le t_1 \le s_2 \le t_2 \cdots \le s_n \le t_n the random variables N_{t_i}-N_{s_i} (that is the number of events occurring in the disjoint time intervals [s_i,t_i]) are independent.

    • The rate at which events occur is denoted by \lambda>0 and for any (small) time interval [t, t + \Delta t] we have
      \begin{aligned} P \{ N_{t + \Delta t } = N_t \} &= 1- \lambda \Delta t +o(\Delta t) & \\ P\{N_{t + \Delta t}= N_t +1\} &= \lambda \Delta t +o(\Delta t) & \textrm{ with } \lim_{\Delta t \to 0}\frac{o(\Delta t)}{\Delta t}=0 \\ P\{N_{t + \Delta t}\ge N_t +2\} & = o(\Delta t) & \end{aligned} \tag{1.1}

Distribution of the Poisson process

Theorem 1.1 (Distribution of the Poisson process) The Poisson process N_t with N_0=0 has the distribution P\{N_t=k\} = e^{-\lambda t}\frac{(\lambda t)^k}{k!} i.e. N_t has a Poisson distribution with parameter \lambda t. Moreover, for any s \ge 0, M_t=N_{t+s}-N_{s} is again a Poisson process.

Proof (version 1 using Poisson limit). Pick a large number n and divide the interval [0,t] into n intervals of size \frac{t}{n}. Write
N_t = \sum_{j=1}^n\left( N_{j \frac{t}{n}}- N_{(j-1) \frac{t}{n}} \right) as a sum of n independent random variables. If n is large the probability that any of these random variables is at least 2 is small. Indeed, by a union bound, we have \begin{aligned} P\left\{ N_{j \frac{t}{n}}- N_{(j-1) \frac{t}{n}} \ge 2 \textrm { for some } j \right\} &\le \sum_{j=1}^n P\left\{ N_{j \frac{t}{n}}- N_{(j-1) \frac{t}{n}} \ge 2 \right\} \le n P \left\{ N_{\frac{t}{n}} \ge 2\right\} = t \frac{o( \frac{t}{n})}{\frac{t}{n}} \end{aligned} which goes to 0 as n \to \infty.

Therefore N_t is, approximately, a binomial random variable with n trials and success probability \frac{\lambda t}{n}: P(N_t=k) \approx {n \choose k} \left(\frac{\lambda t}{n} \right)^k \left(1-\frac{\lambda t}{n} \right)^{n-k} and as n \to \infty this converges to a Poisson distribution with parameter \lambda t. \quad \blacksquare

Proof (version 2 using ODEs). Let us derive a system of ODEs for the family P_t(k)=P\left\{ N_t=k \right\}. We have \frac{d}{dt}P_t(k) = \lim_{\Delta t \to 0} \frac{P\left\{ N_{t+\Delta t}=k \right\} - P\left\{ N_t=k \right\} }{\Delta t} Conditioning we find \begin{aligned} P\left\{ N_{t+\Delta t}=k \right\}=& P\left\{ N_{t+\Delta t}=k | N_t=k \right\} P\left\{ N_{t}=k \right\} \\ & + P\left\{ N_{t+\Delta t}=k | N_t=k-1 \right\}P\left\{ N_{t}={k-1} \right\} \\ & + P\left\{ N_{t+\Delta t}=k | N_t\le k-2\right\}P\left\{N_t \le k-2\right\} \\ =& P_t(k) (1 - \lambda \Delta t) + P_t(k-1) \lambda \Delta t + o(\Delta t) \end{aligned} and this gives the system of equations \begin{aligned} \frac{d}{dt}P_t(0)&= - \lambda P_t(0) & \quad P_0(0)=1 \\ \frac{d}{dt}P_t(k)&= \lambda P_t(k-1) - \lambda P_t(k) &\quad P_0(k)=0 \end{aligned}

We find P_t(0)= e^{-\lambda t} for k=0. We use an integrating factor and set f_t(k)=e^{\lambda t} P_t(k). We have then f_t(0)=1 and for k >0 \frac{d}{dt}f_t(k)=\lambda f_t(k-1), \quad f_0(k)=0
which we can solve iteratively to find f_t(k) = \frac{(\lambda t)^k}{k!} and thus N_t has distribution P_t(k)= e^{-\lambda t}\frac{(\lambda t)^k}{k!}\,. \blacksquare.

Poisson process and exponential random variables

  • Poisson processes and exponential (and gamma) random variables are intimately related. Given a Poisson process we consider the interarrival times T_1, T_2, \cdots: T_1 is the time of the occurrence of the first event, T_2 is the time elapsed between the first and second events, and so on.

Theorem 1.2 If N_t is a Poisson process with parameter \lambda then the interarrival times are independent exponential random variables with parameter \lambda.

Proof. If T_1 >t it means no event has occurred up to time t and so N_t=0. Therefore P\left\{ T_1 >t \right\} = P\left\{ N_t =0\right\} = e^{-\lambda t} and thus T_1 has an exponential distribution. For T_2 we condition on T_1: P\left\{ T_2 >t \right\} = \int P(T_2>t |T_1 =s) f_{T_1}(s) ds and, using the independence of the increments, P\left\{ T_2>t |T_1 =s\right\} = P\left\{ 0 \textrm{ events in } (s, s+t] \,|\, T_1=s \right\} = P\left\{ 0 \textrm{ events in } (s, s+t] \right\} = e^{-\lambda t} from which we conclude that T_2 has an exponential distribution and is independent of T_1. This argument can be repeated for T_3 by conditioning on the time of the second event, T_1+T_2, and so on. \quad \blacksquare

Another set of closely related quantities are the arrival times S_1, S_2, \cdots, where S_n is the time of the n^{th} event; they are related to the interarrival times by S_1=T_1, \quad S_2=T_1+T_2, \quad S_3=T_1+T_2+T_3, \cdots By Theorem 1.2 S_n is the sum of n independent exponential RVs and thus S_n is a Gamma RV with parameters (n,\lambda) and density f_{S_n}(t) = \lambda e^{-\lambda t}\frac{(\lambda t)^{n-1}}{(n-1)!}\, \quad \textrm{ for }t \ge 0\,. We can actually prove this fact using the Poisson process by noting that, by definition,
N_t \ge n \iff S_n \le t \,, that is, n or more events have occurred by time t if and only if the n^{th} event has occurred prior to or at time t.

So the CDF of S_n is F_{S_n}(t)=P\{ S_n \le t\} = P \{ N_t \ge n\} = \sum_{j=n}^\infty e^{-\lambda t} \frac{(\lambda t)^{j}}{j!} and upon differentiating we find f_{S_n}(t)= - \lambda e^{-\lambda t} \sum_{j=n}^\infty \frac{(\lambda t)^{j}}{j!} + \lambda e^{-\lambda t}\sum_{j=n}^\infty \frac{(\lambda t)^{j-1}}{(j-1)!} = \lambda e^{-\lambda t} \frac{(\lambda t)^{n-1}}{(n-1)!}. \quad \blacksquare

Poisson process and uniform distribution

Let us start with a special case and assume that N_t=1, that is, exactly one event has occurred in [0,t]. Since the Poisson process has independent increments it seems reasonable that the event may have occurred with equal probability at any time in [0,t]. Indeed for s\le t we have, using the independence of increments, \begin{aligned} P\{T_1 \le s | N_t=1\} & = \frac{P\{T_1 \le s , N_t=1\}}{P\{N_t=1\}} = \frac{P\{\textrm{1 event in }(0,s] \textrm{ and no event in } (s,t] \}}{P\{N_t=1\}} \\ &= \frac{ \lambda s e^{-\lambda s} e^{-\lambda(t-s)} }{\lambda t e^{-\lambda t}} = \frac{s}{t} \end{aligned} and thus the density of the arrival time T_1 conditioned on N_t=1 is uniform on [0,t].

We study further the properties of the arrival times S_1< S_2 <\cdots of a Poisson process. The following result tells us that, conditioned on the number of events in [0,t], they are distributed like ordered uniform random variables on [0,t].

Theorem 1.3 Given the event \{N_t=n\}, the n arrival times S_1, S_2, \cdots, S_n have the same distribution as the order statistics of n independent random variables uniformly distributed on [0,t].

Proof. The conditional density of (S_1, \cdots, S_n) given that N_t=n can be obtained as follows. If S_1=s_1, S_2=s_2, \cdots, S_n=s_n and N_t=n then the interarrival times must satisfy T_1=s_1, T_2=s_2-s_1, \cdots, T_n= s_n-s_{n-1}, T_{n+1}> t-s_n.

By the independence of the interarrival times proved in Theorem 1.2 the conditional density is given by \begin{aligned} f(s_1, s_2, \cdots, s_n|n) &= \frac{f(s_1, s_2, \cdots, s_n, n)}{P\{N_t=n\}} \\ &= \frac{\lambda e^{-\lambda s_1} \lambda e^{-\lambda(s_2-s_1)} \cdots \lambda e^{-\lambda(s_n-s_{n-1})} e^{-\lambda(t-s_n)} }{e^{-\lambda t} \frac{(\lambda t)^n}{n!} } = \frac{n!}{t^n} \end{aligned} which is the joint density of the order statistics of n uniform random variables on [0,t]. \blacksquare

  • Recall that if X_1, \cdots, X_n are IID random variables with joint density f(x_1)\cdots f(x_n) and X^{(1)} \le X^{(2)} \le \cdots \le X^{(n)} are the order statistics, then the joint pdf of X^{(1)}, \cdots ,X^{(n)} is given by g(x_1,\cdots, x_n) = \left\{ \begin{array}{cl} n! f(x_1) \cdots f(x_n) &\textrm{ if } x_1 \le x_2 \le \cdots \le x_n \\ 0 & \textrm{ else } \end{array} \right.

Simulation of Poisson process

The characterization of the Poisson process in terms of exponential random variables immediately suggests a very simple algorithm to simulate N_t.

Simulate independent exponential RVs T_1, T_2, \cdots with parameter \lambda and set N_t=0 for 0\le t <T_1, N_t=1 for T_1\le t <T_1+T_2, and so on.

Figure 1.1: A sample of the Poisson process
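A sample path like the one in the figure can be generated with a few lines of Python; this is a minimal sketch of the algorithm above, where the rate `lam` and the horizon `t_max` are illustrative choices:

```python
import random

def poisson_process_path(lam, t_max, rng=random):
    """Return the jump times of a Poisson process with rate lam on [0, t_max]."""
    jump_times = []
    t = 0.0
    while True:
        t += rng.expovariate(lam)   # interarrival times T_1, T_2, ... are Exp(lam)
        if t > t_max:
            return jump_times
        jump_times.append(t)

random.seed(0)
path = poisson_process_path(lam=2.0, t_max=10.0)
# N_t is the number of recorded jump times that are <= t; here E[N_10] = 20
print(len(path))
```

The value of N_t for any t is then recovered as the number of jump times not exceeding t.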

Simulation of a Poisson random variable

  • It is not easy to simulate a Poisson random variable X directly from its pdf/cdf but we can do it elegantly using its relation with exponential random variables. To do this, generate independent exponential random variables until their sum exceeds 1 (so as to generate X=N_1) and use the relation T_k = -\frac{1}{\lambda}\ln(U_k) between exponential and uniform random variables.
    \begin{aligned} X =n & \iff \sum_{k=1}^n T_k \le 1 < \sum_{k=1}^{n+1} T_k \iff \sum_{k=1}^n - \frac{1}{\lambda} \ln(U_k) \le 1 < \sum_{k=1}^{n+1} - \frac{1}{\lambda} \ln(U_k) \\ & \iff \ln( \prod_{k=1}^n U_k) \ge - \lambda > \ln( \prod_{k=1}^{n+1} U_k) \iff \prod_{k=1}^n U_k \ge e^{-\lambda} > \prod_{k=1}^{n+1} U_k \end{aligned}

  • Algorithm to simulate a Poisson random variable with parameter \lambda: Generate random numbers until their product is smaller than e^{-\lambda}.

    • Generate random numbers U_1, U_2, \cdots.
    • Set X=n if n+1 = \inf \left\{ j \,:\, \prod_{k=1}^{j} U_k < e^{-\lambda} \right\}
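A minimal Python sketch of this algorithm (the parameter value 3 is an illustrative choice):

```python
import math
import random

def poisson_sample(lam, rng=random):
    """Sample X ~ Poisson(lam): multiply uniforms until the product drops below e^{-lam}."""
    threshold = math.exp(-lam)
    prod, n = 1.0, 0
    while True:
        prod *= rng.random()
        if prod < threshold:
            return n    # n + 1 uniforms were needed, so X = n
        n += 1

random.seed(0)
samples = [poisson_sample(3.0) for _ in range(10000)]
print(sum(samples) / len(samples))  # close to lam = 3
```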

Long-time behavior of the Poisson process

  • We investigate the behavior of N_t for large time. We prove a CLT type result, namely that
    \lim_{t \to \infty} \frac{N_t-\lambda t}{\sqrt{\lambda t}} = Z \textrm{ in distribution} where Z is a standard normal RV.

  • Recall that the characteristic function of a Poisson RV Y with parameter \mu is E[e^{i\alpha Y}]= e^ {\mu (e^{i\alpha} -1)}. Therefore E\left[ e^{i \alpha \frac{N_t - \lambda t}{\sqrt{\lambda t}}} \right] = e^{\lambda t \left( e^{i \frac{\alpha}{\sqrt{\lambda t}}} -i \frac{\alpha}{\sqrt{\lambda t}} -1\right)} Expanding the exponential we have \lambda t \left(e^{i \frac{\alpha}{\sqrt{\lambda t}}} -i \frac{\alpha}{\sqrt{\lambda t}} -1\right) = - \frac{\alpha^2}{2} + O( \frac{1}{\sqrt{\lambda t}}) and thus \lim_{t \to \infty} E\left[ e^{i \alpha \frac{N_t - \lambda t}{\sqrt{\lambda t}}} \right]=e^{-\frac{\alpha^2}{2}}.

  • The same computation shows also that, for any fixed t, \lim_{\lambda \to \infty} \frac{N_t-\lambda t}{\sqrt{\lambda t}} = Z \textrm{ in distribution} since rescaling the parameter is equivalent to rescaling time.
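This normal approximation is easy to check empirically. The sketch below samples N_t \sim \textrm{Poisson}(\lambda t) (only the product \lambda t matters; the value 400 and the sample size are illustrative choices) and inspects the mean and variance of the rescaled variable:

```python
import math
import random

def poisson_sample(lam, rng=random):
    """Sample N ~ Poisson(lam) by counting Exp(lam) interarrival times in [0, 1]."""
    total, n = 0.0, 0
    while True:
        total += rng.expovariate(lam)
        if total > 1.0:
            return n
        n += 1

random.seed(0)
lam_t = 400.0   # the value of lambda * t
zs = [(poisson_sample(lam_t) - lam_t) / math.sqrt(lam_t) for _ in range(5000)]
mean = sum(zs) / len(zs)
var = sum((z - mean) ** 2 for z in zs) / len(zs)
print(mean, var)  # approximately 0 and 1
```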

Sampling a Poisson process

  • We can sample or split a Poisson process. Suppose that every event of a Poisson process (independently of the other events) comes in one of two types, say type 1 with probability p and type 2 with probability q=1-p.

Theorem 1.4 Suppose N_t is a Poisson process with parameter \lambda and that every event (independently) is either of type 1 with probability p or of type 2 with probability q=1-p. Then N^{(1)}_t, the number of events of type 1 up to time t, and N^{(2)}_t, the number of events of type 2 up to time t, are independent Poisson processes with rates \lambda p and \lambda(1-p).

Proof. We check that N^{(1)}_t satisfies the definition of a Poisson process and then use Theorem 1.1.

N^{(1)}_0=0 and N^{(1)}_t has independent increments since N_t has independent increments and events are classified as type 1 or 2 independently of each other.

We have
\begin{aligned} P \left\{ N^{(1)}_{t + \Delta t}= N^{(1)}_t +1 \right\} & = P\{ N_{t + \Delta t}= N_t +1 \textrm{ and the event is of type } 1 \} \\ & + P\{ N_{t + \Delta t} \ge N_t +2 \textrm{ and exactly one event is of type } 1 \} \\ &= \lambda \Delta t \times p + o(\Delta t) \end{aligned}

\begin{aligned} P \left\{ N^{(1)}_{t + \Delta t}= N^{(1)}_t \right\} & = P\{ N_{t + \Delta t}= N_t \} + P\{ N_{t + \Delta t} = N_t + 1 \textrm{ and the event is of type 2} \} \\ & + P\{ N_{t + \Delta t} \ge N_t + 2 \textrm{ and no event of type $1$} \} \\ &= (1- \lambda \Delta t) + \lambda \Delta t (1-p) + o(\Delta t) =1-\lambda \Delta t p + o(\Delta t) \end{aligned} P \left\{ N^{(1)}_{t + \Delta t}\ge N^{(1)}_t +2 \right\} =o(\Delta t)

Finally, to show that N^{(1)}_t and N^{(2)}_t are independent we compute their joint distribution by conditioning on the value of N_t and find \begin{aligned} P\left\{N^{(1)}_t =n , N^{(2)}_t =m \right\} &= P\left\{N^{(1)}_t =n , N^{(2)}_t =m | N_t={n+m}\right\} P\left\{N_t={n+m}\right\} \\ &= {n+m \choose n} p^n (1-p)^m e^{-\lambda t} \frac{(\lambda t)^{n+m}}{(n+m)!} \\ &= e^{-\lambda p t} \frac{ (\lambda p t )^n}{n!} e^{-\lambda (1-p) t} \frac{(\lambda(1-p) t)^{m}}{m!} \end{aligned} \blacksquare

The coupon collecting problem

  • We revisit the coupon collector problem but we relax the assumption that all the toys are equally probable. We assume that any box contains toy i with probability p_i. How do we compute now the expected number of boxes needed to collect all M toys? The argument used earlier does not generalize easily.

  • We use the following trick of randomizing the times between boxes. Instead of collecting boxes at fixed time intervals, we collect them at times which are exponentially distributed with parameter 1. Then the number of boxes collected up to time t is a Poisson process N_t with rate \lambda=1 (on average it takes the same time to get a new box). We have now M types of events (getting a box with toy i) and we split the Poisson process accordingly. Then by Theorem 1.4 the number of toys of type i collected up to time t, N^{(i)}_t, is a Poisson process with rate \lambda p_i=p_i and the Poisson processes N^{(i)}_t are independent.

  • We now consider the times T^{(i)} = \textrm{ time of the first event for the process } N^{(i)}_t that is the time at which the first toy of type i is collected. The times T^{(i)} are independent since the underlying Poisson processes are independent, and are exponential with parameter p_i. Furthermore S = \max_{i} T^{(i)} = \textrm{ time until one toy of each type has been collected}. By independence we have P\left\{ S \le t \right\} = P\left\{ \max_{i} T^{(i)} \le t\right\} = \prod_{i=1}^M P\left\{ T^{(i)} \le t \right\} = \prod_{i=1}^M (1- e^{-p_it})

Thus E[S]= \int_0^\infty P\{S \ge t\} dt = \int_0^\infty ( 1- \prod_{i=1}^M (1- e^{-p_it}) )\, dt

  • Finally we relate S to the original question. If X is the number of boxes needed to collect all the toys then we have S = \sum_{k=1}^X S_k where the S_k are IID exponential with parameter 1. By conditioning on X it is easy to see that E[S]= E[X]E[S_1] = E[X] and we are done.
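The integral formula for E[S] is straightforward to evaluate numerically. The sketch below uses the trapezoidal rule and, as a sanity check, equal probabilities p_i = 1/M, for which the classical answer M(1 + 1/2 + \cdots + 1/M) must be recovered:

```python
import math

def expected_boxes(probs, t_max=200.0, steps=200000):
    """E[S] = int_0^infty (1 - prod_i (1 - e^{-p_i t})) dt, by the trapezoidal rule."""
    def integrand(t):
        prod = 1.0
        for p in probs:
            prod *= 1.0 - math.exp(-p * t)
        return 1.0 - prod
    h = t_max / steps
    total = 0.5 * (integrand(0.0) + integrand(t_max))
    total += sum(integrand(k * h) for k in range(1, steps))
    return h * total

# Equal probabilities: M = 4 toys gives E[S] = 4 * (1 + 1/2 + 1/3 + 1/4) = 25/3
print(expected_boxes([0.25] * 4))
```

Unequal probabilities make the collection slower: for instance `expected_boxes([0.4, 0.3, 0.2, 0.1])` exceeds 25/3, since the rare toy dominates the waiting time.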

Poisson process with variable rate

  • We can generalize the Poisson process by making the rate \lambda(t) at which events occur depend on time: a nonhomogeneous Poisson process N_t with rate parameter \lambda(t) is a continuous time stochastic process such that

  • Independent increments: Given times s_1\le t_1 \le s_2 \le t_2 \cdots \le s_n \le t_n the random variables N_{t_i}-N_{s_i} (that is the number of events occurring in the disjoint time intervals [s_i,t_i]) are independent.

  • We have \begin{aligned} P \{ N_{t + \Delta t } = N_t \} &= 1- \lambda(t) \Delta t +o(\Delta t) & \\ P\{N_{t + \Delta t}= N_t +1\} &= \lambda(t) \Delta t +o(\Delta t) & \textrm{ with } \lim_{\Delta t \to 0}\frac{o(\Delta t)}{\Delta t}=0 \\ P\{N_{t + \Delta t}\ge N_t +2\} & = o(\Delta t) & \end{aligned} \tag{1.2}

  • One way to construct a nonhomogeneous Poisson process is by sampling it in a time-dependent manner. Suppose \lambda(t) is bounded (locally in t); then we pick \lambda > \lambda(t). We consider a Poisson process M_t with constant rate \lambda, and if an event occurs at time t then we keep this event with probability p(t)=\frac{\lambda(t)}{\lambda} and discard it with probability 1-p(t). By the same argument we used in the section Sampling a Poisson process we see that the number of kept events satisfies the definition of a non-homogeneous Poisson process in Equation 1.2.

  • Let us consider an event for the process M_t which occurred in the interval [0,t]. By our analysis of arrival times we know that this event occurred at a time which is uniformly distributed on the interval [0,t]. Therefore the probability that this event was accepted and contributes to N_t is p_t = \frac{1}{t}\int_0^t \frac{\lambda(s)}{\lambda} \, ds

By then repeating the second part of the argument in the section Sampling a Poisson process we see that N_t has a Poisson distribution with parameter \lambda t p_t = \int_0^t \lambda(s) \, ds and in particular E[N_t] = \int_0^t \lambda(s) \, ds
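The thinning construction translates directly into code. In this sketch the rate function \lambda(t) = 1 + \sin t and the bound \lambda = 2 are illustrative choices:

```python
import math
import random

def thinned_poisson(rate_fn, lam_max, t_max, rng=random):
    """Nonhomogeneous Poisson process by thinning a constant-rate lam_max process.

    rate_fn(t) must satisfy rate_fn(t) <= lam_max on [0, t_max]."""
    events = []
    t = 0.0
    while True:
        t += rng.expovariate(lam_max)            # candidate event of the rate-lam_max process
        if t > t_max:
            return events
        if rng.random() < rate_fn(t) / lam_max:  # keep with probability lambda(t) / lambda
            events.append(t)

random.seed(0)
evts = thinned_poisson(lambda t: 1.0 + math.sin(t), lam_max=2.0, t_max=20.0)
# E[N_20] = int_0^20 (1 + sin s) ds = 21 - cos(20)
print(len(evts))
```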

Queueing model with infinitely many servers

  • Assume that the flow of customers entering an online store follows a Poisson process N_t with rate \lambda. The time S spent in the store by a single customer (browsing around, checking out, etc.) is given by its CDF G(t)=P\{ S \le t\} and we assume that the customers are independent of each other.

  • To figure out how to allocate resources one wants to know the number of customers, M_t, which are still in the system at time t.

  • To find the distribution of M_t let us consider one of the customers who arrived by time t. If he arrived at time s\le t then he will have left the system by time t with probability G(t-s) and will still be in the system at time t with probability 1-G(t-s). Since the arrival time of that customer is uniform on [0,t], the distribution of M_t is Poisson with mean E[M_t] = \lambda \int_0^t (1- G(t-s))\,ds = \lambda \int_0^t (1 - G(s))\,ds\,.
    For large t, we see that E[M_t] \approx \lambda E[S].
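A simulation sketch of this infinite-server queue; the exponential service distribution (mean 1) and the parameter values \lambda = 3, t = 10 are illustrative choices, for which E[M_t] \approx \lambda E[S] = 3:

```python
import random

def customers_in_system(lam, service_sampler, t, rng=random):
    """One draw of M_t: Poisson(lam) arrivals on [0, t], each with an independent
    service time; count the customers still present at time t."""
    count = 0
    s = 0.0
    while True:
        s += rng.expovariate(lam)             # arrival times of the Poisson process
        if s > t:
            return count
        if s + service_sampler(rng) > t:      # this customer is still in the system
            count += 1

random.seed(0)
draws = [customers_in_system(3.0, lambda r: r.expovariate(1.0), 10.0) for _ in range(4000)]
print(sum(draws) / len(draws))  # close to lam * E[S] = 3
```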

Compound Poisson process

  • Suppose that the number of claims received by an insurance company follows a Poisson process. The size of each claim will be different and it is natural to assume that claims are independent of each other. If we look at the total claims incurred by the insurance company this leads to a stochastic process called a compound Poisson process.

  • A stochastic process X_t is called a compound Poisson process if it has the form X_t = \sum_{k=1}^{N_t} Y_k where N_t is a Poisson process and Y_1, Y_2,\cdots are IID random variables which are also independent of N_t.

  • The process X_t has stationary independent increments. Using that N_t-N_s is a Poisson process, X_t-X_s = \sum_{k=N_s+1}^{N_t} Y_k \textrm{ has the same distribution as } X_{t-s}= \sum_{k=1}^{N_{t-s}} Y_k

  • We can compute the MGF of X_t (or its characteristic function) by conditioning on N_t. Suppose m(\alpha)=E[e^{\alpha Y}] is the moment generating function of Y; using the MGF of the Poisson RV we find \begin{aligned} m_{X_t}(\alpha)& =E\left[ e^{\alpha X_t}\right] = E\left[ e^{\alpha \sum_{k=1}^{N_t} Y_k} \right] = \sum_{n=0}^\infty E\left[ e^{\alpha \sum_{k=1}^{N_t} Y_k} | N_t =n \right] P\{N_t=n\} \\ & = \sum_{n=0}^\infty m(\alpha)^n P\{N_t=n\} = e^{\lambda t ( m(\alpha) -1 )} \end{aligned}

We can then compute the mean and variance: \begin{aligned} m_{X_t}'(\alpha)&= e^{\lambda t ( m(\alpha) -1 )} \lambda t\, m'(\alpha) \\ m_{X_t}''(\alpha)&= e^{\lambda t ( m(\alpha) -1 )} \left( \lambda t\, m''(\alpha) + (\lambda t)^2 m'(\alpha)^2 \right) \end{aligned} and thus E[X_t]= \lambda t E[Y] \quad \textrm{ and } \quad {\rm Var}[X_t]= \lambda t \left({\rm Var}(Y) + E[Y]^2\right) = \lambda t E[Y^2]\,. With a bit more work we could prove a central limit theorem.
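These moment formulas can be checked by simulation. In this sketch the choices \lambda = 2, t = 5 and claim sizes Y \sim \textrm{Uniform}(0,2) are illustrative; then E[Y]=1 and E[Y^2]=4/3, so E[X_t]=10 and {\rm Var}[X_t]=40/3:

```python
import random
import statistics

def compound_poisson(lam, t, y_sampler, rng=random):
    """One draw of X_t = sum_{k=1}^{N_t} Y_k, with N_t a rate-lam Poisson process."""
    n, s = 0, 0.0
    while True:                       # count the Poisson events in [0, t]
        s += rng.expovariate(lam)
        if s > t:
            break
        n += 1
    return sum(y_sampler(rng) for _ in range(n))

random.seed(0)
xs = [compound_poisson(2.0, 5.0, lambda r: r.uniform(0.0, 2.0)) for _ in range(20000)]
print(statistics.mean(xs), statistics.pvariance(xs))  # close to 10 and 40/3
```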

2 Continuous time Markov chains

In this section we build a continuous time Markov process X_t with t\ge 0. The Markov property can be expressed as P\{ X_t=j| \{X_{r}\}, 0\le r \le s \} = P\{X_t=j| X_s \} \,. for any 0 < s < t.

Exponential random variables

To construct a Markov chain we will need lots of exponential random variables. Recall that an exponential random variable T with parameter \lambda has pdf f_T(t)=\lambda e^{-\lambda t} for t \ge 0, cdf F_T(t) = 1- e^{-\lambda t}, and mean E[T]=\frac{1}{\lambda}.

Proposition 2.1 (Properties of exponential random variables) Let T_1, T_2, T_3, \cdots be independent exponential random variables with parameters \lambda_1, \lambda_2, \cdots. Then

  1. T= \min\{ T_1, \cdots, T_n\} is an exponential random variable with parameter \lambda_1+ \cdots + \lambda_n. Note that n=\infty is allowed if we assume that \sum_{n}\lambda_n is finite.
  2. \displaystyle P\{ T_i = \min \{ T_1, \cdots, T_n\} \} = \frac{\lambda_i}{\lambda_1+ \cdots + \lambda_n}

Proof. For 1. we have, using independence, P\left\{ T > t \right\} = P\left\{ T_1 > t, \cdots, T_n > t \right\} = P\left\{ T_1 > t \right\} \cdots P\left\{ T_n > t \right\} = e^{-(\lambda_1+ \cdots + \lambda_n)t} and thus T is an exponential random variable.
For 2. we have, by conditioning, P\left\{ T_1 = T \right\} = \int_0^\infty P\{ T_2>t, \cdots, T_n > t \} f_{T_1}(t) \, dt = \int_0^\infty e^{-(\lambda_2 + \cdots+ \lambda_n)t} \lambda_1 e^{-\lambda_1 t}\, dt = \frac{\lambda_1}{\lambda_1+ \cdots + \lambda_n} \blacksquare

Definition of a continuous time Markov chain

  • As for the Poisson process we will give two equivalent definitions of the process: the first one describes the infinitesimal rates of change and leads to a system of ODEs describing the evolution of the distribution of X_t; the second uses exponential random variables as waiting times and leads to an algorithm to simulate a continuous time Markov chain, often called the stochastic simulation algorithm.
  • To define a Markov process on the state space S we assign a number \alpha(i,j) to any pair of states i\not=j. You should think of these numbers as \alpha(i,j) = \textrm{ rate at which the chain changes from state } i \textrm{ to state } j \,. We denote \alpha(i) = \sum_{j\not= i} \alpha(i,j) = \textrm{ rate at which the chain changes from state } i \,.
  • Formally a continuous time Markov chain with rates \alpha(i,j) is a stochastic process such that \begin{aligned} P\{ X_{t+\Delta t}=i| X_t=i\} &= 1 - \alpha(i)\Delta t + o(\Delta t) \\ P\{ X_{t+\Delta t}=j| X_t=i\} &= \alpha(i,j)\Delta t + o(\Delta t) \end{aligned}

  • Proceeding as for the Poisson process we can derive a differential equation for p_t(i)=P\{X_t =i\}. By conditioning we have P\{ X_{t+\Delta t}=i\} = (1 - \alpha(i)\Delta t )P\{ X_t=i\} + \sum_{j \not =i} \alpha(j,i)\Delta t P\{ X_t=j\} + o(\Delta t) which leads to the system of linear ODEs \frac{d}{dt}p_t(i) = -\alpha(i) p_t(i) + \sum_{j \not= i} \alpha(j,i) p_t(j) \tag{2.1} called the Kolmogorov forward equations.
  • The infinitesimal generator of a continuous-time Markov chain is given by the matrix A = \begin{matrix} 1 \\ 2 \\ 3 \\ \vdots \end{matrix} \begin{pmatrix} -\alpha(1) & \alpha(1,2) & \alpha(1,3) & \cdots \\ \alpha(2,1) & -\alpha(2) & \alpha(2,3) & \cdots\\ \alpha(3,1) & \alpha(3,2) & -\alpha(3) & \cdots\\ \vdots & \vdots & \vdots & \end{pmatrix} and the entries of A satisfy A(i,j)\ge 0 \textrm{ for } i \not= j \quad \textrm{ and } \quad \sum_{j} A(i,j) = 0 \textrm{ for every } i

  • If we use a row vector p_t = (p_t(1), p_t(2), \cdots) then we can rewrite Equation 2.1 as the system \frac{d}{dt}p_t = p_t A \tag{2.2}

  • If S is finite then one can write the solution in terms of the matrix exponential e^{tA}= \sum_{k=0}^\infty \frac{t^k A^k}{k!}: p_t = p_0 e^{tA} \,, where p_0(i)=P\{X_0=i\} is the initial distribution.

  • We can also write equations for the transition probabilities (take X_0=i) P_t(i,j) = P \{ X_t =j| X_0=i\} and we obtain the matrix equation \frac{d}{dt}P_t = P_t A, \quad \textrm{ with } P_0 =I \quad \implies \quad P_t = e^{tA} which we can solve using linear algebra techniques.

Example

A = \begin{matrix} 1 \\ 2 \\ 3 \\ 4 \end{matrix} \begin{pmatrix} -1 & 1 & 0 & 0 \\ 1 & -3 & 1 & 1 \\ 0 & 1 & -2 & 1 \\ 0 & 1 & 1 & -2 \end{pmatrix} The eigenvalues of A are 0, -1, -3, -4 and we can diagonalize A and find D=\begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -3 & 0 \\ 0 & 0 & 0 & -4 \\ \end{pmatrix} = Q^{-1} A Q where the columns of Q are the corresponding (right) eigenvectors: Q=\begin{pmatrix} 1& 1 & 0 & -\frac13 \\ 1& 0 & 0 & 1 \\ 1& -\frac12 & -\frac12 & -\frac13 \\ 1& -\frac12 & \frac12 & \frac13 \\ \end{pmatrix} \quad \quad Q^{-1}=\begin{pmatrix} \frac14 & \frac14 & \frac14 & \frac14 \\ \frac23& 0 & -\frac13 & -\frac13 \\ 0& 0 & -1 & -1 \\ -\frac14& \frac34 & -\frac14 & -\frac14 \\ \end{pmatrix} Then we find P_t = e^{tA} = Q e^{tD} Q^{-1}.
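We can check such computations numerically. The sketch below evaluates e^{tA} for the generator above by a plain truncated power series, which is adequate for a small matrix and moderate t (production code would use a library routine such as `scipy.linalg.expm`):

```python
def mat_mult(X, Y):
    """Product of two square matrices given as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def expm(A, t, terms=80):
    """e^{tA} via the truncated power series sum_k (tA)^k / k!."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity = (tA)^0
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = mat_mult(term, [[t * a / k for a in row] for row in A])  # (tA)^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

A = [[-1, 1, 0, 0],
     [1, -3, 1, 1],
     [0, 1, -2, 1],
     [0, 1, 1, -2]]
P = expm(A, t=1.0)
print(P[0])  # first row of P_1; each row of P_t sums to 1
```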

Alternative description

We can also construct the Markov chain X_t using exponential random variables.

  • If X_t=i consider independent exponential random variables T(i,j) with parameters \alpha(i,j) for j \not =i. Think of these as exponential clocks: when the first of the clocks rings, say clock j, the Markov chain moves to state j.

  • Using Proposition 2.1 we see that the chain moves from state i to j if the clock associated to the pair (i,j) rings first, and this occurs with probability Q(i,j) = \frac{\alpha(i,j)}{\sum_{k \not= i}\alpha(i,k)} = \frac{\alpha(i,j)}{\alpha(i)}

  • The matrix Q(i,j) is the transition matrix of a discrete time Markov chain Y_n which has the same jump probabilities as X_t but whose transitions occur at unit time intervals.

  • Then, once the chain has moved to the new state j, a new set of clocks with rates \alpha(j,k) is produced and the process starts anew.

  • By Proposition 2.1 we can express this slightly differently. When in state i we use a single clock with rate \alpha(i), and when this clock rings we move to state j according to the discrete time Markov chain Q(i,j).

  • The Markov property follows from the memoryless property of the exponential distribution: P(T\ge t +s| T \ge s) = P(T \ge t).

  • If X_t=i then by construction the position after the next jump after time t depends only on i and not on the states that the Markov chain visited before time t. Moreover if we consider the last jump before time t, which occurred, say, at time s<t, then the memoryless property of the exponential random variable implies that the time of the first jump after time t does not depend on s at all. Putting these together implies the Markov property P\{X_{t+u} =j | X_{s}, 0 \le s \le t\} = P\{X_{t+u} =j | X_t \}

  • To connect this to the previous description we note that by conditioning on the first jump P_t(i,j) = P(X_t=j|X_0=i) = \delta(i,j) e^{-\alpha(i) t} + \int_0^t \alpha(i) e^{-\alpha(i)s} \sum_{k} {Q(i,k)} P_{t-s}(k,j) ds Iterating this equation and setting t=\Delta t we find
    \begin{aligned} P(X_{\Delta t}=j|X_0=i) &= \delta(i,j) e^{-\alpha(i) \Delta t} + \int_0^{\Delta t} \alpha(i) e^{-\alpha(i)s} \sum_{k} \frac{\alpha(i,k)}{\alpha(i)} \delta(k,j) e^{-\alpha(k)(\Delta t-s)} ds + \cdots \\ &= \delta(i,j) e^{-\alpha(i) \Delta t} + \alpha(i,j) \Delta t \times \underbrace{ e^{-\alpha(j) \Delta t} \frac{1}{\Delta t} \int_0^{\Delta t} e^{-(\alpha(i)-\alpha(j)) s} ds }_{\to 1 \textrm{ as }\Delta t \to 0} + \cdots \\ &= \delta(i,j) (1 - \Delta t \alpha(i)) + \alpha(i,j) \Delta t + o(\Delta t) \end{aligned} The higher order terms are negligible since the probability of having two jumps in the time interval [0,\Delta t] is o(\Delta t).
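This clock description is exactly the stochastic simulation algorithm mentioned earlier. A Python sketch, illustrated with the off-diagonal rates of the 4-state generator from the Example section:

```python
import random

def simulate_ctmc(rates, x0, t_max, rng=random):
    """Stochastic simulation algorithm: rates[i][j] = alpha(i, j) for i != j.

    Returns the list of (jump time, new state) pairs, starting from x0 at time 0."""
    path, t, state = [], 0.0, x0
    while True:
        targets = [j for j in range(len(rates)) if j != state and rates[state][j] > 0]
        alpha_i = sum(rates[state][j] for j in targets)   # alpha(i) = total exit rate
        if alpha_i == 0.0:
            return path                                   # absorbing state
        t += rng.expovariate(alpha_i)                     # holding time ~ Exp(alpha(i))
        if t > t_max:
            return path
        # jump according to the embedded chain Q(i, j) = alpha(i, j) / alpha(i)
        state = rng.choices(targets, weights=[rates[state][j] for j in targets])[0]
        path.append((t, state))

rates = [[0.0, 1.0, 0.0, 0.0],
         [1.0, 0.0, 1.0, 1.0],
         [0.0, 1.0, 0.0, 1.0],
         [0.0, 1.0, 1.0, 0.0]]   # alpha(i, j) of the 4-state example generator
random.seed(0)
trajectory = simulate_ctmc(rates, x0=0, t_max=50.0)
print(len(trajectory), trajectory[:3])
```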

Uniformizable chain

  • Consider a Markov chain Y_n with transition matrix Q and assume that Q(i,i)=0. Then we pick rate \alpha(i)=1 for all states i. The times at which the Markov chain makes a transition are then sums of IID exponentials, that is, they are described by a Poisson process. In other terms we have X_t = Y_{N_t} where N_t is a Poisson process with rate 1.
  • In this case we can compute the transition matrix quite explicitly: \begin{aligned} P\{X_t=j|X_0=i\}& = P\{ Y_{N_t}=j| X_0=i\} \\ &= \sum_{n=0}^\infty P\{ Y_{N_t}=j, N_t=n| X_0=i\} \\ &= \sum_{n=0}^\infty P\{ Y_n=j| X_0=i\} P\{N_t=n\} \\ &= \sum_{n=0}^\infty \frac{e^{-t} t^n}{n!} Q^n(i,j) \end{aligned}

and the generator is given by A = (Q-I).
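The Poisson-weighted series above is easy to evaluate numerically by truncation. In this sketch the two-state chain that flips state at every jump is an illustrative example; for it the series sums to the closed form P_t(0,0) = (1+e^{-2t})/2, which we can compare against:

```python
import math

def uniformized_transition(Q, t, terms=100):
    """P_t = sum_n e^{-t} t^n / n! * Q^n for a rate-1 uniformizable chain (A = Q - I)."""
    n_states = len(Q)
    P = [[0.0] * n_states for _ in range(n_states)]
    Qn = [[float(i == j) for j in range(n_states)] for i in range(n_states)]  # Q^0 = I
    weight = math.exp(-t)                   # e^{-t} t^n / n!, starting at n = 0
    for n in range(terms):
        for i in range(n_states):
            for j in range(n_states):
                P[i][j] += weight * Qn[i][j]
        Qn = [[sum(Qn[i][k] * Q[k][j] for k in range(n_states)) for j in range(n_states)]
              for i in range(n_states)]     # Q^{n+1}
        weight *= t / (n + 1)
    return P

Q = [[0.0, 1.0], [1.0, 0.0]]   # flip chain: Q(i, i) = 0
P = uniformized_transition(Q, t=1.5)
print(P[0][0], (1 + math.exp(-2 * 1.5)) / 2)  # the two numbers agree
```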

Stationary distributions and detailed balance

  • A probability vector \pi is a stationary distribution for the Markov chain with generator A if
\pi P_t = \pi \quad \textrm{ for all } t \ge 0 which, by the Kolmogorov equation, is equivalent to \pi A = 0

  • In terms of the rates \alpha(i,j) we see that \pi is stationary if and only if \sum_{i \not= j} \pi(i) \alpha(i,j) = \pi(j)\alpha(j) = \sum_{i \not=j} \pi(j) \alpha(j,i) which we can interpret as a balance equation: \pi(i)\alpha(i,j) is the rate at which a chain in the stationary state \pi changes from i to j.

  • As for discrete time we say that a Markov chain satisfies detailed balance if \pi(i) \alpha(i,j) = \pi(j) \alpha(j,i) \textrm{ for all } i \not= j and detailed balance obviously implies stationarity.

  • The Markov chain with generator A is irreducible if for any pair of states i,j we can find states i_1, \cdots, i_{N-1} such that \alpha(i,i_1) \alpha(i_1, i_2) \cdots \alpha(i_{N-1},j) >0 If there exists a stationary distribution for an irreducible chain then \pi(i)>0 for all i. Indeed if \pi(i)>0 and \alpha(i,j)>0 then (\pi A)(j) =0 implies that \pi(j)\alpha(j) = \sum_{k}\pi(k)\alpha(k,j) \ge \pi(i)\alpha(i,j)>0 and thus \pi(j)>0.

  • The issue of periodicity cannot occur for continuous time Markov chains.

Lemma 2.1 For an irreducible Markov chain with generator A, P_t(i,j)>0 for all i,j and all t>0.

Proof. Using the Markov property \begin{aligned} P\{X_t=j|X_0=i\} &\ge P \{ X_{t/N}=i_1, X_{2t/N}=i_2, \cdots X_{t}=j |X_0=i\} \\ &= P \{ X_{t/N}=i_1| X_0=i\} P\{ X_{2t/N}=i_2| X_{t/N}=i_1\} \cdots P\{ X_{t}=j |X_{t\frac{N-1}{N}}=i_{N-1}\} \end{aligned} and for example
P \{ X_{t/N}=i_1| X_0=i\} \ge \int_0^{t/N} \alpha(i) e^{-\alpha(i)s} Q(i,i_1) e^{-\alpha(i_1)(t/N-s)}\, ds>0 \quad \blacksquare
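In practice a stationary distribution is found by solving the linear system \pi A = 0 together with the normalization \sum_i \pi(i)=1. A self-contained sketch using Gaussian elimination (a library linear solver would normally be used; the generators below are illustrative examples):

```python
def stationary_distribution(A):
    """Solve pi A = 0 with sum_i pi(i) = 1 by Gaussian elimination (small chains only)."""
    n = len(A)
    # Equations: (pi A)(j) = 0 for j = 0..n-2, plus the normalization sum_i pi(i) = 1
    M = [[A[i][j] for i in range(n)] for j in range(n)]   # rows of M are the balance equations
    M[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    for col in range(n):                                  # elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= f * M[col][c]
            b[r] -= f * b[col]
    pi = [0.0] * n
    for r in range(n - 1, -1, -1):                        # back substitution
        pi[r] = (b[r] - sum(M[r][c] * pi[c] for c in range(r + 1, n))) / M[r][r]
    return pi

# The 4-state generator from the Example section: its rates are symmetric, so
# detailed balance holds with the uniform distribution.
A = [[-1.0, 1.0, 0.0, 0.0],
     [1.0, -3.0, 1.0, 1.0],
     [0.0, 1.0, -2.0, 1.0],
     [0.0, 1.0, 1.0, -2.0]]
print(stationary_distribution(A))  # [0.25, 0.25, 0.25, 0.25] up to rounding
```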

Convergence to the stationary distribution

If the state space is finite and the chain is irreducible then we have convergence to equilibrium.

Theorem 2.1 If the state space S is finite and the Markov chain with generator A is irreducible then it has a unique stationary distribution \pi and for any initial distribution \mu we have \lim_{t \to \infty}\mu P_t = \pi

Proof. We use a spectral argument and the result for discrete time Markov chains. Pick a number a which is strictly greater than all the rates \alpha(i) (this is possible since S is finite). Consider the matrix R= \frac{1}{a}A + I. Then R is a stochastic matrix and the Markov chain Y_n with transition matrix R is irreducible and aperiodic (because R(i,i)>0). Then from the convergence theorem for discrete time Markov chains we know (details in the exercises) that R has a simple eigenvalue 1 and all other eigenvalues \lambda satisfy |\lambda|<1. Now Rf = \lambda f \iff Af =a(\lambda -1) f and so 0 is a simple eigenvalue for A and the other eigenvalues are of the form a(\textrm{Re}(\lambda) -1) + i a \,\textrm{Im}(\lambda), so their real parts are strictly negative.

The vector 1=(1,1, \cdots,1 )^T is a right eigenvector for A and \pi is a left eigenvector for A. If we define the matrix \Pi as the matrix whose rows are all equal to \pi, we have then (P_t - \Pi) \begin{pmatrix} 1\\ \vdots \\1 \end{pmatrix}= 0 \textrm{ and } \pi(P_t - \Pi) =0\,. Moreover if f is a right eigenvector and g a left eigenvector for A for an eigenvalue \mu \not=0 then we have \pi (Af) = \mu \pi f = (\pi A) f =0 \quad \textrm{ and } \quad (g A) 1 = g (A1) = \mu g 1 =0 and thus we must have \pi f=0 and g 1 =0. Therefore
(P_t - \Pi)f = P_t f = e^{\mu t} f \textrm{ and } g(P_t - \Pi) = gP_t = e^{\mu t} g. This implies that P_t - \Pi has the simple eigenvalue 0 and the same eigenvalues e^{\mu t} as P_t, where each such \mu has strictly negative real part. Therefore P_t - \Pi converges to 0 as t \to \infty, that is, \lim_{t \to \infty} P_t(i,j)= \pi(j).

Transient behavior

To study the transient behavior in continuous time we can use similar ideas as in discrete time.

  • Absorption probabilities: the absorption probabilities do not depend on the time spent in every state so they can be computed using the transition matrix Q(i,j) for the embedded chain Y_n and the formula in Section [Absorption probabilities]

  • Expected hitting time: For example we have the following result

Theorem 2.2 Suppose X_t is a Markov chain with generator A and for i\not= j let \Sigma(j) = \inf\{t \ge 0 ; X_t=j\} be the first hitting time of the state j. Let \tilde{A} be the matrix obtained by deleting the j^{th} row and the j^{th} column from the generator A. Then we have E[\Sigma(j)|X_0=i] = \sum_{l \not= j} B(i,l) \quad \textrm{ where } B= (-\tilde{A})^{-1} The matrix \tilde{A} has row sums which are non-positive and at least one of the row sums must be strictly negative.

Proof. By conditioning on the first jump, which happens at time T, we have E[\Sigma(j)|X_0=i] = \underbrace{E[T|X_0=i]}_{\textrm{expected time until the first jump }} + \sum_{k\in S, k\not=j} P\{ X_T = k|X_0=i\} \underbrace{E[ \Sigma(j)|X_0=k]}_{\textrm{expected hitting time} \atop \textrm{ from the state after the first jump}} If we set b(i)=E[\Sigma(j)|X_0=i] (for i \not= j) we find the equation b(i)= \frac{1}{\alpha(i)} + \sum_{k \not= j} \frac{\alpha(i,k)}{\alpha(i)} b(k) \implies 1 = \alpha(i) b(i) - \sum_{k\not= j}\alpha(i,k) b(k) which reads, in matrix form, 1 = -\tilde{A} b \implies b= (-\tilde{A})^{-1} 1. To show that -\tilde{A} is invertible we consider the matrix R= \frac{1}{a}\tilde{A} + I where a is chosen larger than all the rates \alpha(i). Then the entries of R are non-negative and the row sums do not exceed one, with at least one strictly less than 1. By the results for discrete time Markov chains we know that I-R=-\frac{1}{a}\tilde{A} is invertible.
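The computation in Theorem 2.2 can be illustrated on a small example (a sketch assuming NumPy; the three-state chain and its rates are hypothetical). We take \alpha(0,1)=1, \alpha(1,0)=2, \alpha(1,2)=1, make state j=2 absorbing, and solve b=(-\tilde{A})^{-1}1.

```python
import numpy as np

# Expected hitting time of state j = 2 for a hypothetical 3-state chain
# with rates alpha(0,1)=1, alpha(1,0)=2, alpha(1,2)=1 (state 2 absorbing).
A = np.array([[-1.0,  1.0, 0.0],
              [ 2.0, -3.0, 1.0],
              [ 0.0,  0.0, 0.0]])

A_tilde = A[:2, :2]      # delete the row and column of the target state j = 2
b = np.linalg.solve(-A_tilde, np.ones(2))
print(b)                 # E[Sigma(2)|X_0=0] = 4, E[Sigma(2)|X_0=1] = 3
```

Conditioning by hand confirms this: b(0)=1+b(1) and b(1)=\frac{1}{3}+\frac{2}{3}b(0) give b(0)=4 and b(1)=3.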

Explosion

  • For a continuous time Markov chain X_t let us consider the times of the successive jumps S_1=T_1, S_2=T_1+T_2, S_3=T_1+T_2+T_3, \cdots Here the T_i are independent exponential random variables but, in general, not identically distributed: the parameter \alpha_i depends on the state being visited. We have then E[S_n]= \sum_{i=1}^n \frac{1}{\alpha_i}

  • Explosion: To see what can happen consider a Markov chain with rates \alpha(n,n+1)=(n+1)^2 and all other rates equal to 0. Then the Markov chain moves up by 1 at every jump like a Poisson process but at an accelerated pace. We have then E[S_\infty]= \sum_{n=1}^\infty \frac{1}{n^2} < \infty so S_\infty < \infty with probability 1. So there are infinitely many jumps in finite time and X_t=+\infty after a finite time. This is called explosion.

  • This is an issue familiar from ODEs: the equation \frac{d}{dt} x_t = x_t^2 with x_0>0 has solution x_t= \frac{x_0}{1-x_0 t} which blows up at time t=\frac{1}{x_0}.

  • It is not easy to determine if an explosion really occurs. Indeed for no explosion to occur we must have, with probability 1, \sum_{n} \frac{1}{\alpha(Y_n)} = \infty where Y_n is the embedded chain.

  • A sufficient condition for non-explosion is a suitable upper bound on the rates \alpha(i), say \alpha(i) \le \alpha, which is true for finite state spaces, but this is by no means necessary.
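The explosion example above with rates \alpha(n,n+1)=(n+1)^2 is easy to probe by Monte Carlo (a sketch; the truncation level and sample size are arbitrary choices): S_\infty is a sum of independent \textrm{Exp}((n+1)^2) holding times with E[S_\infty]=\sum_{n\ge 1} 1/n^2 = \pi^2/6.

```python
import numpy as np

# Monte Carlo look at the explosion example alpha(n, n+1) = (n+1)^2.
# S_infinity is a sum of independent Exp((n+1)^2) holding times, so
# E[S_infinity] = sum_{n>=1} 1/n^2 = pi^2/6 ~ 1.645.
rng = np.random.default_rng(0)
n_states, trials = 2000, 2000     # neglected tail has mean < 5e-4
rates = (np.arange(n_states) + 1.0) ** 2
S_inf = rng.exponential(1.0 / rates, size=(trials, n_states)).sum(axis=1)
print(S_inf.mean())               # close to pi^2/6
```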

Transience, recurrence, and positive recurrence

  • Recurrence and transience for a continuous time Markov chain can be defined as in discrete time. An irreducible Markov chain X_t is recurrent if the chain starting from a state i returns to i with probability 1, and transient if X_t never returns to i with some non-zero probability.

  • Transience/recurrence has nothing to do with the actual time spent in every state and so we have immediately X_t \textrm { with rates } \alpha(i,j) \textrm{ is } \begin{array}{l} \textrm{transient} \\ \textrm{recurrent}\end{array} \iff Y_n \textrm{ with transition } Q(i,j)=\frac{\alpha(i,j)}{\alpha(i)} \textrm{ is } \begin{array}{l} \textrm{transient} \\ \textrm{recurrent}\end{array}

  • For positive recurrence we need to define the first return time to a state i. This is most easily defined in terms of the embedded Markov chain Y_n and its first return time \tau(i). The time until X_t returns to i will be the sum of the times spent in each state. If S_n denotes the time of the n^{th} jump and \tau(i)=n after visiting the states j_0, j_1, \cdots j_{n-1}, j_n=i, then the return time is \sum_{k=0}^{n-1}(S_{k+1}- S_k) where S_{k+1}-S_k is exponential with parameter \alpha(j_k). The first return time to the state i for the Markov chain X_t is thus given by \Sigma(i) = \sum_{k=0}^{\tau(i)-1}(S_{k+1}- S_k)

Stationary distribution for recurrent chains

Theorem 2.3 If the Markov chain with rates \alpha(i,j) is irreducible and recurrent, then there exists a unique solution (up to a multiplicative constant) \eta=(\eta(1), \eta(2), \cdots) to the equation \eta A=0 with 0 < \eta(i) < \infty \,. If it holds that \sum_{i}\eta(i) < \infty then \eta can be normalized to a stationary distribution and X_t is positive recurrent.

Proof. The k^{th} component of the equation \eta A=0 can be written as \sum_{j \not= k} \eta(j) \alpha(j,k) = \alpha(k) \eta(k) \iff \sum_{j \not= k} \eta(j)\alpha(j) Q(j,k) = \eta(k)\alpha(k) That is, the row vector \mu with entries \mu(k)=\alpha(k) \eta(k) must satisfy \mu Q = \mu.

If X_t is recurrent then the embedded Markov chain Y_n with transition matrix Q is recurrent and so by ?@thm-stationarydistribution there exists a solution to \mu Q=\mu, and therefore we have found a solution of \eta A=0.

Moreover, from ?@thm-stationarydistribution we have the representation \mu(j)=\alpha(j) \eta(j) = E\left[ \sum_{k=0}^{\tau(i)-1} {\bf 1}_{\{Y_k=j\}}\right] where i is some fixed but arbitrary reference state (this counts the number of visits to the state j between two consecutive visits to the reference state i).

If we denote by S_n the time of the n^{th} jump for X_t we have \begin{aligned} \eta(j) &= \sum_{k=0}^\infty E\left[ \frac{1}{\alpha(j)} {\bf 1}_{\{Y_k=j\}} {\bf 1}_{\{\tau(i) >k\}} \right] = \sum_{k=0}^\infty E\left[ (S_{k+1} -S_k) {\bf 1}_{\{Y_k=j\}} {\bf 1}_{\{\tau(i) >k\}} \right] = E\left[ \sum_{k=0}^{\tau(i)-1} (S_{k+1} -S_k) {\bf 1}_{\{Y_k=j\}} \right] \end{aligned} which is nothing but the expected time spent (by X_t) in the state j between successive visits to i.

If \sum_{j} \eta(j) <\infty then we have E[\Sigma(i)]=E\left[ \sum_{k=0}^{\tau(i)-1} (S_{k+1} -S_k)\right] <\infty where \Sigma(i) is the return time to the reference state i. That is, the chain X_t is positive recurrent.

Ergodic theorem for positive recurrent Markov chains

We have the following theorem which is the exact counterpart of the discrete time case (and is proved very similarly so we will omit the proof).

Theorem 2.4 Suppose X_t is irreducible and positive recurrent. Then X_t has a unique stationary distribution \pi and, with probability 1, the fraction of time spent in state j converges to \pi(j): \lim_{t\to \infty}\frac{1}{t}\int_0^t \mathbf{1}_{\{X_s=j\}} \, ds = \pi(j). Moreover we have Kac’s formula relating \pi(j) to the expected time between consecutive visits to the state j: \pi(j) = \frac{1/\alpha(j)}{E[\Sigma(j)|X_0=j]}\,, the numerator being the expected time spent in j on each visit. Conversely if X_t has a stationary distribution then X_t is positive recurrent.
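The time-average statement of the ergodic theorem can be observed numerically. The following minimal sketch (NumPy; a hypothetical two-state chain with rates a=2 for 0\to 1 and b=3 for 1\to 0, so \pi(0)=b/(a+b)=0.6) simulates the chain by drawing its exponential holding times one at a time and records the fraction of time spent in state 0.

```python
import numpy as np

# Ergodic theorem sketch for a hypothetical two-state chain with rates
# a (0 -> 1) and b (1 -> 0); here pi(0) = b/(a+b) = 0.6.
rng = np.random.default_rng(1)
a, b = 2.0, 3.0
t_end = 5000.0
t, state, time_in_0 = 0.0, 0, 0.0
while t < t_end:
    rate = a if state == 0 else b
    hold = min(rng.exponential(1.0 / rate), t_end - t)  # truncate last holding time
    if state == 0:
        time_in_0 += hold
    t += hold
    state = 1 - state
frac = time_in_0 / t_end
print(frac)   # close to pi(0) = 0.6
```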

Birth and death process

  • A general birth and death process is a continuous time Markov chain with state space \{0,1,2,\cdots\} and whose only non-zero transition rates are \begin{aligned} &\lambda(n) = \alpha(n,n+1) & = \textrm{ birth rate for a population of size } n & (\textrm{ for } n\ge 0) \\ &\mu(n)=\alpha(n,n-1) & = \textrm{ death rate for a population of size } n & (\textrm{ for } n\ge 1) \end{aligned}

  • The Kolmogorov equations for the distribution of X_t are, for n\ge 1, \frac{d}{dt}p_t(n) = \underbrace{ \mu(n+1) p_t(n+1)}_{\textrm{ increase due to a death} \atop \textrm{in a population of size } n+1} + \underbrace{\lambda(n-1) p_t(n-1)}_{\textrm{ increase due to a birth} \atop \textrm{in a population of size } n-1} - \underbrace{(\lambda(n) + \mu(n))p_t(n)}_{\textrm{ decrease due to a birth/death} \atop \textrm{in a population of size } n}. For n=0, the equation reads \displaystyle \frac{d}{dt}p_t(0)= \mu(1) p_t(1) - \lambda(0) p_t(0).

  • The generator has the form \begin{matrix} 0 \\ 1 \\ 2\\ \vdots \end{matrix} \begin{pmatrix} -\lambda(0) & \lambda(0) & 0 & 0 & \dots \\ \mu(1) &-\lambda(1) -\mu(1) & \lambda(1) & 0 & \dots \\ 0 & \mu(2) &-\lambda(2) -\mu(2) & \lambda(2) & \dots \\ \vdots & \ddots & \ddots & \ddots & \\ \end{pmatrix}

  • The transition matrix for the embedded process Y_n is

\begin{matrix} 0 \\ 1 \\ 2\\ \vdots \end{matrix} \begin{pmatrix} 0 & 1 & 0 & 0 & \dots \\ \frac{\mu(1)}{\mu(1)+ \lambda(1)} & 0 & \frac{\lambda(1)}{\mu(1)+ \lambda(1)} & 0 & \dots \\ 0 & \frac{\mu(2)}{\mu(2)+ \lambda(2)} & 0 & \frac{\lambda(2)}{\mu(2)+ \lambda(2)} & \dots \\ \vdots & \ddots & \ddots & \ddots & \\ \end{pmatrix} which is the transition matrix of a general random walk on the non-negative integers.
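The passage from the generator to the embedded chain, Q(i,j)=\alpha(i,j)/\alpha(i) for i\ne j, can be sketched in code (NumPy; a hypothetical birth-death generator with constant rates \lambda=1, \mu=2, truncated to five states so that state 4 can only move down).

```python
import numpy as np

# Embedded-chain transition matrix from a hypothetical birth-death
# generator on states {0,...,4} with lambda = 1, mu = 2.
lam, mu, N = 1.0, 2.0, 5
A = np.zeros((N, N))
for n in range(N - 1):
    A[n, n + 1] = lam          # birth rate alpha(n, n+1)
for n in range(1, N):
    A[n, n - 1] = mu           # death rate alpha(n, n-1)
A[np.arange(N), np.arange(N)] = -A.sum(axis=1)   # rows of a generator sum to 0

alpha = -np.diag(A)            # total jump rates alpha(i)
Q = A / alpha[:, None]         # Q(i,j) = alpha(i,j)/alpha(i) off the diagonal
np.fill_diagonal(Q, 0.0)
print(Q[1])                    # (mu/(lam+mu), 0, lam/(lam+mu), 0, 0) = (2/3, 0, 1/3, 0, 0)
```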

Examples

  • Poisson process: There is no death and the birth rate is constant \mu(n)=0 \quad \textrm{ and } \quad \lambda(n)=\lambda

  • Queueing models: Imagine a Poisson process (with rate \lambda) describing the arrival of customers. In the system there are k service stations at which the service time is exponential with rate \mu. If a service station is empty, upon arrival a customer immediately enters a service station and stays in the system until being served. If all stations are currently busy serving other customers, the customer waits until one station becomes free. These models are called M/M/k queues where M stands for Markovian and k is the number of service stations, the first M describing the arrival process and the second M the service time (which is exponential).

  • M/M/1 queue: If there is only one service station \lambda(n)=\lambda \quad \textrm{ and } \quad \mu(n) =\mu

  • M/M/k queue: If there are k service stations then the rates are \lambda(n)=\lambda \quad \textrm{ and } \quad \mu(n) = \left\{ \begin{array}{cl} n \mu & n \le k \\ k \mu & n > k \end{array} \right.

  • M/M/\infty queue: If there are infinitely many service stations then the rates are \lambda(n)=\lambda \quad \textrm{ and } \quad \mu(n) =n\mu

  • Population models: If X_t describes the size of a population, the birth and death rates will naturally be proportional to the size of the population if we assume that each individual gives birth or dies at a certain rate.

    • Pure birth model: no deaths occur and so \lambda(n)=n \lambda \quad \textrm{ and } \quad \mu(n) = 0

    • Population model: the rates \lambda(n)=n \lambda \quad \textrm{ and } \quad \mu(n) = n \mu

    • Population model with immigration: if immigrants arrive according to a Poisson process with rate \nu the rates are \lambda(n)=n \lambda +\nu \quad \textrm{ and } \quad \mu(n) = n \mu

Transience

  • To study transience we use the embedded Markov chain and ?@thm-criteriontransience. Choosing the reference state 0 we look for a solution a(n) with a(0)=1 and 0 < a(n)< 1 for n\ge 1 of the system of equations \lambda(n) a(n+1) + \mu(n) a(n-1) = (\lambda(n) + \mu(n)) a(n) This leads to a(n+1) - a(n) = \frac{\mu(n)}{\lambda(n)} (a(n) - a(n-1)) = \cdots = \prod_{j=1}^n\frac{\mu(j)}{\lambda(j)} (a(1) - 1)

and thus, by telescoping, a(n+1)= 1 + \sum_{k=0}^{n} (a(k+1) -a(k)) = 1 + \left[1+ \sum_{k=1}^{n} \prod_{j=1}^k\frac{\mu(j)}{\lambda(j)}\right] (a(1) - 1) Taking n \to \infty we must have a(n)\to 0, which requires \sum_{k=1}^\infty \prod_{j=1}^k\frac{\mu(j)}{\lambda(j)} < \infty, and we then find the solution a(n)=\frac{\sum_{k=n}^\infty \prod_{j=1}^k\frac{\mu(j)}{\lambda(j)}}{1+ \sum_{k=1}^\infty \prod_{j=1}^k\frac{\mu(j)}{\lambda(j)}}

Recurrence

  • To study positive recurrence we simply solve for the stationary distribution \pi A=0. The first equation (for n=0) gives \mu(1) \pi(1) - \lambda(0)\pi(0) =0 \tag{2.3} and \lambda(n-1) \pi(n-1) + \mu(n+1) \pi(n+1) - (\mu(n)+ \lambda(n))\pi(n)=0 \textrm{ for } n\ge 1 or
    \mu(n+1)\pi(n+1)-\lambda(n) \pi(n)=\mu(n)\pi(n)-\lambda(n-1) \pi(n-1) \tag{2.4} Combining Equation 2.3 and Equation 2.4 gives \pi(n) = \prod_{j=1}^n \frac{\lambda(j-1)}{\mu(j)} \pi(0) and thus we have a stationary distribution if and only if \sum_{k=1}^\infty \prod_{j=1}^k \frac{\lambda(j-1)}{\mu(j)} <\infty\,.

  • The Poisson process and pure birth models are not irreducible and converge to +\infty almost surely.

  • M/M/1-queue: \prod_{j=1}^n \frac{\mu(j)}{\lambda(j)}=\left(\frac{\lambda}{\mu}\right)^n

    • Transient if \mu < \lambda.

    • Recurrent if \mu =\lambda

    • Positive recurrent if \lambda < \mu and with (geometric) stationary distribution: \pi(n)= \left(1 -\frac{\lambda}{\mu}\right)\left(\frac{\lambda}{\mu}\right)^n

  • M/M/k-queue: \prod_{j=1}^n \frac{\mu(j)}{\lambda(j)} = \left\{ \begin{array}{cl} n! \left( \frac{\mu}{\lambda} \right)^n & n < k \\ \frac{k!}{k^k} \left(\frac{k\mu}{\lambda}\right)^n & n \ge k \end{array} \right. .

    • Transient if k \mu < \lambda.

    • Recurrent if k \mu =\lambda

    • Positive recurrent if \lambda < k \mu, the stationary \pi is a bit messy to write.

  • M/M/\infty-queue: \prod_{j=1}^n \frac{\mu(j)}{\lambda(j)}= n! \left( \frac{\mu}{\lambda} \right)^n.

    • Always positive recurrent with Poisson stationary distribution \pi(n) = e^{- \frac{\lambda}{\mu}} \frac{ \left(\frac{\lambda}{\mu}\right)^n}{n!}
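The product formula \pi(n)\propto \prod_{j=1}^n \lambda(j-1)/\mu(j) can be checked directly for the M/M/1 and M/M/\infty examples. The sketch below (NumPy; the rates and the truncation level N are arbitrary choices) normalizes the products and compares them with the geometric and Poisson distributions.

```python
import numpy as np

# Stationary distributions from the product formula
# pi(n) proportional to prod_{j=1}^n lambda(j-1)/mu(j),
# truncated at a hypothetical level N.
lam, mu, N = 1.0, 2.0, 80
n = np.arange(N + 1)

# M/M/1: lambda(n) = lam, mu(n) = mu  ->  geometric distribution
pi1 = np.cumprod(np.r_[1.0, np.full(N, lam / mu)])
pi1 /= pi1.sum()

# M/M/infinity: lambda(n) = lam, mu(n) = n mu  ->  Poisson(lam/mu)
pi_inf = np.cumprod(np.r_[1.0, lam / (mu * np.arange(1, N + 1))])
pi_inf /= pi_inf.sum()

print(pi1[:3], pi_inf[:3])
```

For these rates pi1 matches (1-\lambda/\mu)(\lambda/\mu)^n and pi_inf matches e^{-\lambda/\mu}(\lambda/\mu)^n/n! up to (negligible) truncation error.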

3 Brownian motion

Definition of Brownian motion

  • The Brownian motion X_t, t\ge 0, is a stochastic process taking values in \mathbb{R} (or \mathbb{R}^d ) and is a model for random continuous motion taking place in space.

  • By convention we set X_0=0 and start at 0. Saying that the motion is completely random does not mean that X_t and X_s are independent but rather that the motion after time s, that is X_t-X_s, should be independent of X_s. More generally we will assume the property of independent increments as we did for the Poisson process N_t. We will also assume that the motion does not have any preferred direction, so we must have E[X_t]=0. In addition we will require that the maps t \to X_t be continuous. If the paths were not required to be continuous then the compensated Poisson process N_t-t would satisfy these requirements. But the requirement of continuity of the paths actually implies that the only possibility is for the increments to be normally distributed.

  • Formally a Brownian motion or Wiener process with variance parameter \sigma^2 is a stochastic process X_t such that

    1. X_0=0.
    2. Independent increments: For any n>1 and times s_1\le t_1\le s_2 \le t_2 \le \cdots \le s_n \le t_n the random variables X_{t_1}-X_{s_1}, X_{t_2}-X_{s_2}, \cdots, X_{t_n}-X_{s_n} are independent.
    3. For any s< t the random variable X_t-X_s has a normal distribution with mean 0 and variance \sigma^2(t-s).
    4. The paths are continuous.

    As mentioned before, 3. actually follows from the other assumptions.

Brownian motion as limit of random walks

Let Y_1, Y_2, \cdots be IID random variables with E[Y_n]=0 and \textrm{ Var}(Y_n)=1. Then a random walk is defined by S_0=0, \quad S_n=Y_1 + \cdots + Y_n\,. We now define a process X^{(n)}_t by rescaling time and the size of the increments to obtain a proper limit X^{(n)}_t = \frac{S_{\lfloor n t \rfloor}}{\sqrt{n}} \quad \textrm{ for }t\ge 0 To understand this scaling note that if, say, t=1 we have X^{(n)}_1=\frac{Y_1+ \cdots + Y_n}{\sqrt{n}} which has mean 0 and variance 1, independently of n, so we can hope to have a well-defined limit. Using now the central limit theorem we find, for a standard normal Z, X^{(n)}_t = \frac{S_{\lfloor n t \rfloor}}{\sqrt{n}} = \frac{S_{\lfloor n t \rfloor}}{\sqrt{\lfloor n t \rfloor}} \times \frac{\sqrt{\lfloor n t \rfloor}}{\sqrt{n}} \to \sqrt{t}\, Z \quad \textrm{ in distribution} and \sqrt{t}\,Z has the same distribution as X_t. Similarly X^{(n)}_t-X^{(n)}_s = \frac{S_{\lfloor n t \rfloor} -S_{\lfloor n s \rfloor} }{\sqrt{n}} = \frac{1}{\sqrt{n}} \sum_{j=\lfloor n s \rfloor +1}^{\lfloor n t \rfloor} Y_j \to \sqrt{t-s}\, Z \quad \textrm{ in distribution} and \sqrt{t-s}\,Z has the same distribution as X_t-X_s. Stronger forms of convergence can be proved which show that the paths of the process are actually continuous in the limit, but this is too technical for now.
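The scaling can be checked by simulation. This sketch (NumPy; the choices of n, the number of sampled paths, and the times s and t are arbitrary) uses \pm 1 steps and estimates the mean, the variance E[(X^{(n)}_t)^2]\approx t, and, using the independent increments, the covariance E[X^{(n)}_s X^{(n)}_t]\approx s for s<t.

```python
import numpy as np

# Scaled random walk X^(n)_t = S_[nt]/sqrt(n) with +/-1 steps.
rng = np.random.default_rng(2)
n, paths = 1000, 10000
steps = rng.choice([-1.0, 1.0], size=(paths, n))
S = steps.cumsum(axis=1)
s, t = 0.4, 1.0
Xs = S[:, int(n * s) - 1] / np.sqrt(n)   # X^(n)_s
Xt = S[:, int(n * t) - 1] / np.sqrt(n)   # X^(n)_t
print(Xt.mean(), Xt.var(), (Xs * Xt).mean())   # ~ 0, ~ t = 1, ~ s = 0.4
```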

Properties of Brownian motion

  • Probability distributions and the Markov property: By definition X_t is a time-homogeneous process since X_{t+s}-X_s has the same distribution as X_t, and the Markov property follows from X_t=X_s+ (X_t- X_s) which shows that the distribution of X_t depends only on X_s. We have then (taking \sigma^2=1) P\{ X_{t+s} \in A | X_s=x\} = P\{ X_t \in A - x \} = \int_{A-x} \frac{1}{\sqrt{2\pi t}} e^{-\frac{y^2}{2t}} dy = \int_{A} \frac{1}{\sqrt{2\pi t}} e^{-\frac{(y-x)^2}{2t}} dy and so X_t is a Markov process with transition probability densities p_t(x,y)= \frac{1}{\sqrt{2 \pi t}} e^{-\frac{(y-x)^2}{2t}}

  • Reflection and Scaling:
    X_t \textrm{ is a Brownian motion } \implies \left\{ \begin{array}{l} Y_t = - X_t \textrm{ is a Brownian motion } \\ Z_t = \sqrt{a} X_{t/a} \textrm{ is a Brownian motion for any } a>0 \\ \end{array} \right. To see this one checks that properties 1.-4. in Definition of Brownian motion are satisfied.

  • Strong Markov property:

    • By the property of independent increments (which leads immediately to the Markov property), if X_t is a Brownian motion then Y_t = X_{t+s}- X_s is a Brownian motion.

    • A stopping time T for a Brownian motion X_t is a random variable such that the event \{ T \le t\} \textrm{ is measurable with respect to } \{X_s\}_{0\le s \le t} that is, we know when to stop based on past information only.

    • The strong Markov property is the following theorem, which is believable but would require a more formal proof.

    Theorem 3.1 If X_t is a Brownian motion and T a stopping time for the Brownian motion X_t then X_{T+t} - X_T is Brownian motion.

  • Gaussian process: X_t is a Gaussian process in the sense that the joint distribution of X_{t_1}, X_{t_2}, \cdots, X_{t_n} is an n-dimensional normal random variable. To see this note that X_{t_1}, X_{t_2}-X_{t_1}, \cdots, X_{t_n}-X_{t_{n-1}} is an n-dimensional normal random variable (the components are independent) and that the two vectors are related through a linear transformation.
    To compute the covariance it is enough to compute E[X_s X_t] and we have \textrm{ for } s<t \quad E[X_s X_t]=E[X_s (X_s+ X_t-X_s)]= E[X_s^2] + E[X_s] E[X_t-X_s] = s and thus we find E[X_sX_t]= \min \{s,t\}

  • Reflection principle:

    • Fix a value a>0 and consider the following hitting time T_a = \inf \{ t\ge 0, X_t=a\} that is T_a is the first time the Brownian motion reaches the value a.
      The following result shows that you can reflect the Brownian motion around the line x=a after time T_a and still obtain a Brownian motion

Theorem 3.2 (Reflection principle) The process Y_t = \left\{\begin{array}{cl} X_t & t \le T_a \\ 2a - X_t & t > T_a \end{array} \right. is a Brownian motion

Proof. The proof relies on the strong Markov property and the reflection property, which tell us that X_{T_a+s} - X_{T_a} = X_{T_a+s} - a \textrm{ and } a - X_{T_a+s} are Brownian motions (starting at 0), and so 2a - X_{T_a+s} is a Brownian motion starting at a. Therefore if we set t=T_a+s for t>T_a we see that Y_t is a Brownian motion.

An illustration of the theorem is in the following figure

Reflection principle

Using the reflection principle we can analyse the distribution of the maximum of the Brownian motion over the time interval [0,t] M_t = \max_{0\le s \le t} X_s

Theorem 3.3 (Distribution of the maximum of Brownian motion) We have P\{ M_t \ge a \} = 2 P\{X_t \ge a\}\,. \tag{3.1} The pdf of M_t is given by f_{M_t}(a)= \sqrt{\frac{2}{\pi t}} e^{-\frac{a^2}{2t}} \quad \textrm{ for } a >0 The pdf of the first hitting time T_a is given by f_{T_a}(t)= \frac{a}{\sqrt{2 \pi}} t^{-3/2} e^{-\frac{a^2}{2t}} \quad \textrm{ for } t >0

Proof (Theorem 3.3). The proof of Equation 3.1 follows from the reflection principle. P\{ M_t \ge a \} = P\{ M_t \ge a , X_t\le a\} + P\{ M_t \ge a , X_t > a\} For the second term we note that if X_t >a then M_t\ge a and thus P\{ M_t \ge a , X_t > a\}=P\{X_t>a\}.

For the first term we use the reflection principle: the reflected path Y_t is a Brownian motion and its first hitting time of a, say S_a, is identical to T_a (see the picture). Therefore \begin{aligned} P\{ M_t \ge a , X_t\le a\} &= P\{ T_a\le t , X_t\le a\} = P\{ S_a \le t , Y_t\le a\} \\ &= P\{ T_a \le t , 2a - X_t\le a\} = P\{ T_a \le t , X_t \ge a\} =P\{ M_t \ge a , X_t \ge a\} \end{aligned} which is identical to the second term. This concludes the proof of Equation 3.1 and the density of M_t is obtained by differentiation.

To obtain the density of T_a we use P\{T_a \le t\} = P\{M_t > a\} =2 P\{ X_t > a\} = 2 P\left\{ \frac{X_t}{\sqrt{t}} > \frac{a}{\sqrt{t}} \right\} = 2 \int_{\frac{a}{\sqrt{t}}}^\infty \frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}} dx and differentiating with respect to t proves the result.
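Equation 3.1 can be checked by Monte Carlo on discretized paths (a sketch; the step and path counts are arbitrary, and the discrete-time maximum slightly undershoots the continuous one, so agreement holds only up to a small discretization bias).

```python
import numpy as np
from math import erf, sqrt

# Monte Carlo check of P{M_1 >= a} = 2 P{X_1 >= a} on discretized paths.
rng = np.random.default_rng(3)
steps, paths, a = 1000, 10000, 1.0
dW = rng.normal(0.0, sqrt(1.0 / steps), size=(paths, steps))
M = dW.cumsum(axis=1).max(axis=1)                # approximates M_1 = max_{s<=1} X_s
emp = (M >= a).mean()
exact = 2 * (1 - 0.5 * (1 + erf(a / sqrt(2))))   # 2 P{X_1 >= 1} ~ 0.317
print(emp, exact)
```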

Brownian motion in higher dimensions

The Brownian motion in dimension d is defined by X_t = ( X^{(1)}_t, \cdots, X^{(d)}_t) where the X^{(i)}_t are independent one-dimensional Brownian motions.

One deduces from this immediately that

  1. X_0=0.
  2. Independent increments: For any n>1 and times s_1\le t_1\le s_2 \le t_2 \le \cdots \le s_n \le t_n the random variables X_{t_1}-X_{s_1}, X_{t_2}-X_{s_2}, \cdots, X_{t_n}-X_{s_n} are independent.
  3. For any s< t the random variable X_t-X_s has a normal distribution with mean 0 and covariance (t-s)I.
  4. The paths are continuous.

One can conversely construct the Brownian motion from the properties 1. to 4.

The transition probabilities for Brownian motion are p_t(x,y) = \frac{1}{ (2 \pi t)^{d/2}} e^{-\frac{\|y-x\|^2}{2t}} and we have the Chapman-Kolmogorov equation p_{t+s}(x,y) = \int_{\mathbf{R}^d} p_t(x,z) p_s(z,y) dz.

Brownian motion and heat equation

If \mu_0=\rho_0(x) dx is the initial distribution of the Brownian particle X_0, then at time t the distribution of X_t is given by \mu_t=\rho_t(y) dy where \rho_t(y) = \int_{\mathbf{R}^d} \rho_0(x) p_{t}(x,y) \, dx Conversely if we consider a function f: \mathbf{R}^d \to \mathbf{R} we have E[ f(X_t) |X_0=x] = \int_{\mathbf{R}^d} p_{t}(x,y) f(y) \, dy where we should pick f such that the expectation on the right is well defined. Note that since p_t(x,y)=p_t(y,x) is symmetric these two equations are actually identical.

We now derive a differential equation for the function f(t,x)\equiv E[ f(X_t) |X_0=x]. We could simply differentiate, under the integral, the semigroup with respect to t, but instead we provide a more interesting probabilistic proof.

Using a Taylor expansion (in d=1 for simplicity) we obtain for functions f which are nice enough (for example C^2, bounded, and with bounded derivatives) f(X_{t+h}) - f(X_t) = f'(X_t) (X_{t+h} - X_t) + \frac{1}{2} f''(X_t) (X_{t+h} - X_t)^2 + o((X_{t+h} - X_t)^2). Since the increments are independent, for example f'(X_t) and (X_{t+h} - X_t) are independent, and so we obtain \begin{aligned} \frac{\partial f}{\partial t}(x,t) & = \lim_{h \to 0} E\left[ \frac{1}{h}(f(X_{t+h}) - f(X_t)) \right] \\ & = \lim_{h \to 0} \frac{1}{h} E[f'(X_t)] E[(X_{t+h} - X_t)] + \frac{1}{h} \frac{1}{2} E[f''(X_t)] E[(X_{t+h} - X_t)^2] + \frac{1}{h} E[o((X_{t+h} - X_t)^2)] \end{aligned} Since E[(X_{t+h} - X_t)]=0 and E[(X_{t+h} - X_t)^2]=\textrm{Var}(X_h)=h we obtain \frac{\partial f}{\partial t}(x,t) = \frac{1}{2}\frac{\partial^2 f}{\partial x^2}(x,t) and by a similar argument, in dimension d, we obtain the heat equation \frac{\partial f}{\partial t}(x,t) = \frac{1}{2}\Delta f (x,t) \quad \quad \Delta = \sum_{i=1}^d \frac{\partial}{\partial x^{(i)}} \frac{\partial}{\partial x^{(i)}}
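A quick sanity check of the heat equation: for f(x)=x^2 one has E[f(X_t)|X_0=x]=x^2+t, which indeed satisfies \partial_t u = \frac{1}{2}\partial_x^2 u since \partial_t u =1 and \partial_x^2 u = 2. The sketch below (NumPy; the values of x, t and the sample size are arbitrary) verifies the expectation by Monte Carlo.

```python
import numpy as np

# E[f(X_t)|X_0=x] for f(x) = x^2 should equal x^2 + t,
# since X_t given X_0 = x is N(x, t).
rng = np.random.default_rng(4)
x, t, samples = 1.0, 0.5, 200_000
u_mc = ((x + np.sqrt(t) * rng.normal(size=samples)) ** 2).mean()
print(u_mc, x**2 + t)   # Monte Carlo estimate vs exact value 1.5
```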

Heat equation in domain

Given a domain B with boundary \partial B let us consider the boundary/initial value problem \begin{aligned} & \frac{\partial u}{\partial t}(t,x) = \frac{1}{2}\Delta u (t,x), x \in B \\ & u(t,x)=g(x), x \in \partial B \\ & u(0,x)=f(x), x \in B \\ \end{aligned} We can express the solution in terms of Brownian motion and the hitting time of the boundary \tau= \inf \{t: X_t \in \partial B \} namely u(t,x) = E\left[ f(X_t) 1_{\{\tau >t\}} + g(X_{\tau}) 1_{\{\tau \le t\}} | X_0=x\right]\,. This is derived in a similar way as before. If B is bounded then the path will eventually hit \partial B, and so taking t \to \infty we find the equation \begin{aligned} & \Delta u (x)=0, x \in B \\ & u(x)=g(x), x \in \partial B \\ \end{aligned} whose solution is u(x)=E\left[ g(X_{\tau})|X_0=x\right].

Recurrence and transience of Brownian motion

  • In dimension 1 let us consider the interval [0,R] and the function g on the boundary with g(0)=0 and g(R)=1. Then the solution of the equation u''(x)=0\, \quad \textrm{ with } \quad u(0)=0, u(R)=1 is u(x)=\frac{x}{R} and so u(x)= P\{ X_t \textrm{ hits } R \textrm{ before hitting } 0| X_0=x\} = \frac{x}{R} Not coincidentally this is the same as the gambler’s ruin (with p=\frac{1}{2}). Therefore P\{ X_t \textrm{ hits } 0| X_0=x \} = \lim_{R \to \infty} P\{ X_t \textrm{ hits } 0 \textrm{ before hitting } R| X_0=x\} = \lim_{R \to \infty} \left(1- \frac{x}{R}\right) =1 and so X_t is recurrent in d=1.

  • In dimension 2 or higher let us consider the annulus domain B= \{ x\in \mathbf{R}^d\,;\, R_1 \le \|x\| \le R_2 \} with 0< R_1 < R_2 < \infty and boundary \partial B =\{ x\in \mathbf{R}^d\,;\, \|x\|=R_1 \textrm{ or } \|x\|=R_2 \}, and let g(y) = \left\{ \begin{array}{cl} 1 & \textrm{ if } \|y\|=R_2 \\ 0 & \textrm{ if } \|y\|=R_1 \end{array} \right.
    Then u(x)=E[ g(X_{\tau})| X_0=x ] is the probability that X_t hits \{\|x\|=R_2\} before hitting \{\|x\|=R_1\}, and u(x) solves \Delta u(x) =0, x \in B \quad \textrm{ and } \quad u(y)=g(y), y \in \partial B\,.

By symmetry the solution is a function of the form u(x)=\phi(\|x\|) since Brownian motion is invariant under rotations: in polar coordinates the equation reads \Delta \phi(r)= \phi''(r) + \frac{d-1}{r} \phi'(r) =0 \,. This ODE is easy to solve and has the general solution \phi(r) = \left\{ \begin{array}{cl} c_1 \ln(r) + c_2 & \textrm{ if } d=2 \\ c_1 r^{2-d} + c_2 & \textrm{ if } d\ge 3 \end{array} \right. With the boundary values \phi(R_1)=0 and \phi(R_2)=1 we find \phi(r) = \left\{ \begin{array}{cl} \frac{\ln(r)- \ln(R_1)}{\ln(R_2) - \ln(R_1)} & \textrm{ if } d=2 \\ \frac{ R_1^{2-d} - r^{2-d}}{R_1^{2-d} - R_2^{2-d}} & \textrm{ if } d\ge 3 \end{array} \right. To study recurrence we take R_1=\epsilon and R_2 \to \infty to find the probability that a Brownian path hits a small ball around 0 starting from x.

For d=2, writing r=\|x\|, we have \lim_{R_2 \to \infty} P\left\{ \|X_t\|=R_2 \textrm { before } \|X_t\|=\epsilon|X_0=x\right\} = \lim_{R_2 \to \infty} \frac{\ln(r)- \ln(\epsilon)}{\ln(R_2) - \ln(\epsilon)} = 0 That is, X_t returns to the disc \|x\| \le \epsilon with probability 1 for arbitrary \epsilon>0 and it is natural to call X_t recurrent.
Note however that X_t never returns exactly to 0. Indeed we have \lim_{\epsilon \to 0} P\left\{ \|X_t\|=\epsilon \textrm { before } \|X_t\|=R_2| X_0=x \right\} = \lim_{\epsilon \to 0}\left( 1- \frac{\ln(r)- \ln(\epsilon)}{\ln(R_2) - \ln(\epsilon)}\right) = 0

For d\ge 3, arguing similarly we find that
\lim_{R_2 \to \infty} P\{ \|X_t\|=R_2 \textrm { before } \|X_t\|=\epsilon | X_0=x\} = \lim_{R_2 \to \infty} \frac{ \epsilon^{2-d} - \|x\|^{2-d}}{\epsilon^{2-d} - R_2^{2-d}} = 1 - \left(\frac{\epsilon}{\|x\|}\right)^{d-2} < 1 and so the Brownian motion fails to return near zero with non-zero probability: X_t is transient in dimension 3 or more.
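The d=1 hitting probability x/R computed above can be checked with a simple random-walk discretization of the Brownian motion (a sketch; the step size dt and the number of paths are arbitrary choices, and the discretization introduces a small overshoot bias at the boundaries).

```python
import numpy as np

# Estimate P{hit R before 0 | X_0 = x} ~ x/R by simulating discretized
# Brownian paths with Gaussian increments of variance dt.
rng = np.random.default_rng(5)
x, R, dt, paths = 1.0, 3.0, 0.01, 4000
pos = np.full(paths, x)
active = np.ones(paths, dtype=bool)
while active.any():
    pos[active] += rng.normal(0.0, np.sqrt(dt), size=active.sum())
    active = (pos > 0.0) & (pos < R)    # stop paths once they exit (0, R)
p_hat = (pos >= R).mean()
print(p_hat)   # close to x/R = 1/3
```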

Brownian motion with drift

For a vector \mu \in \mathbb{R}^d the Brownian motion with drift \mu is given by Y_t = X_t + t\mu It is not too difficult to see that the following properties hold

  • Y_t has independent increments and Y_t-Y_s (s\le t) has a normal distribution with mean \mu(t-s) and covariance \sigma^2(t-s)I.

  • The transition probabilities are p_t(x,y)= \frac{1}{(2\pi \sigma^2 t)^{d/2}} e^{- \frac{\|y-x-t\mu\|^2}{2\sigma^2 t}}.

  • The function f(t,x)=E\left[ f(Y_t)|Y_0=x\right] satisfies the partial differential equation \frac{\partial f}{\partial t} = \frac{1}{2}\Delta f + \mu \cdot \nabla f

References