Math 606, Spring 2023
University of Massachusetts Amherst
2023-06-25
Counting processes N_t are continuous time stochastic process which counts the number of events occuring up to time t: N_t = \textrm{total number of events occuring up to time } t e.g. the number of accidents reported to an insurance company, the number of earthquakes in California, and so on.
The Poisson process is the simplest counting process, intuitvely characterized by the following properties:
More formally a Poisson process N_t with rate paratmeter \lambda is a continuous time stochastic process such that
Independent increments: Given times s_1\le t_1 \le s_2 \le t_2 \cdots \le s_n \le t_n the random variables X_{t_i}-X_{s_i} (that is the number of events occurring in the disjoint time intervals [s_i,t_i]) are independent.
The rate at which events occur is denoted by \lambda>0 and for any (small) time interval [t, t + \Delta t] we have
\begin{aligned}
P \{ N_{t + \Delta t } = N_t \} &= 1- \lambda \Delta t +o(\Delta t) & \\
P\{N_{t + \Delta t}= N_t +1\} &= \lambda \Delta t +o(\Delta t) &
\textrm{ with } \lim_{\Delta t \to 0}\frac{o(\Delta t)}{\Delta t}=0 \\
P\{N_{t + \Delta t}\ge N_t +2\} & = o(\Delta t) &
\end{aligned}
\tag{1.1}
Theorem 1.1 (Distribution of the Poisson process) The Poisson process N_t with N_0=0 has the distribution P\{N_t=k\} = e^{-\lambda t}\frac{(\lambda t)^k}{k!} i.e. N_t has Poisson distribution with paramter \lambda t. Morever for s, M_t=N_{t+s}-N_{s} is a Poisson process.
Proof (version 1 using Poisson limit). Pick a large number n and divide the interval [0,t] into n intervals of size \frac{t}{n}. Write
N_t = \sum_{j=1}^n\left( N_{j \frac{t}{n}}- N_{(j-1) \frac{t}{n}} \right)
as a sum of n independent random variables. If n is large the probability that any of these random variables is at least 2 is small. Indeed, by a union bound, we have
\begin{aligned}
P\left\{ N_{j \frac{st}{n}}- N_{(j-1) \frac{t}{n}} \ge 2 \textrm { for some } j \right\} &\le \sum_{j=1}^n P\left\{ N_{j \frac{t}{n}}- N_{(j-1) \frac{t}{n}} \ge 2 \right\}
\le n P \left\{ N_{\frac{t}{n}} \ge 2\right\} = t \frac{o( \frac{t}{n})}{\frac{t}{n}}
\end{aligned}
which goes to 0 as n \to \infty.
Therefore N_t is, approximately, a binomial random variables with success probability \lambda \frac{t}{n}: P(N_t=k) \approx {n \choose k} \left(\frac{\lambda t}{n} \right)^k \left(1-\frac{\lambda t}{n} \right)^{n-k} and as n \to \infty this converges to a Poisson distribution with parameter \lambda t. \quad \blacksquare.
Proof (version 2 using ODEs). Let us derive a system of ODEs for the family P_t(k)=P\left\{ N_t=k \right\}. We have \frac{d}{dt}P_t(k) = \lim_{\Delta t \to 0} \frac{P\left\{ N_{t+\Delta t}=k \right\} - P\left\{ N_t=k \right\} }{\Delta t} Conditioning we find \begin{aligned} P\left\{ N_{t+\Delta t}=k \right\}=& P\left\{ N_{t+\Delta t}=k | N_t=k \right\} P\left\{ X_{t}=k \right\} \\ & + P\left\{ N_{t+\Delta t}=k | N_t=k-1 \right\}P\left\{ X_{t}={k-1} \right\} \\ & + P\left\{ N_{t+\Delta t}=k | N_t\le k-2\right\}P\left\{X_t \le k-2\right\} \\ =& P_t(k) (1 - \lambda \Delta t) + P_t(k-1) \Delta t + o(\Delta t) \end{aligned} and this gives the system of equations \begin{aligned} \frac{d}{dt}P_t(0)&= - \lambda P_t(0) & \quad P_0(0)=1 \\ \frac{d}{dt}P_t(k)&= \lambda P_t(k-1) - \lambda P_t(k) &\quad P_0(k)=0 \end{aligned}
We find P_0(t)= e^{-\lambda t} for k=0. We use an integrating factor and set f_t(k)=e^{\lambda t} P_t(k). We have then f_t(0)=1 and for k >0
\frac{d}{dt}f_t(k)=\lambda f_t(k-1), \quad f_0(k)=0
which we can solve iteratively to find
f_t(k) = \frac{(\lambda t)^k}{k!}
and thus N_t has distribution
P_t(k)= e^{-\lambda t}\frac{(\lambda t)^k}{k!}\,.
\blacksquare.
Theorem 1.2 If N_t is a Poisson process with parameter \lambda the interarrival times are independent exponential random variables with paramter \lambda.
Proof. If T_1 >t it means no event has occured up time t and so N_t=0. Therefore P\left\{ T_1 >t \right\} = P\left\{ N_t =0\right\} = e^{-\lambda t} and thus T_1 has an exponential distribution. For T_2 we condition on T_1 P\left\{ T_2 >t \right\} = \int P(T_2>t |T_1 =s) f_{T_1}(s) ds and, using the independence of the increments, P\left\{ T_2>t |T_1 =s\right\} = P\left\{ 0 \textrm{ events } in (s, s+t] | T_1=s \right\} = P\left\{ 0 \textrm{ events } in (s, s+t] \right\} = e^{-\lambda t} from which we conclude that T_2 has exponential distribution and is independent of T_1. This argument can be repeated for T_3 by conditioning on the time of the second event, T_1+T_2, and so on. \quad \blacksquare
Another set of closely related quantities are the arrival times of the n^{th} event S_1, S_2, \cdots which are related to the interarrival times by
S_1=T_1, \quad S_2=T_1+T_2, \quad S_3=T_1+T_2+T_3, \cdots
By Theorem 1.2 S_n is the sum of n independent exponential RVs and thus S_n a Gamma RV with parameter (n,\lambda) with density
f_{S_n}(t) = \lambda e^{-\lambda t}\frac{(\lambda t)^{n-1}}{(n-1)!}\, \quad \textrm{ for }t \ge 0\,.
We can actually prove this fact using the Poisson process by noting that, by definition,
N_t \ge n \iff S_n \le t \,,
that is if n or more events have occured by time t if and only the n^{th} event has occured prior to or at time t.
So the CDF of S_n is F_{S_n}(t)=P\{ S_n \le t\} = P \{ N_t \ge n\} = \sum_{j=n}^\infty e^{-\lambda t} \frac{(\lambda t)^{j}}{j!} and upon differentiating we find f_{S_n}(t)= - \lambda e^{-\lambda t} \sum_{j=n}^\infty \frac{(\lambda t)^{j}}{j!} + \lambda e^{-\lambda t}\sum_{j=n}^\infty \frac{(\lambda t)^{j-1}}{(j-1)!} = \lambda e^{-\lambda t} \frac{(\lambda t)^{n-1}}{(n-1)!}. \quad \blacksquare
Let us start with a special case and assume assume that N_t=1, that is exactly one event has occured in [0,t]. Since the Poisson process has independent increments it seems reasonable the event may have occur with equal probability at any time on [0,t]. Indeed for s\le t we have, using the independence of increments. \begin{aligned} P\{T_1 \le s | N_t=1\} & = \frac{P\{T_1 \le s , N_t=1\}}{P\{N_t=1\}} = \frac{P\{\textrm{1 event in }(0,s] \textrm{ and no event in (s,t]} \}}{P\{N_t=1\}} \\ &= \frac{ \lambda s e^{-\lambda s} e^{-\lambda(t-s)} }{\lambda t e^{-\lambda t}} = \frac{s}{t} \end{aligned} and thus the density of the arrival time T_1 conditioned on N_t=1 is uniform on [0,t]
We study further the properties of the arrival times S_1< S_2 <\cdots of a Poisson process. The following result tells us that they follow a uniform distribution on [0,t].
Theorem 1.3 Given the event \{N_t=n\}, the n arrival times S_1, S_2, \cdots have the same distribution as the order statistics for n independent random variables uniformly distributed on [0,t].
Proof. The conditional density of (S_1, \cdots, S_n) given that N_t=n can be obtained as follows. If S_1=s_1, S_2=s_2, \cdots, S_n=s_n and N_t=n then the intearrival times must satisfy T_1=s_1, T_2=s_2-s_1, \cdots, T_n= s_n-s_{n-1}, T_{n+1}> t-s_n.
By the independence of the interarrival times proved in Theorem 1.2 the conditional density is given by \begin{aligned} f(s_1, s_2, \cdots, s_n|n) &= \frac{f(s_1, s_2, \cdots, s_n, n)}{P\{N_t=nxw\}} \\ &= \frac{\lambda e^{-\lambda s_1} \lambda e^{-\lambda(s_2-s_2)} \cdots \lambda e^{-\lambda(s_n-s_{n-1})} e^{-\lambda(t-s_n)} }{e^{-\lambda t} \frac{(\lambda t)^n}{n!} } = \frac{n!}{t^n} \end{aligned} which is the joint density of the order statistic of n uniform. \blacksquare
The characterization of the Poisson process in terms of exponential random variables suggest immediately a very simple algorithm to simulate N_t.
Simulate independent exponential RVs T_1, T_2, \cdots with parameter \lambda and set N_t=0 for 0\le t <T_1, N_t=1 for T_1\le t <T_1+T_2, and so on.
It is not easy to simulate directly a Poisson random variable X from its pdf/cdf but we can do it elegantly using its relation with exponential random variable. To do this generate independent exponential random variable until they sum up to 1 (so as to generate X=N_1) and use the relation between exponential and uniform.
\begin{aligned}
X =n & \iff \sum_{k=1}^n T_k \le 1 < \sum_{k=1}^{n+1} T_k
\iff \sum_{k=1}^n - \frac{1}{\lambda} \ln(U_k) \le 1 < \sum_{k=1}^{n+1} - \frac{1}{\lambda} \ln(U_k) \\
& \iff \ln( \prod_{k=1}^n U_k) \ge - \lambda > \ln( \prod_{k=1}^{n+1} U_k)
\iff \prod_{k=1}^n U_k \ge e^{-\lambda} > \prod_{k=1}^{n+1} U_k
\end{aligned}
Algorithm to simulate a Poisson random variable with parameter \lambda: Generate random numbers until their product is smaller than e^{-\lambda}.
We investigate the behavior of N_t for large time. We prove a CLT type result, namely that
\lim_{t \to \infty} \frac{N_t-\lambda t}{\sqrt{\lambda t}} = Z \textrm{ in distribution}
where Z is a standard normal RV.
Recall that the characteristic function of a Poisson RV Y with parameter \mu is E[e^{i\alpha Y}]= e^ {\mu (e^{i\alpha} -1)}. Therefore E\left[ e^{i \alpha \frac{N_t - \lambda t}{\sqrt{\lambda t}}} \right] = e^{\lambda t \left( e^{i \frac{\alpha}{\sqrt{\lambda t}}} -i \frac{\alpha}{\sqrt{\lambda t}} -1\right)} Expanding the exponential we have \lambda t \left(e^{i \frac{\alpha}{\sqrt{\lambda t}}} -i \frac{\alpha}{\sqrt{\lambda t}} -1\right) = - \frac{\alpha^2}{2} + O( \frac{1}{\sqrt{\lambda t}}) and thus \lim_{t \to \infty} E\left[ e^{i \alpha \frac{N_t - \lambda t}{\sqrt{\lambda t}}} \right]=e^{-\frac{\alpha^2}{2}}.
The same computation shows also that, for any fixed t, \lim_{\lambda \to \infty} \frac{N_t-\lambda t}{\sqrt{\lambda t}} = Z \textrm{ in distribution} since rescaling the parameter is equivalent to rescaling time.
Theorem 1.4 Suppose N_t is a Poisson process with paramter \lambda and that every event (independently) is either of type 1 with probability 1 or type 2 with probability q=1-p. Then N^{(1)}_t, the number of events of type 1 up to time t, and N^{(2)}_t, the number of events of type 2 up to time t, are independent Poisson process with rate \lambda p and \lambda(1-p).
Proof. We check that N^{(1)}_t satisfy the definition of a Poisson process and the use Theorem 1.1
N^{(1)}_0=0 and N^{(1)}_t has independent increments since N_t has independent increments and events are classified of type 1 and 2 independently of each other.
We have
\begin{aligned}
P \left\{ N^{(1)}_{t + \Delta t}= N^{(1)}_t +1 \right\} & =
P\{ N_{t + \Delta t}= N_t +1 \textrm{ and the event is of type } 1 \} \\
& + P\{ N_{t + \Delta t} \ge N_t +2 \textrm{ and exactly one event is of type } 1 \} \\
&= \lambda \Delta t \times p + o(\Delta t)
\end{aligned}
\begin{aligned} P \left\{ N^{(1)}_{t + \Delta t}= N^{(1)}_t \right\} & = P\{ N_{t + \Delta t}= N_t \} + P\{ N_{t + \Delta t} = N_t + 1 \textrm{ and the event is of type 2} \} \\ & + P\{ N_{t + \Delta t} \ge N_t + 2 \textrm{ and no event of type $1$} \} \\ &= (1- \lambda \Delta t) + \lambda \Delta t (1-p) + o(\Delta t) =1-\lambda \Delta t p + o(\Delta t) \end{aligned} P \left\{ N^{(1)}_{t + \Delta t}\ge N^{(1)}_t +2 \right\} =o(\Delta t)
Finally to show that N^{(1)}_t and N^{(2)}_t are independent we compute their joint PDF by conditioning on the value of N_t and find \begin{aligned} P\left\{N^{(1)}_t =n , N^{(2)}_t =m \right\} &= P\left\{N^{(1)}_t =n , N^{(2)}_t =m | N_t={n+m}\right\} P\left\{N_t={n+m}\right\} \\ &= {n+m \choose n} p^n (1-p)^m e^{-\lambda t} \frac{(\lambda t)^{n+m}}{(n+m)!} \\ &= e^{-\lambda p t} \frac{ (\lambda p t )^n}{n!} e^{-\lambda (1-p) t} \frac{(\lambda(1-p) t)^{m}}{m!} \end{aligned} \blacksquare
We revisit the couplon collector but we relax the assumption that all the toys are equally probable. We assume that any box contains toy i with probability p_i. How do we compute now the expected number of boxes needed to collect all the M toys? The argument used earlier does not generalize easily.
We use the following trick or radomizing the time between boxes. Instead of collecting boxes at fixed time interval, we collect them at times which are exponentially distributed with paramter 1. Then the number of boxes collected up to time t a Poisson process N_t with rate \lambda=1 (on average it takes the same time to get a new box). We have now M types of events (getting a box with toy i) and we split the poisson process accordingly. Then by Theorem 1.4 the number of toys of type i collected up to time t, N^{(i)}_t is a Poisson process with rate \lambda p_i=p_i and the Poisson processes N^{(i)}_t are independent.
We now consider the times T^{(i)} = \textrm{ time of the first event for the process } N^{(i)}_t that is the time where the first toy of type i is collected. The times T^{(i)} are independent since the underlying Poisson processes are independent, and are exponential with paramter p_i. Furthermore S = \max_{i} T^{(i)} = \textrm{ time until one toy of each type has been collected}. By independence we have P\left\{ S \le t \right\} = P\left\{ \max_{i} T^{(i)} \le t\right\} = \prod_{i=1}^M P\left\{ T^{(i)} \le t \right\} = \prod_{i=1}^M (1- e^{-p_it})
Thus E[S]= \int_0^\infty P\{S \ge t\} dt = \int_0^\infty ( 1- \prod_{i=1}^M (1- e^{-p_it}) )\, dt
We can generalize the Poisson process by making the rate \lambda(t) at which event occur depend on time: a nonhomogeneous Poisson process N_t with rate paratmeter \lambda(t) is a continuous time stochastic process such that
Independent increments: Given times s_1\le t_1 \le s_2 \le t_2 \cdots \le s_n \le t_n the random variables N_{t_i}-N_{s_i} (that is the number of events occurring in the disjoint time intervals [s_i,t_i]) are independent.
We have \begin{aligned} P \{ N_{t + \Delta t } = N_t \} &= 1- \lambda(t) \Delta t +o(\Delta t) & \\ P\{N_{t + \Delta t}= N_t +1\} &= \lambda(t) \Delta t +o(\Delta t) & \textrm{ with } \lim_{\Delta t \to 0}\frac{o(\Delta t)}{\Delta t}=0 \\ P\{N_{t + \Delta t}\ge N_t +2\} & = o(\Delta t) & \end{aligned} \tag{1.2}
One way to construct a nonhomgeneous Poisson process is by sampling it in a time-dependent manner. Suppose \lambda(t) is bounded (locally in t), then we pick \lambda > \lambda(t). We consider a Poisson process M_t with constant rate \lambda, and if an event occurs at time t then we decide to keep this event with probability p(t)=\frac{\lambda(t)}{\lambda} and we discard the event with probability 1-p(t). By the same argument we used in the section Sampling a Poisson process we see the number of kept events satisfies the definition of a non-homogeneous Poisson process in Equation 1.2
Let us consider an event for the process M_t which occured in the interval [0,t]. By our analysis of arrival time we know that this event occured a time which is uniformly distributed on the interval [0,t]. Therefore the probability that this event was accepted and contribute to N_t is therefore p_t = \frac{1}{t}\int_0^t \frac{\lambda(s)}{\lambda} \, ds
By repeating then the second part of the arguement in the section Sampling a Poisson process we see that M_t has a Poisson distribution with parameter \lambda t p_t = \int_0^t \lambda(s) \, ds and in particular E[N_t] = \int_0^t \lambda(s) \, ds
Assume that the the flow of customers entering an onine store follows a Poisson process N_t with rate \lambda. The time S spent in the store for a single customer (browsing around, checking out, etc..) is given by tis CDF G(t)=P\{ S \le t\} and we assume that the customers are independent of each other.
To figure out how to allocate ressources one wants to figure out what is number of customers, M_t, which are still in the sytem at time t.
To find the distribution of M_t let us consider one of the customer by time t. If he arrived at time s\le t then he will have left the system at time t with probability G(t-s) and will still be in the system by time t with probability 1-G(t-s). Since the arrival time of that customer is uniform on [0,t] the distribution of M_t is Poisson with mean
E[M_t] = \int_0^t (1- G(t-s))ds = \lambda \int_0^t (1 - G(s))ds,.
For large t, we see that E[M_t] \approx \lambda E[S].
Suppose that the number of claims receieved by an insurance follows a Poisson process. The size of each claim will be different and it natural to assume that claims are independent from each other. If we look at the total claims incurred by the insurance company this leads to a stochastic process called a compound Poisson process.
A stochastic process X_t is called a compound Poisson process if it has the form X_t = \sum_{k=1}^{N_t} Y_k where N_t is a Poisson process and Y_1, Y_2,\cdots are IID random variables which are also independent of N_t.
The process X_t has stationary independent increments. Using that N_t-N_s is a Poisson process X_t-X_s = \sum_{k=N_s}^{N_t} Y_k \textrm{ has the same distribution as } X_{t-s}= \sum_{0}^{N_{t-s}} Y_k
We can compute the MGF of X_t (or its charactersitic function) by conditining on N_t. Suppose m_Y(\alpha)=E[e^{\alpha Y}] is the moment generating function of Y and using the MGF for the Poisson RV we find \begin{aligned} m_{X_t}(\alpha)& =E\left[ e^{\alpha X_t}\right] = E\left[ e^{\alpha \sum_{k=1}^{N_t} Y_k} \right] = \sum_{n=0}^\infty E\left[ e^{\alpha \sum_{k=1}^{N_t} Y_k} | N_t =n \right] P\{n_t=n\} \\ & = \sum_{n=0}^\infty m(\alpha)^n P\{n_t=n\} = e^{\lambda t ( m(\alpha) -1 )} \end{aligned}
We can compute then the mean and variance \begin{aligned} m_{X_t}'(\alpha)&= e^{\lambda t ( m(\alpha) -1 )} \lambda t m'(\alpha) \\ m_{X_t}''(\alpha)&= e^{\lambda t ( m(\alpha) -1 )} (\lambda t m''(\alpha) + \lambda t)^2() m'(\alpha)^2 ) \end{aligned} and thus E[X_t]= \lambda t E[Y] \quad \textrm{ and } Var[X_t]= \lambda t ({\rm Var}(Y) + E[Y]^2]) With a bit more work we could prove a central limit theorem.
In this section that we build a continuous time Markov process X_t with t\ge 0. The Markov property can be expressed as P\{ X_t=j| \{X_{r}\}, 0\le r \le s \} = P\{X_t=j| X_s \} \,. for any 0 < s < t.
To construct a Markov we will need lots of exponential random variables. Recall that an exponential random variable T with paramter \lambda has the pdf f_T(t)=\lambda e^{-\lambda t}, for t \ge 0 the cdf F_T(t) = 1- e^{-\lambda t} and mean E[T]=\frac{1}{\lambda}.
Proposition 2.1 (Properties of exponential randomm variables) Let T_1, T_2, T_3, \cdots be independent exponential random variables with parameter \lambda_1, \lambda_2, \cdots. Then
Proof. For 1. we have, using independence,
P\left\{ T > t \right\} = P\left\{ T_1 > t, \cdots, T_n > t \right\} = P\left\{ T_1 > t \right\} \cdots P\left\{ T_m > t \right\}
= e^{-(\lambda_1+ \cdots \lambda_n)t}
and thus T is an exponential random variable.
For 2. we have, by conditioning,
P\left\{ T_1 = T \right\} = \int_0^\infty P\{ T_2>t, \cdots, T_n > t \} f_{T_1}(t) \, dt = \int_0^\infty e^{-(\lambda_2 + \cdots+ \lambda_n)t} \lambda_1 e^{-\lambda_1 t}\, dt = \frac{\lambda_1}{\lambda_1+ \cdots \lambda_n}
\blacksquare
If we use a row vector p_t = (p_t(1), p_t(2), \cdots) then we can rewrite Equation 2.1 as the system \frac{d}{dt}p_t = p_t A \tag{2.2}
If S is finite then one can write the solution in terms of the matrix exponential e^{tA}= \sum_{k=0}^\infty \frac{t^k A^k}{k!} p_t = p_0 e^{tA} \,. where p_o(i)=P\{X_0=i\} is the initial distribution.
We can also write equation for the transition probabilities (take X_0=i) P_t(i,j) = P \{ X_t =j| X_0=i\} and we obtain the matrix equation \frac{d}{dt}P_t = P_t A, \quad \textrm{ with } P_0 =I \quad \implies \quad P_t = e^{tA} which we can solve using linear algebra techniques
A = \begin{matrix} 1 \\ 2 \\ 3 \\ 4 \end{matrix} \begin{pmatrix} -1 & 1 & 0 & 0 \\ 1 & -3 & 1 & 1 \\ 0 & 1 & -2 & 1 \\ 0 & 1 & 1 & -2 \end{pmatrix} The (right) eigenvalues of A are 0, -1, -3, -4 and we can diagonalize A and find D=\begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -3 & 0 \\ 0 & 0 & 0 & -4 \\ \end{pmatrix} = Q^{-1} A Q where the rows of Q are the corresponding eiegenvectors: Q=\begin{pmatrix} 1& 1 & 0 & -\frac13 \\ 1& 0 & 0 & 1 \\ 1& -\frac12 & -\frac12 & -\frac13 \\ 1& -\frac12 & \frac12 & \frac13 \\ \end{pmatrix} \quad \quad Q^{-1}=\begin{pmatrix} \frac14 & \frac14 & \frac14 & \frac14 \\ \frac23& 0 & -\frac13 & -\frac13 \\ 0& 0 & -1 & -1 \\ -\frac14& \frac34 & -\frac14 & -\frac14 \\ \end{pmatrix} Then we find P_t = e^{tA} = Q e^{tD} Q^{-1} =
We can also construct the Markov chain X_t using exponential random variable.
If X_t=i consider (independent) random variable T(i,j) with parameter \alpha(i,j) for j \not =i. Think of those as exponential clocks and when the first of the clocks rings, say clock j, then the Markov chain moves to state j.
Using Proposition 2.1 we see that the chain moves from the state i to j if the clocks associated to the pair (i,j) rings first and this occur with probability Q(i,j) = \frac{\alpha(i,j)}{\sum_{j \not= i}\alpha(i,j)} = \frac{\alpha(i,j)}{\alpha(i)}
The matrix Q(i,j) is the transition matrix for a discrete time Markov chain Y_n which has the same junmps probabilities as X_t but the transition occur at each unit time.
Then a new set of clocks with rates \alpha(j,k) is produced and the process starts anew until the Markov chain moves to a new state.
By Proposition 2.1 we can express this slightly differently. If in state i we use a single clock with rate \alpha(i) and when this clock rings then we move to state j according to the discrete time Markov chain Q(i,j).
The Markov property follows form the memoryless property for an exponential distribution T: P(T\ge t +s| T \ge s) = P(T>t).
If X_t=i then by construction the position after the next jump after time t clearly depends only i and not the states that the Markov chain visited before time t. Moreover if we consider the time of the last jump before time t which occured, say at time s<t, then the memoryless property of the exponential random variable implies that the time at which the jump occur after time t does not depend on s at all. Putting these together this implies the Markov property P\{X_{t+u} =j | X_{s}, 0 \le s \le t\} = P\{X_{t+u} =j | X_t \}
To connect this to the previous description we note that by conditioning on the first jump
P_t(i,j) = P(X_t=j|X_0=i) = \delta(i,j) e^{-\alpha(i) t} + \int_0^t \alpha(i) e^{-\alpha(i)s} \sum_{k} {Q(i,k)} P_{t-s}(k,j) ds
Iterating this equation and setting t=\Delta t we find
\begin{aligned}
P(X_{\Delta t}=j|X_0=i) &= \delta(i,j) e^{-\alpha(i) \Delta t} + \int_0^t \alpha(i) e^{-\alpha(i)s} \sum_{k} \frac{\alpha(i,k)}{\alpha(i)} \delta(k,j) e^{-\alpha(k)(t-s)} ds + \cdots \\
&= \delta(i,j) e^{-\alpha(i) \Delta t} + \alpha(i,j) \Delta t \times \underbrace{ e^{-\alpha(j) \Delta t}
\frac{1}{\Delta t} \int_0^{\Delta t} e^{-(\alpha(i)-\alpha(j)) s} ds
}_{\to 1 \textrm{ as }\Delta t \to 0} + \cdots \\
&= \delta(i,j) (1 - \Delta t \alpha(i)) + \alpha(i,j) \Delta t + o(\Delta t)
\end{aligned}
The higher order terms are negligible since the probability to have two jumps in the time interval [0,\Delta t] is o(\Delta t).
and the generator is given by A = (Q-I).
A probability vector \pi is a stationary distribution for the Markov chain with generator A if
\pi P_t = \pi \quad \textrm{ for all } t =0
which, by Komlogorov equation, is equivalent to
\pi A = 0
In terms of the rate \alpha(i,j) we see that \pi is stationary if ansd only if \sum_{i \not= j} \pi(i) \alpha(i,j) = \pi(j)\alpha(j) = \sum_{i \not=j} \pi(j) \alpha(j,i) which we can interpret as a balance equation: \pi(i)\alpha(i,j) is the rate at which the chain in a state \pi changes from i to j.
As for discrete time we say that a Markov chain satisfies detailed balance if \pi(i) \alpha(i,j) = \pi(j) \alpha(j,i) \textrm{ for all } i \not= j and detailed balance obviously implies stationarity.
The Markov chain with generator A is irreducible if for any pair of states i,j we can find states i_1, \cdots, i_{N-1} such that \alpha(i,i_1) \alpha(i_2, i_3) \cdots \alpha(i_{N-1},j) >0 If there exists a stationary distribution for an irreducible chain then \pi(i)>0 for all i. Indeed if \pi(i)>0 and \alpha(i,j)>0 then \pi A(j) =0 implies that \pi(j)\alpha(j) = \sum_{k}\pi(k)\alpha(k,j) \ge \pi(i)\alpha(i,j)>0 and thus \alpha(j)>0.
The issue of periodicity cannot occur for continuous time Markov chains
Lemma 2.1 For an irreducible Markov chain with generator A, P_t(i,j)>0 for all i,j and all t>0.
Proof. Using the Markov property
\begin{aligned}
P\{X_t=j|X_0=i\} &\ge P \{ X_{t/N}=i_i, X_{2t/N}=i_2, \cdots X_{t}=j |X_0=i\} \\
&= P \{ X_{t/N}=i_1| X_0=i\} P\{ X_{2t/N}=i_2| X_{t/N}=i_i\} \cdots P\{ X_{t}=j |X_{t\frac{N-1}{N}}=i_{n-1}\}
\end{aligned}
and for example
P \{ X_{t/N}=i_1| X_0=i\} \ge \int_0^{t/N} \alpha(i) e^{-\alpha(i)s} Q(i,i_1)
e^{-\alpha(i_1)(t-s)}>0
\blacksquare
If the state space is finite and the chain is irreducible then we have convergence to equilibrium.
Theorem 2.1 If the state space S is finite and the Markov chain with generator A is irreducible then for any initial distribution \mu we have \lim_{t \to \infty}\mu P_t = \pi
Proof. We use a spectral argument and the result for discrete time Markov chain. Pick a number a such that a is strictly greater than all entries of A (this is possible since A is finite). Consider the matrix R= \frac{1}{a}A + I. Then R is a stochastic matrix and the Markov chain Y_n with transition matrix R is irreducible and aperiodic (because Q(i,i)>0). Then from ?@thm-convMC we know (details in the exercises) that R has a simple eigenvalue 1 and all other eigenvalues \lambda satisfy |\lambda|<1. Now Qf = \lambda f \iff Af =a(\lambda -1) f and so 0 is a simple eigenvalue for A and the other eiegenvalue are of the form a(\textrm{Re}(\lambda) -1) + i a \textrm{ Im }(\lambda) and so the real part is strictly negative.
The vector 1=(1,1, \cdots,1 )^T is a right eigenvector for A and \pi is a left eigenvector for A. If we define the matrix \Pi as the matrix whose rows are equal to \pi, we have then
(P^t - \Pi) \begin{pmatrix} 1\\ \vdots \\1 \end{pmatrix}= 0 \textrm{ and } \pi(P^t - \Pi) =0\,.
Moreover if f the right eigenvector and g the left eigenvector for A for the eigenvalue \mu \not=0 then we have
\pi (Af) = \mu \pi f = (\pi A) f =0 \quad \textrm{ and } (g A) 1 = g (A1) = \mu g 1 =0
and thus we must have \pi f=0 and g 1 =0. Therefore
P^t - \Pi)f = P^t f = e^{\mu t} f \textrm{ and } g(P^t - \Pi) = gP^t = e^{\mu t} g.
This implies that P^t - \Pi has the simple eigenvalue 0 and the same eigenvalues e^{\mu t} as P^t and \mu has strictly negative real part. Therefore P^t - \Pi converges to 0 as t \to \infty, or
\lim_{t \to \infty} P_t(i,j)= \pi(j) .
To study the transient behavior in continuous time we can use similar ideas as in discrete time.
Absorption probabilities: the absorption probabilities do not depend on the time spent in every state so they can be computed using the transition matrix Q(i,j) for the embedded chain Y_n and the formula in Section [Absorption probabilities]
Expected hittiing time: For example we have the following result
Theorem 2.2 Supose X_t is a Markov chain with generator A and for i\not j let \Sigma(j) = \inf\{t \ge 0 ; X_t=i\} be the first hitting time to state j. Let \tilde{A} the matrix obtained by deleting the j^{th} and j^{th} column from the generator A. Then we have E[\Sigma(j)|X_0=i] = \sum_{l \not= j} B(i,l) \quad \textrm{ where } B= \tilde{A}^{-1} The matrix \tilde{A} has rowsums which are non-positive and at least one of the row must be strictly negative.
Proof. By conditioning on the first jump which happens at time T we have E[\Sigma(j)|X_0=i] = \underbrace{E[T|X_0=i]}_{\textrm{expected time until the first jump }} + \sum_{k\in S, k\not=j} P\{ X_T = k|X_0=i\} \underbrace{E[ \Sigma(j)|X_0=k]}_{\textrm{expected hitting time} \atop \textrm{ from the state after the first jump}} If we set b(i)=E[\Sigma(j)|X_0=i] (for i \not= j) we find the equation b(i)= \frac{1}{\alpha(i)} + \sum_{k \not= j} \frac{\alpha(i,k)}{\alpha(i)} b(k) \implies 1 = \alpha(i) b(i) - \sum_{k\not= j}\alpha(i,k) b(k) which reads, in matrix form as 1 = -\tilde{A} b \implies b= (-\tilde{A})^{-1} 1. To show that -\tilde{A} is invertible we consider the matrix R= \frac{1}{a}\tilde{A} + I where a is chosen larger than all the entries. Then the entries of R are non-negative and the rowsums do not exceed one with at least one strictly less than 1. By the results for discrete time Markov chains we know that I-R=(-\frac{1}{a} A) is invertible.
For a continuous time Markov chain X_t let us consider the time of the successive jumps S_1=T_1, S_2=T_1+T_2, S_3=T_1+T_2+T_3, \cdots Here the T_i are independent exponential but, in general, no identically distributed with paramters \alpha_i which depends on the state being visited. We have then E[S_n]= \sum_{i=1}^n \frac{1}{\lambda_i}
Explosion: To see what can happen consider a Markov chain with rate \alpha(i,i+1)=(n+1)^2 and all other rates equal to 0. Then the Markov chain moves up by 1 at every jump like a Poisson process but at accelerated pace. We have then E[S_\infty]= \sum_{n=1}\frac{1}{n^2} < \infty so S_\infty < \infty with probability 1. So there are infinitely many jumps in finite time and X_t=+\infty after a finite time. This is called explosion.
This is an issue familiar in ODE: the equation \frac{d}{dt} x_t = x_t^2 has solution has solution x_t= \frac{x_0}{x_0-t} which blows up at time t=x_0.
It is not easy to determine if an explosion really occurs. Indeed for no explosion to occur we must have, with probabilty 1, \sum_{n} \frac{1}{\alpha(Y_n)} = \infty where Y_n is the embedded chain.
A sufficient condition for non-explosion is a suitable upper bound on the rates \alpha(i), say \alpha(i) \le \alpha which is true for finite state spaces but this is by no means necessary.
The recurrence and transience for a continuous time Markov chain can be defined as for discrete time. A an irreducbile Markov chain X_t is recurrent if the chain starting from a state i returns to i with probability 1 and transient if X_t never return to i with some non-zero probability.
Transience/recurrence has nothing to do with the actual time spent in every state and so we have immediately X_t \textrm { with rates } \alpha(i,j) \textrm{ is } \begin{array}{l} \textrm{transient} \\ \textrm{recurrent}\end{array} \iff Y_n \textrm{ with transition } Q(i,j)=\frac{\alpha(i,j)}{\alpha(i)} \textrm{ is } \begin{array}{l} \textrm{transient} \\ \textrm{recurrent}\end{array}
For positive recurrence we need to define the first return to a state i. This is most easily defined in terms of the return for the embeded Markov chain Y_n and its first return time \tau(i). The time until X_0 returns to i will the sum of the time spent in each state. If S_n denotes the time of the jumps and \tau(i)=n after visiting the state j_0, j_1, \cdots j_{n-1}, j_n=i then the return time is \sum_{k=0}^{n-1}(S_{k+1}- S_k) where S_{k+1}-S_k is exponential with paramter \alpha(j_k). The first return time to state i for the Markov chain X_t is given \Sigma(i) = \sum_{k=0}^{\tau(i)-1}(S_{k+1}- S_k)
Theorem 2.3 If the Markov chain with rates \alpha(i,j) is irreducible and recurrent, then there exists a unique solution (up to a multiplicative constant) \eta=(\eta(1), \eta(2), \cdots) to the equation \eta A=0 with 0 < \eta(i) < \infty \,. If it holds that \sum_{i}\eta(i) < \infty then \eta can be normalized to a stationary distribution and X_t is positive recurrent.
Proof. The equation \eta A(k)=0 can be written as \sum_{j \not= k} \eta(j) \alpha(j,k) = \alpha(k) \eta(k) \iff \sum_{j \not= k} \eta(j)\alpha(j) Q(j,k) = \eta(j)\alpha(j) That is the row vector \mu with entries \mu(k)=\alpha(k) \eta(k) must satisfy \mu Q = \mu.
If X_t is recurrent then the embedded Markov chain Y_n with transition Q is recurrent and so by ?@thm-stationarydistribution there exists a solution to \mu Q=\mu and therefore we found a solution for \eta A=0.
Moreover, from ?@thm-stationarydistribution we have a the representation \mu(j)=\alpha(j) \eta(j) = E\left[ \sum_{k=0}^{\tau(i)-1} {\bf 1}_{\{Y_k=j\}}\right] where i is some fixed but arbitrary reference state (this counts the number of visits to the state j between two consecutive visits to the reference state i)
If we denote by S_n the time of the n^{th} jump for X_t we have \begin{aligned} \eta(j) &= \sum_{k=0}^\infty E\left[ \frac{1}{\alpha(j)} {\bf 1}_{\{Y_k=j\}} {\bf 1}_{\{\tau(i) >k\}} \right] = \sum_{k=0}^\infty E\left[ (S_{k+1} -S_k) {\bf 1}_{\{Y_k=j\}} {\bf 1}_{\{\tau(i) >k\}} \right] = E\left[ \sum_{k=0}^{\tau(i)-1} (S_{k+1} -S_k) {\bf 1}_{\{Y_k=j\}} \right] \end{aligned} which is nothing but the time spent (by X_t) in the state j between successive visits to i.
If \sum_{j} \eta(j) <\infty then we have E[\Sigma(j)]=E\left[ \sum_{k=0}^{\tau(i)-1} (S_{k+1} -S_k)\right] <\infty which is the expected return time to state i. That is the chain X_t is positive recurrent.
We have the following theorem which is the exact counterpart of the discrete time case (and is proved very similarly so we will omit the proof).
Theorem 2.4 Suppose X_t is irreducible and positive recurrent. Then X_t has a unique stationary distribution, and with probability 1, the time spent in state j, converges to \pi(j) \lim_{t\to \infty}\int_0^t \mathbf{1}_{\{X_s=j\}} \, ds = \pi(j). Moreover we have Kac’s formula: \pi(j) is also equal to the average time between consecutive visits to state j: \pi(j) = \frac{1}{E[\Sigma(j)|X_0=j]}. Conversely if X_t has a stationary distribution then X_t is positive recurrent.
A general birth and death process is a continuous time Markov chain with state space \{0,1,2,\cdots\} and whose only non-zero transition rates are \begin{aligned} &\lambda(n) = \alpha(n,n+1) & = \textrm{ birth rate for a population of size } n & (\textrm{ for } n\ge 0) \\ &\mu(n)=\alpha(n,n-1) & = \textrm{ death rate for a population of size } n & (\textrm{ for } n\ge 1) \end{aligned}
The Kolmogorov equations for the distribution of X_t are, for n\ge 1 \frac{d}{dt}p_t(n) = \underbrace{ \mu(n+1) p_t(n+1)}_{\textrm{ increase due do death} \atop \textrm{in a population of size } n+1} + \underbrace{\lambda_{n-1} p_t(n-1)}_{\textrm{ increase due do birth} \atop \textrm{in a population of size } n-1} - \underbrace{(\lambda(n) + \mu(n))p_t(n)}_{\textrm{ decrease due do birth/death} \atop \textrm{in a population of size} n}. For n=0, the eqation reads \displaystyle \frac{d}{dt}p_t(0)= \mu(1) p_t(1) - \lambda(0) p_t(0).
The generator has the form \begin{matrix} 0 \\ 1 \\ 2\\ \vdots \end{matrix} \begin{pmatrix} -\lambda(0) & \lambda(0) & 0 & 0 & \dots \\ \mu(1) &-\lambda(1) -\mu(1) & \lambda(1) & 0 & \dots \\ 0 & \mu(2) &-\lambda(2) -\mu(2) & \lambda(2) & \dots \\ \vdots & \ddots & \ddots & \ddots & \\ \end{pmatrix}
\begin{matrix} 0 \\ 1 \\ 2\\ \vdots \end{matrix} \begin{pmatrix} 0 & 1 & 0 & 0 & \dots \\ \frac{\mu(1)}{\mu(1)+ \lambda(1)} & 0 & \frac{\lambda(1)}{\mu(1)+ \lambda(1)} & 0 & \dots \\ 0 & \frac{\mu(2)}{\mu(2)+ \lambda(2)} & 0 & \frac{\lambda(2)}{\mu(2)+ \lambda(2)} & \dots \\ \vdots & \ddots & \ddots & \ddots & \\ \end{pmatrix} which is the transition matrix of a general random walk on the non-negative integers.
Poisson process: There is no death and the birth rate is constant \mu(n)=0 \quad \textrm{ and } \quad \lambda(n)=\lambda
Queueing models: Imagine a Poisson process (with rate \lambda) describing the arrival of customers. In the systems there are k service station at which the service time is exponential with rate \mu. If one service station is empty, upon arrival a customer immediately enter a service station and stay in the system until being served. If all stations are currently busy serving other customers, the customer wait until one stay becomes free. These models are called M/M/k queues where M statnds for Markovian and k is the number of queues, the first M describing the arrival process and the second M the service time (which is exponential).
M/M/1 queue: If there is only one service station \lambda(n)=\lambda \quad \textrm{ and } \quad \mu(n) =\mu
M/M/k queue: If there is k service station then the rates are \lambda(n)=\lambda \quad \textrm{ and } \quad \mu(n) = \left\{ \begin{array}{cl} n \mu & n \le k \\ k \mu & n > k \end{array} \right.
M/M/\infty queue: If there is k service station then the rates are \lambda(n)=\lambda \quad \textrm{ and } \quad \mu(n) =n\mu
Population models: If X_t describes the size of a population birth rate will be naturally proportional to the size of the population if we assume that all individuals give birth or die with a certina rate.
Pure birth model: no death occur and so \lambda(n)=n \lambda \quad \textrm{ and } \quad \mu(n) = 0
Population model: the rates \lambda(n)=n \lambda \quad \textrm{ and } \quad \mu(n) = n \mu
Population model with immigration: if immigrants arrive acccording to a Poisson process with rate \nu the rates are \lambda(n)=n \lambda +\nu \quad \textrm{ and } \quad \mu(n) = n \mu
and thus, by telescoping, a(n+1)= 1 + \sum_{k=0}^{n} (a(k+1) -a(k)) = 1 + \left[1+ \sum_{k=1}^{n} \prod_{j=1}^k\frac{\mu(j)}{\lambda(j)}\right] (a(1) - 1) Taking n \to \infty we must have a(n)\to 0 and \sum_{k=1}^\infty \prod_{j=1}^k\frac{\mu(j)}{\lambda(j)} < \infty and we find the solution a(n)=\frac{\sum_{k=n}^\infty \prod_{j=1}^k\frac{\mu(j)}{\lambda(j)}}{1+ \sum_{k=1}^\infty \prod_{j=1}^k\frac{\mu(j)}{\lambda(j)}}
The Poisson process are bure birth models are not irreducible and converge to +\infty almost surely.
M/M/1-queue: \prod_{j=1}^n \frac{\mu(j)}{\lambda(j)}=\left(\frac{\lambda}{\mu}\right)^n
Transient if \mu < \lambda.
Recurrent if \mu =\lambda
Positive recurrent if \lambda < \mu and with (geometric) stationary distribution: \pi(n)= \frac{\left(\frac{\lambda}{\mu}\right)^n}{1 -\frac{\lambda}{\mu} }
M/M/k-queue: \prod_{j=1}^n \frac{\mu(j)}{\lambda(j)} = \left\{ \begin{array}{cl} n! \left( \frac{\mu}{\lambda} \right)^n & n < k \\ \frac{k!}{k^k} \left(\frac{k\mu}{\lambda}\right)^n & n \ge k \end{array} \right. .
Transient if k \mu < \lambda.
Recurrent if k \mu =\lambda
Positive recurrent if \lambda < k \mu, the stationary \pi is a bit messy to write.
M/M/\infty-queue: \prod_{j=1}^n \frac{\mu(j)}{\lambda(j)}= n! \left( \frac{\mu}{\lambda} \right)^n.
The Brownian motion X_t, t\ge 0, is a stochastic process taking value in \mathbb{R} (or \mathbb{R}^d ) and a model for random continuous motion taking place in space.
By convention we set X_0=0 and start at 0. Saying that the motion is completely random does not mean that X_t and X_s are independent but rather that the motion after time s, that is X_t-X_s, should be idependent of X_s. More generally we will assume the property of independent increments like we did for the Poisson process N_t. We will also assume that the moition does not have any prefered disrection so we must have E[X_t]=0. In addition we will require that the map t \to X_t are continuous. If the paths are not continuous then N_t-t satisfies thses requirements. But requirement of continuity of the paths actually imply that the only possibility is for the increments to be normally distributed.
Formally a Brownian motion or Wiener process with variance parameter \sigma^2 is a stochastic process X_t such that
As mentioned before, 3. actually follows from the other assumptions.
Let Y_1, Y_2, \cdots be IID random variable with E[Y_n]=0 and \textrm{ Var}(Y_n)=1. Then a random walk is defined by S_0=0, \quad S_n=Y_1 + \cdots + Y_n\,. We now define a process X^{(n)}_t by rescaling time and the size of the increments to obtain a proper limit X^{(n)}_t = \frac{S_{\lfloor n t \rfloor}}{\sqrt{n}} \quad \textrm{ for }t\ge 0 To understand this scaling note that if, say, t=1 we have X^{(n)}_1=\frac{Y_1+ \cdots + Y_n}{\sqrt{n}} which has mean 0 and variance 1 which is independent of n and we can hope to have a well-defined limit. Using now the central limit theorem we find X^{(n)}_t = \frac{S_{\lfloor n t \rfloor}}{\sqrt{n}} = \frac{S_{\lfloor n t \rfloor}}{\sqrt{\lfloor n t \rfloor}} \times \frac{\sqrt{\lfloor n t \rfloor}}{\sqrt{n}} \to t Z \quad \textrm{ in distribution} which has same distribution as X_t. Similarly X^{(n)}_t-X^{(n)}_s = \frac{S_{\lfloor n t \rfloor} -S_{\lfloor n s \rfloor} }{\sqrt{n}} = \frac{1}{\sqrt{n}} \sum_{j=\lfloor n s \rfloor +1}^{\lfloor n t \rfloor} X_j \to (t-s) Z \quad \textrm{ in distribution} which has the same distribution as X_t-X_s. Stronger form of convergences can be proved which shows that the path of the process are actually continuous in the limit but this is too technical for now.
Probability distributions and the Markov property: By definition X_t is a time-homogeneous process since X_{t+s}-X_s has the same distribution as X_t and the Markov property follows from X_t=X_s+ (X_t- X_s) which shows that the distribution of X_t only depend on X_s. We have then P\{ X_{t+s} \in A | X_s=x\} = P\{ X_t \in A - x \} = \int_{A-x} \frac{1}{\sqrt{2\pi t}} e^{-\frac{y^2}{2t}} dy = \int_{A} \frac{1}{\sqrt{2\pi t}} e^{-\frac{(y-x)^2}{2t}} dy and so X_t is a Markov chain with transition probability densities p_t(x,y)= \frac{1}{\sqrt{2 \pi t}} e^{-\frac{(y-x)^2}{2t}}
Reflection and Scaling:
X_t \textrm{ is a Brownian motion } \implies \left\{ \begin{array}{l} Y_t = - X_t \textrm{ is a Brownian motion } \\
Z_t = \sqrt{a} X_{t/a} \textrm{ is a Brownian motion for any } a>0 \\ \end{array} \right.
To see this one check that properties 1.-4. are satified in Definition of Brownian motion
Strong Markov property:
By the property of idependent increments (which leads immediately to the Markov property) if X_t is a Brownian motion then Y_t = X_{t+s}- X_s is a Brownian motion.
A stopping time T for a Brownian motion X_t is random variable such that P\{ T \le t\} \textrm{ is measurable with respect to } \{X_s\}_{0\le s \le t} that is we now when to stop based on past information only.
The [strong Markov property] is the following theorem which is believable but would require a more formal proof.
Theorem 3.1 If X_t is a Brownian motion and T a stopping time for the Brownian motion X_t then X_{T+t} - X_T is Brownian motion.
Gaussian process: X_t is a Gaussian process in the sense that the joint distribution of X_{t_1}, X_{t_2}, \cdots, X_{t_n} is a n dimensional normal random variable. To see this note that X_{t_1}, X_{t_2}-X_{t_1}, \cdots, X_{t_n}-X_{t_1} is a n dimensional normal random variable (the components are independent) and that the two vector are related though a linear transformation.
To compute the covaraince is enough to compute E[X_s X_t] and we have
\textrm{ For } s<t \quad E[X_s X_t]=E[X_s (X_s+ X_t-X_s)]= E[X_s^2] + E[X_s] E[X_t-X_s] = s
and thus we find
E[X_sX_t]= \min \{s,t\}
Reflection principle:
Theorem 3.2 (Reflection principle) The process Y_t = \left\{\begin{array}{cl} X_t & t \le T_a \\ 2a - X_t & t > T_a \end{array} \right. is a Brownian motion
Proof. The proof relies on the strong Markov property and the relfectyion property to tells us that X_{T_a+s} - X_{T_a} = X_{T_a+s} - a \textrm{ and } a - X_{T_a+s} are Brownian motions (starting at 0) and so 2a - X_{T_a+s} is a Brownian motion starting at a. Therefore if we set t=T_a+s for t>T_a we see that Y_t is Brownian motion
An illustration of the theorem is in the following figure
Using the reflection principle we can analyse the distribution of the maximum of the Brownian motion over the time interval [0,t] M_t = \max_{0\le s \le t} X_s
Theorem 3.3 (Distribution of the maximum of Brownian motion) We have P\{ M_t \ge a \} = 2 P\{X_t \ge a\}\,. \tag{3.1} The pdf of M_t is given by f_{M_t}(a)= \sqrt{\frac{2}{\pi t}} e^{-\frac{a^2}{t}} \quad \textrm{ for } a >0 The pdf the first hitting time T_a is given by f_{T_a}(t)= \frac{a}{\sqrt{2 \pi}} t^{-3/2} e^{-\frac{a^2}{2t}} \quad \textrm{ for } t >0
Proof (Theorem 3.3). The proof of Equation 3.1 follows from the reflection principle. P\{ M_t \ge a \} = P\{ M_t \ge a , X_t\le a\} + P\{ M_t \ge a , X_t > a\} For the second term we note that if X_t >a then M_t\ge a and thus P\{ M_t \ge a , X_t > a\}=P(X_t>a).
For the first term we use the reflection principle: the reflected path Y_t is a Brownian motion and its first hitting time to a, say S_a is identical to T_a (see the picture). Therefore \begin{aligned} P\{ M_t \ge a , X_t\le a\} &= P\{ T_a\le t , X_t\le a\} = P\{ S_a \le t , Y_t\le a\} \\ &= P\{ T_a \le t , 2a - X_t\le a\} = P\{ T_a \le t , X_t \ge a\} =P\{ M_t \ge a , X_t \ge a\} \end{aligned} which is identical to the first term. This concludes the proof of Equation 3.1 and the density of M_t is obteined by differentiation.
To obtain the density of T_a we have P\{T_a \le t\} = P\{M_t > a\} =2 P\{ X_t > a\} = 2 P\left\{ \frac{X_t}{\sqrt{t}} > \frac{a}{\sqrt{t}} \right\} = 2 \int_{\frac{a}{\sqrt{t}}}^\infty \frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}} dx and differentiating with respect to a proves the result.
The Brownian motion in dimension d is defined by X_t = ( X^{(1)}_t, \cdots, X^{(d)}_t) where the X^{(i)}_t are independent one-dimensional Brownian motions.
One deduce from this immediately that
One can conversely construct the Brownian motion from the properties 1. to 4.
The transition probabilities for Brownian motion are p_t(x,y) = \frac{1}{ (2 \pi t)^{d/2}} e^{-\frac{||y-x\|^2}{2t}} and we have the Chapman-Kolmogorov equation p_{t+s}(x,y) = \int_{\mathbf{R}^d} p_t(x,z) p_s(z,y) dz.
If \mu_0=\rho_o(x) dx is the initial distribution of Brownian particles X_o. Then at time t the distribution of X_t is given by \mu_t=\rho_t(x) dx where \rho_t(y) = \int_{\mathbf{R}^d} \rho_0(x) p_{t}(x,y) \, dy Conversely if we consider a function f: \mathbf{R}^d \to \mathbf{R} we have E[ f(X_t) |X_0=x] = \int_{\mathbf{R}^d} p_{t}(x,y) f(y) \, dy where we should pick f such that the expectation of the right is well defined. Note that since p_t(x,y)=p_t(y,x) is symmetric these two equation are actually identical.
We now derive a differential equation for the function f(t,x)\equiv E[ f(X_t) |X_0=x]. We could simply differentiate, under the integral, the semigroup with respect to t, but instead we provide a more interesting probabilistic proof.
Using a taylor expansion (in d=1 for simplicity) we obtain for functions f which are nice enough (for example C^2, bounded, and with bounded derivatives). f(X_{t+h}) - f(X_t) = f'(X_t) (X_{t+h} - X_t) + \frac{1}{2} f''(X_t) (X_{t+h} - X_t)^2 + o((X_{t+h} - X_t)^2). Since the increments are independent for example f'(X_t) and (X_{t+h} - X_t) are independent and so we obtain \begin{aligned} \frac{\partial f}{\partial t}(x,t) & = \lim_{h \to 0} E\left[ \frac{1}{h}(f(X_{t+h}) - f(X_t)) \right] \\ & = \lim_{h \to 0} \frac{1}{h} E[f'(X_t)] E[(X_{t+h} - X_t)] + \frac{1}{h} \frac{1}{2} E[f''(X_t)] E[(X_{t+h} - X_t)^2] + \frac{1}{h} o((X_{t+h} - X_t)^2)) \end{aligned} Since E[(X_{t+h} - X_t)]=0 and E[(X_{t+h} - X_t)^2]=\textrm{Var}(X_h)=h we obtain \frac{\partial f}{\partial t}(x,t) = \frac{\partial^2 f}{\partial x^2}(x,t) and by a similar argument, in dimension d we obtain the heat equation \frac{\partial f}{\partial t}(x,t) = \frac{1}{2}\Delta f (x,t) \quad \quad \Delta = \sum_{i=1}^d \frac{\partial}{\partial x^{(i)}} \frac{\partial}{\partial x^{(i)}}
Given a domain B with boundary \partial B let us consider the boundary/initial value problem \begin{aligned} & \frac{\partial f}{\partial t}(x,t) = \frac{1}{2}\Delta f (x,t), x \in B \\ & u(t,x)=g(x), x \in \partial B \\ & u(0,x)=f(x), x \in B \\ \end{aligned} We can express the solution in terms of Brownian motion and the hitting time to the boundary \tau= \inf \{t: X_t \in \partial B \} namely u(t,x) = E\left[ f(X_t) 1_{\{\tau >t\}} + g(X_{\tau}) 1_{\{\tau \le t\}}\right]\,. This is derived in a similar way as before. If we take t \to \infty and say B is bounded then the path will eventually hit \partial B and so taking t \to \infty we find the equation \begin{aligned} & \Delta u (x,t)=0, x \in B \\ & u(t,x)=g(x), x \in \partial B \\ \end{aligned} whose solution is E\left[ g(X_{\tau}) 1_{\{\tau \le t\}}\right].
By symmetry the solution is a function of the form u(x)=\phi(\|x\|) since Brownian motion is invariant under rotation: in polar coordinates the heat equation reads \Delta \phi(r)= \phi''(r) + \frac{d-1}{r} \phi'(r) =0 \,. This ODE is easy to solve and has the general solution \phi(r) = \left\{ \begin{array}{cl} c_1 \ln(r) + c_2 & \textrm{ if } d=2 \\ c_1 r^{d-2} + c_2 & \textrm{ if } d\ge 3 \end{array} \right. With the boundary values \phi(R_1)=0 and \phi(R_2)=0 we find \phi(r) = \left\{ \begin{array}{cl} \frac{\ln(r)- \ln(R_1)}{\ln(R_2) - \ln(R_1)} & \textrm{ if } d=2 \\ \frac{ R_1^{2-d} - \|x\|^{2-d}}{R_1^{2-d} - R_2^{2-d}} & \textrm{ if } d\ge 3 \end{array} \right. To study the recurrence we take R_1=\epsilon and R_2 \to \infty to find the probability that a Brownian path hits a small ball around 0 starting from x.
For d=2 since
\lim_{R_2 \to \infty} P\left\{ \|X_t\|=R_2 \textrm { before } \|X\|_t=\epsilon|X_0=x\right\} = \lim_{R_2 \to \infty} \frac{\ln(r)- \ln(\epsilon)}{\ln(R_2) - \ln(\epsilon)} = 0
That is X_t returns to the disc \|x\| \le \epsilon with probability 1 for arbitrary \epsilon and it is natural to call X_t recurrent.
Note however that X_t never returns exactly to 0. Indeed we have
\lim_{\epsilon \to 0} P\left\{ \|X_t\|=\epsilon \textrm { before } \|X\|_t=R_2| X_0=x \right\} = \lim_{\epsilon \to 0}\left( 1- \frac{\ln(r)- \ln(\epsilon)}{\ln(R_2) - \ln(\epsilon)}\right) = 0
For d\ge 3, arguing similarly we find that
\lim_{R_2 \to \infty} P\{ \|X_t\|=R_2 \textrm { before } \|X\|_t=\epsilon\} = \lim_{R_2 \to \infty} \frac{ \epsilon^{2-d} - \|x\|^{2-d}}{\epsilon^{2-d} - R_2^{2-d}} = 1 - \left(\frac{\epsilon}{\|x\|}\right)^{d-2} < 1
and so the Brownian motion does not return near zero with non-zero probbaility: X_t is transient in dimension 3 or more.
For a vector \mu \in \mathbb{R}^d let us consider the Brownian motion with drift \mu is given by Y_t = X_t + t\mu It is not too difficult to see that the following properties hold
Y_t has independent increments and Y_t-Y_s (s\le t) has a normal distribution with mean \mu(t-s) and covariance \sigma^2(t-s)I.
The transition probabilities are p_t(x,y)= \frac{1}{(2\pi \sigma^2 t)^{d/2}} e^{- \frac{\|y-x-t\mu\|^2}{2\sigma^2 t}}.
The function f(t,x)=E\left[ f(Y_t)\right] satisfies the partial differential equation \frac{\partial f}{\partial t} = \frac{1}{2}\Delta f + \mu \nabla f