
Feb 15 2010

Derive all the laws of mechanics in one blog post

Introduction

The most basic law of nature that I know of is about probabilities. If you flip a coin, we say that the probability of getting heads is 1/2. The probability of getting tails is the same. The probability of getting either heads or tails is the sum

P(\text{heads or tails}) = P(\text{heads}) + P(\text{tails}) = 1.
(1)

Something has to happen, so the probability is 1 of something happening. The probability of getting two heads in a row is the product:

	P(\text{heads then heads}) = P(\text{heads}) \times P(\text{heads}) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}.
(2)

That is, an "and" proposition means to multiply the probabilities. Most grade school children can get their heads around this. It seems like a plain mathematical fact, a point of logic, and one might wonder why it would even end up being the subject of a discussion of physics.
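These two rules are easy to check empirically. Below is a quick Python sketch that estimates both probabilities by simulation (the sample size and seed are arbitrary choices):

```python
import random

random.seed(0)          # fixed seed so the run is reproducible
N = 100_000             # number of trials (arbitrary, just needs to be large)

# Estimate P(heads) from N single flips.
heads = sum(random.random() < 0.5 for _ in range(N))
p_heads = heads / N

# Estimate P(heads then heads) from N independent pairs of flips.
both = sum((random.random() < 0.5) and (random.random() < 0.5) for _ in range(N))
p_two_heads = both / N

print(p_heads)      # close to 1/2
print(p_two_heads)  # close to 1/2 * 1/2 = 1/4
```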

Fine, so suppose that the things we are talking about are not flips of a coin, but some physical events, such as "the electron goes through hole A and hits a spot x on the wall behind the hole" and "the electron goes through hole B and hits a spot x on the wall behind the hole". We could call the probabilities of these things happening P(\text{A}) and P(\text{B}) . Now, we can tell very easily what P(\text{A}) is. Cover up hole B, and count how many electrons hit at x . That number divided by the total number of electrons you sent out is P(\text{A}) . Then do the same thing for P(\text{B}) : cover up A, count the number at x, divide.

Now open up both holes, and count the number that hit x while both of them are open. The probability of going through either A or B should be P(\text{A})+P(\text{B}) . But it isn't! If you count how many electrons hit x with both holes open, and divide that by the number you sent out, you do not get the same thing as the sum of the two numbers. What a pity.

It turns out that the solution to this problem is that the probability isn't a base thing; it's actually the square of a more fundamental thing. For want of a better term, this thing is called the amplitude. So there's an amplitude, I'll call it \phi_A, of going through hole A, and P(\text{A}) = |\phi_A|^2 . And also, of course, P(\text{B}) is the square of another amplitude \phi_B. And, the law of nature is that it's the amplitudes that should be added in an "or" situation:

	\phi_\text{A or B} = \phi_\text{A} + \phi_\text{B}.
(3)

To get the probability P(\text{A or B}) we have to take the absolute square of this, which is

	P(\text{A or B}) = |\phi_\text{A} + \phi_\text{B}|^2 = |\phi_\text{A}|^2 + |\phi_\text{B}|^2 + 2\,\mathrm{Re}(\phi_\text{A} \phi_\text{B}^*).
(4)

What makes this particularly weird is that it has a cross term, involving both \phi_\text{A} and \phi_\text{B}, that looks like the probability of going through both A and B. It doesn't pay to take the interpretation too far, but are we willing to accept that the electron goes through both holes simultaneously?

In any case, this is how the universe works, so we're stuck with it. The amplitudes \phi are complex numbers, not real numbers. When taking the square |\phi|^2 you have to take the absolute square, which is \phi times its complex conjugate \phi^* . But when you do this, you get the honest-to-god probability.
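To see the interference rule in action, here is a small Python sketch with two made-up complex amplitudes (the magnitudes and phases are arbitrary illustrative values):

```python
import cmath

# Two made-up amplitudes (magnitudes and phases are arbitrary).
phi_A = 0.6 * cmath.exp(0.3j)
phi_B = 0.5 * cmath.exp(2.1j)

P_A = abs(phi_A) ** 2
P_B = abs(phi_B) ** 2

# With both holes open we add amplitudes first, then square.
P_both_open = abs(phi_A + phi_B) ** 2

# The difference from P_A + P_B is the cross (interference) term.
cross = 2 * (phi_A * phi_B.conjugate()).real

print(P_both_open - (P_A + P_B + cross))  # zero: the identity holds
print(P_both_open - (P_A + P_B))          # nonzero: probabilities don't just add
```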

But you might be able to see a problem right away. Suppose that a particle starts at point a, and we want to calculate the probability that it will be at b some time later. How many ways can it possibly go? Infinite ways!

Sum over all paths

The problem is illustrated below.

[Figure: several possible paths from a to b]

If we are going to solve this problem, we are going to need to know the amplitude of all of the possible paths. Then, since it can go along path 1 OR path 2 OR path 3, etc., we have to add these amplitudes. And there are a lot of possible paths, just a few of which are illustrated (the curves cannot backtrack in time, but they can wiggle all they want in space). We call K(b,a) the amplitude of being at point b, having started at point a,

K(b,a) = \phi_1 + \phi_2 + \phi_3 + \cdots.
(5)

The points a and b have coordinates (x_a, t_a) and (x_b, t_b) . (Modern texts usually write this as G(x_2,t_2;x_1,t_1), since it turns out that K is a Green's function.)

Setting aside whether we can actually even do this sum, we still have to enunciate what \phi is for a given path. Herein lies the actual fundamental law of all mechanics:

	\phi_\text{path} = \exp\left(\frac{i}{\hbar} \times \text{the action of the path}\right).
(6)

Action is a path integral of a function L, the lagrangian, written as

	S(b,a) = \int_{t_a}^{t_b} L(\dot{x},x,t) \, dt.
(7)

One might be tempted, at this point, to cry out "This is hopeless! Just to find the probability of something happening, we have to take the sum of a thing, over an infinite number of paths, where the thing is the exponential of a path integral!" Well, nature is not such a cruel bitch, and it turns out that we can work with this somewhat, even though finding the action itself is very, very hard, and the sum over such things even harder.

If S is big for some paths

In real-world scenarios, oftentimes the action is quite large compared with \hbar . If a complex number is written as \exp(i \theta), for phase angle \theta, then we can visualize the amplitudes as arrows in the 2D plane; \theta tells us the direction of the arrow. If two amplitudes have, for instance, phase angles 43° and 223°, then they point in exactly opposite directions, and their sum is zero. Since the amplitudes are \exp(i S/\hbar), any time S is much larger than \hbar, the phase is very different for different paths, even if the paths are very similar, just because S/\hbar is huge. We imagine the arrow turning around extremely rapidly in the plane as we go through all of the paths that we have to account for. On average, we sample all the angles equally, and the total sum is zero. This is true except near an extremum of S, where nearby paths have the same action to first order, so their arrows line up instead of cancelling.
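This cancellation can be seen numerically. The Python sketch below stands in for the path sum with a one-dimensional toy "action" S(x) = (x - 1)^2 and a small \hbar (all values are arbitrary): amplitudes summed over a window around the stationary point x = 1 reinforce, while an equal-width window away from it nearly cancels.

```python
import cmath

hbar = 0.01   # small compared with typical S values (arbitrary units)

def S(x):
    # toy "action" with a single stationary point at x = 1
    return (x - 1.0) ** 2

def window_sum(lo, hi, n=2000):
    # add up amplitudes exp(i S / hbar) for "paths" labelled by x in [lo, hi]
    dx = (hi - lo) / n
    return sum(cmath.exp(1j * S(lo + (k + 0.5) * dx) / hbar) * dx
               for k in range(n))

near = abs(window_sum(0.8, 1.2))   # window around the stationary point
far = abs(window_sum(2.0, 2.4))    # equal-width window away from it

print(near, far)   # near dominates; the far window nearly cancels
```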

Thus it is that for classical mechanics, we only need to find the extremum (minimum, in this case) of the action. This is actually pretty easy. Normally to find the extremum of a function you differentiate that function. But S is not a function of a number; it's a functional of the path x(t), which means that it depends on the value of x at every time. This is in contrast to a composition of functions, like g(f(x)), where g only depends on f at the particular point x . The analogous operation is to find the variation \delta S, and set that equal to zero.

Since the endpoints t_a and t_b are held fixed, we can slip the variation under the integral sign (for simplicity, we take a lagrangian with no explicit time dependence):

	\delta S = \delta \int L(\dot{x},x) \, dt = \int \delta L(\dot{x},x) \, dt,
(8)

and we can write this out by finding the full "differential" of L:

	\delta S = \int \left(\frac{\partial L}{\partial \dot{x}} \, \delta \dot{x} + \frac{\partial L}{\partial x} \, \delta x \right) \, dt.
(9)

Integrating the first term by parts gives

	\int \frac{\partial L}{\partial \dot{x}} \, \delta \dot{x} \, dt = \left[\frac{\partial L}{\partial \dot{x}} \, \delta x\right]_a^b - \int \frac{d}{dt} \frac{\partial L}{\partial \dot{x}} \, \delta x \, dt.
(10)

The term in brackets is 0, since we are varying the path with fixed endpoints a and b . Therefore

\delta S = \int \left(- \frac{d}{dt} \frac{\partial L}{\partial \dot{x}} \,  + \frac{\partial L}{\partial x} \right) \delta x \, dt = 0.
(11)

For arbitrary \delta x, the term in parenthesis must be zero. The lagrangian is kinetic energy minus potential energy

	L = \frac{m \dot{x}^2}{2} - V(x),
(12)

so

	\frac{d}{dt} \frac{\partial L}{\partial \dot{x}} \, - \frac{\partial L}{\partial x} = \frac{d}{dt}(m \dot{x}) + \frac{\partial V}{\partial x} = 0.
(13)

Since any conservative (curl-free) force can be written as the negative derivative of a potential, F = -\partial V/\partial x, this is just Newton's second law:

\boxed{F = \frac{dp}{dt} = m a.}
(14)
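The variational story can also be checked numerically: discretize the path, demand the discrete Euler-Lagrange equation at each interior point, and you recover the classical trajectory. The Python sketch below does this by simple relaxation for free fall, V(x) = m g x (the grid size, iteration count, and endpoints are arbitrary choices):

```python
# Relaxation solution of the discrete Euler-Lagrange equation for free fall,
#   m (x[i+1] - 2 x[i] + x[i-1]) / eps^2 = -m g,
# with fixed endpoints x(0) = x(T) = 0. The stationary path should be the
# classical parabola x(t) = g t (T - t) / 2.
N = 50
T = 1.0
eps = T / N
g = 9.8

x = [0.0] * (N + 1)   # initial straight-line guess with the fixed endpoints

for _ in range(20000):
    x = ([x[0]]
         + [(x[i - 1] + x[i + 1]) / 2 + g * eps ** 2 / 2 for i in range(1, N)]
         + [x[N]])

t_mid = T / 2
exact_mid = g * t_mid * (T - t_mid) / 2   # classical value at the midpoint
print(x[N // 2], exact_mid)               # the two agree
```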

If S is not big

The rule for "or" is to add amplitudes and for "and" we multiply amplitudes:

\phi_\text{A and then B} = \phi_\text{A} \phi_\text{B}.
(15)

We are still trying to find out the amplitude of going from a to b, which I said we call K(b,a). If we wanted to, we could specify a third point, intermediate to these two points, called c. Then, the amplitude is the amplitude of going from a to c, then from c to b, only we have to sum over all allowed points c:

	K(b,a) = \int_{x_c} K(c,a) K(b,c) \, dx_c.
(16)

I multiplied before integrating because it's the amplitude of going from a to c and then going from c to b.
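The composition rule (16) is easy to verify for a kernel we can write in closed form. The quantum kernel is oscillatory and awkward to integrate numerically, so the Python sketch below uses its imaginary-time (diffusion) analogue, the heat kernel, for which the same composition property holds (all numerical values are arbitrary):

```python
import math

def heat_kernel(x, y, t, D=1.0):
    # imaginary-time (diffusion) analogue of the free-particle kernel
    return math.exp(-(x - y) ** 2 / (4 * D * t)) / math.sqrt(4 * math.pi * D * t)

# Check K(b,a) = integral over c of K(b,c) K(c,a), via a Riemann sum.
xa, xb, t1, t2 = 0.0, 0.7, 0.3, 0.5    # arbitrary endpoints and times
dc = 0.01
lhs = heat_kernel(xb, xa, t1 + t2)
rhs = sum(heat_kernel(xb, -10 + k * dc, t2)
          * heat_kernel(-10 + k * dc, xa, t1) * dc
          for k in range(2001))

print(lhs, rhs)   # the two agree
```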

If we wanted to, we could make N such intervals, so that we insert N-1 points in between them, numbered from 1 to N-1 . Then,

	\begin{split}K(b,a) & = \int_{x_1} \int_{x_2} \cdots \int_{x_{N-1}} K(b,N-1) K(N-1,N-2)\cdots \\ & \times  K(i+1,i) \cdots K(1,a) \, dx_1 \, dx_2 \cdots \, dx_{N-1}. \end{split}
(17)

Now we've split the path up into tiny sections, each spanning a small time interval which we call \epsilon . Then, as we said before, the amplitude for one section is

K(i+1,i) = \frac{1}{A} \exp\left(\frac{i}{\hbar}\int_{t_i}^{t_{i+1}} L(\dot{x},x,t) \, dt \right).
(18)

(I have added in a factor of 1/A so that the probability will be normalized to 1. We will have to find out what A is if we want our probabilities to work out right.) But now we have something that's over so small a range that we can approximate S to first order without any issues. We replace \dot{x} with (x_{i+1} - x_i)/\epsilon and x with (x_{i+1}+x_i)/2 . The integral we will say is just \int L \, dt = \epsilon L. Thus,

K \cong \frac{1}{A} \exp\left[\frac{i \epsilon}{\hbar} L\left(\frac{x_{i+1} - x_i}{\epsilon}, \frac{x_{i+1} + x_i}{2}\right)\right].
(19)

Again, we say that the lagrangian is just kinetic minus potential energy (Eqn. 12 above), so if we call x=x_{i+1} and y = x_i, we get

K = \frac{1}{A} \exp\left[\frac{i \epsilon}{\hbar} \frac{m}{2}\left(\frac{x - y}{\epsilon}\right)^2 - \frac{i \epsilon}{\hbar} V\left(\frac{x + y}{2}\right)\right].
(20)

This is the amplitude for being at point (x_{i+1},t_{i+1}) if you started at (x_i,t_i), which was \epsilon earlier. What we want is the amplitude to be at (x,t + \epsilon) having started from anywhere. This is not bad, because we just have to multiply the amplitude of going between the points (our K) by the amplitude to be at (y,t), and integrate over all possible y:

\psi(x,t + \epsilon) = \int K \psi(y,t) \, dy,
(21)

where \psi is now the symbol for the amplitude. Putting in for K,

	\psi(x,t + \epsilon) = \int \frac{1}{A} \exp\left[\frac{i \epsilon}{\hbar} \frac{m}{2}\left(\frac{x - y}{\epsilon}\right)^2 - \frac{i \epsilon}{\hbar} V\left(\frac{x + y}{2}\right)\right] \psi(y,t) \, dy.
(22)

To make things a little simpler, let's call the spatial difference y - x = \eta, so that y = x + \eta and dy = d\eta . Then,

	\psi(x,t + \epsilon) = \int \frac{1}{A} \exp\left[\frac{i m}{2 \hbar \epsilon} \eta^2\right]\exp\left[ - \frac{i \epsilon}{\hbar} V\left(x + \frac{\eta}{2}\right)\right] \psi(x + \eta,t) \, d\eta.
(23)

Now that we have everything in terms of small changes, we can do an expansion. We expand to first order in \epsilon and to second order in \eta (the gaussian factor concentrates \eta around values of order \sqrt{\hbar \epsilon / m}, so terms of order \eta^2 are really of order \epsilon). The dear reader is no doubt aware of the Taylor expansion of a function about a point a

	g(\zeta) = \sum_{n=0}^\infty \frac{(\zeta-a)^n}{n!} g^{(n)}(a).
(24)

We can convert this into a series in a small displacement h by setting \zeta = z + h and expanding about the point a = z:

	g(z+h) = \sum_{n=0}^\infty \frac{h^n}{n!} g^{(n)}(z).
(25)

This shows that

\psi(x,t + \epsilon) = \psi(x,t) + \epsilon \frac{\partial \psi(x,t)}{\partial t} + \mathcal{O}(\epsilon^2),
(26)
\psi(x + \eta,t) = \psi(x,t) + \eta \frac{\partial \psi(x,t)}{\partial x} + \frac{\eta^2}{2} \frac{\partial^2 \psi(x,t)}{\partial x^2} + \mathcal{O}(\eta^3),
(27)

and,

\begin{split}& \exp\left[-\frac{i \epsilon}{\hbar}V\left(x + \frac{\eta}{2}\right)\right] = 1 - \frac{i \epsilon}{\hbar} V\left(x + \frac{\eta}{2}\right) + \mathcal{O}(\epsilon^2) \\ & = 1 - \frac{i \epsilon}{\hbar} V(x) + \mathcal{O}(\eta \epsilon + \epsilon^2). \end{split}
(28)

The leading term on the RHS of (23) is

	\psi \int \frac{1}{A} \exp\left(\frac{i m}{2 \hbar \epsilon} \eta^2\right) d \eta= \psi \frac{1}{A} \sqrt{\frac{2 \pi i \hbar \epsilon}{m}}
(29)

which is found by the simple gaussian integral formula

	\int_{-\infty}^\infty e^{-a x^2} \, dx = \sqrt{\frac{\pi}{a}}.
(30)

Compare this with the leading term on the LHS, which is just \psi . This means that, if the expression is to be true to zeroth order,

	\frac{1}{A} = \sqrt{\frac{m}{2 \pi i \hbar \epsilon}}.
(31)
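The gaussian formula (30), for real positive a, is easy to confirm numerically (the complex case used here follows by analytic continuation); here is a Python sketch with an arbitrary a:

```python
import math

a = 2.0        # arbitrary positive value
dx = 0.001
# Riemann sum of exp(-a x^2) over [-10, 10], wide enough to capture the tails.
approx = sum(math.exp(-a * (-10 + k * dx) ** 2) * dx for k in range(20001))
exact = math.sqrt(math.pi / a)
print(approx, exact)   # the two agree
```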

The second term on the RHS is

	\frac{\partial \psi}{\partial x}\int \frac{1}{A} \, \eta \exp\left(\frac{i m \eta^2}{2 \hbar \epsilon}\right) d\eta = 0,
(32)

since the integrand is an odd function in \eta . The third term on the RHS is

	\frac{1}{2} \frac{\partial^2 \psi}{\partial x^2} \int \frac{1}{A} \, \eta^2 \exp\left(\frac{i m \eta^2}{2 \hbar \epsilon}\right) d\eta = \frac{1}{2A} \frac{\partial^2 \psi}{\partial x^2} \frac{\sqrt{\pi}}{2}\left(\frac{2 \hbar i \epsilon}{m}\right)^{3/2}
(33)

(found by taking -\partial/\partial a of both sides of Eqn 30). Once we substitute in for our value of A, this becomes

	\frac{i \hbar \epsilon}{2m} \frac{\partial^2 \psi}{\partial x^2}.
(34)

Using all of these parts, we have the equation

	\psi + \epsilon \frac{\partial \psi}{\partial t} = \left(1 - \frac{i \epsilon}{\hbar} V \right)\left(\psi + \frac{i \hbar \epsilon}{2m} \frac{\partial^2 \psi}{\partial x^2}\right),
(35)

Expanding the right-hand side, cancelling the \psi that appears on both sides, and dropping the term of order \epsilon^2 leaves

	\epsilon \frac{\partial \psi}{\partial t} = - \frac{i \epsilon}{\hbar} V \psi + \frac{i \hbar \epsilon}{2m} \frac{\partial^2 \psi}{\partial x^2}.
(36)

Finally, multiply through by (- \hbar/i \epsilon) to find

	\boxed{i \hbar \frac{\partial \psi}{\partial t} = -\frac{\hbar^2}{2m} \frac{\partial^2 \psi}{\partial x^2} + V \psi.}
(37)
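As a quick sanity check on the boxed result, set V = 0 and try a free-particle plane wave \psi(x,t) = e^{i(kx - \omega t)} . Then

	i \hbar \frac{\partial \psi}{\partial t} = \hbar \omega \, \psi \qquad \text{and} \qquad -\frac{\hbar^2}{2m} \frac{\partial^2 \psi}{\partial x^2} = \frac{\hbar^2 k^2}{2m} \, \psi,

so the equation is satisfied exactly when \hbar \omega = \hbar^2 k^2 / 2m, which is just E = p^2/2m with p = \hbar k, the de Broglie relations.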