LOG#009. Relativity of simultaneity.

Another striking consequence of the Lorentz transformations, and hence of the special theory of relativity, arises when we explore the concept of simultaneity. According to the postulates of relativity and the structure of the Lorentz transformations, we can understand the following statement:

\boxed{\mbox{Simultaneity is a relative concept. It depends on the inertial reference frame.}}

What does it mean? If two arbitrary events in space and time, E_1, E_2, take place simultaneously in one inertial reference frame, in general they do NOT do so in other inertial reference frames. Indeed, Einstein himself proposed an operational definition of simultaneity:

“(…) Two events E_1, E_2 taking place at two different locations are said to be simultaneous if two spherical light waves, emitted with the events, meet each other at the center of the tie line connecting the locations of the events(…)”

A proof can be given using the Lorentz transformations as follows. In a certain frame S’, two events E'_1 and E'_2 are found to be simultaneous, i.e., they satisfy t'_1=t'_2. Then, using the Lorentz transformations (taking the relative motion of S’ with respect to S along the x-axis, without loss of generality), we get

ct_1=\gamma (ct'_1+\beta x'_1)

ct_2=\gamma (ct'_2+\beta x'_2)

and thus, since t'_2=t'_1, the subtraction yields

c(t_2-t_1)=\gamma \beta (x'_2-x'_1)

We can recast this result as:

\boxed{\Delta t=t_2-t_1=\dfrac{\beta}{c}\gamma (x'_2-x'_1)=\dfrac{1}{\sqrt{1-\dfrac{v^2}{c^2}}}\dfrac{v}{c^2}(x'_2-x'_1)}

We can also derive this equation with a LIGHT CLOCK gedanken experiment. A light clock of proper length L'=l' is at rest in the S’-frame, and its x’-axis moves at speed v parallel to the x-axis of the S-frame. We attach mirrors (denoted by M) to the short ends of the clock, and light travels parallel to the direction of motion. See the next figure of this device:

[Figure: SimultaneityLightClock]

At the initial time we synchronize the clocks in S and S’, meaning that t=t'=0 when x=x'=0, and the light flash is emitted. In the S’-frame, light propagates in both directions at the speed of light, so it reaches the mirrors at the two ends at the same time

t'=\dfrac{s'}{c}=\dfrac{L'}{2c}

In the S-frame, on the other hand, the length of the light clock is contracted, L=\sqrt{1-\beta^2}L', but light still propagates at speed c. The left-hand mirror moves toward the light flash at speed v, so in the time interval t_1 required for light to reach the left mirror, light travels a distance

ct_1=\dfrac{L}{2}-vt_1

In the same way, the right-hand mirror is running away from the light flash. In a certain time t_2 required for light to reach it, light should travel a total length

ct_2=\dfrac{L}{2}+vt_2

Substracting both equations, we obtain

t_2-t_1=\dfrac{L}{2(c-v)} - \dfrac{L}{2(c+v)}=\dfrac{L}{2}\dfrac{2v}{c^2-v^2}=L \dfrac{v}{c^2} \dfrac{1}{1-\dfrac{v^2}{c^2}}

Using the length contraction result, i.e., using that

L=\dfrac{L'}{\gamma}=\sqrt{1-\dfrac{v^2}{c^2}}L'

and that

L'=x'_2-x'_1

in the S’-frame, we get the previous result

\Delta t= t_2-t_1=\dfrac{1}{ \sqrt{1-\dfrac{v^2}{c^2}}}\dfrac{v}{c^2}(x'_2-x'_1)=\dfrac{\beta}{c}\gamma (x'_2-x'_1)

Q.E.D.

Therefore, the meaning of this boxed formula is straightforward:

Two events that are simultaneous in the S’-frame are also simultaneous in the S-frame only if they happen at the same position (x'_2=x'_1) and/or the relative velocity v between the two frames is zero (v=0).

Moreover, we have this interesting additional result:

The larger the spatial separation between two simultaneous events in the S’-frame, and/or the higher the relative velocity between the S and S’ frames, the greater the temporal separation of the events in the S-frame.
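All of this can be verified with a quick numerical check. The following Python sketch (the speed v and the S’-coordinates of the two events are arbitrary illustrative choices, not fixed by the argument above) takes two events simultaneous in S’, maps them to S with the inverse Lorentz transformation, and compares the resulting time difference with the boxed formula:

```python
import math

c = 299792458.0          # speed of light (m/s)
v = 0.6 * c              # relative velocity of S' with respect to S (arbitrary choice)
beta = v / c
gamma = 1.0 / math.sqrt(1.0 - beta**2)

# Two events simultaneous in S' (t'_1 = t'_2 = 0) at different positions
x1p, x2p = 0.0, 1000.0   # metres (arbitrary separation in S')
t1p = t2p = 0.0

# Inverse Lorentz transformation: ct = gamma*(ct' + beta*x')
t1 = gamma * (t1p + beta * x1p / c)
t2 = gamma * (t2p + beta * x2p / c)

# Boxed formula: Delta t = gamma*(v/c^2)*(x'_2 - x'_1)
delta_t_formula = gamma * (v / c**2) * (x2p - x1p)
print(t2 - t1, delta_t_formula)   # both ~2.5e-6 s: the events are NOT simultaneous in S
```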

LOG#008. Length contraction.

Once we have introduced the postulates of special relativity and deduced the generalization of the Galilean transformations for electromagnetism and mechanics, the Lorentz transformations, we can derive some interesting results.

Suppose we have two events E_1 and E_2, whose coordinates of space and time are given generally by E_1(x_1,y_1,z_1,t_1) and E_2(x_2,y_2,z_2,t_2). We also suppose, for simplicity, that the relative motion is along the x-axis. Imagine a rod, whose ”rest” length is

L_0= x_2- x_1

It is evident from the structure of the Lorentz transformations that time depends on the observer frame (S or S’), so we have to fix the notion of “simultaneity” to measure the rod length in a meaningful way. Therefore, we can set t'_1=t'_2 to “synchronize” our stick measurements in “motion”, i.e., we have to measure the positions of the two ends of the rod at the same time in order to determine its length in motion!

Using the inverse Lorentz transformations:

x_1=\gamma (x'_1+vt'_1) and x_2=\gamma (x'_2+vt'_2)

Therefore, subtracting both equations and using the temporal condition (simultaneity), we get:

x_2-x_1=L_0=\gamma (x'_2-x'_1)

or equivalently

\dfrac{x_2-x_1}{\gamma} = (x'_2-x'_1)

i.e.

\boxed{L'=\sqrt{1-\beta^2}L_0}

This result is known in Special Relativity (SR) as length contraction. Bodies in motion have “dimensions” that are shorter than those “at rest”. Of course, according to the postulates of relativity, the effect is symmetric. Consider now a rod at rest in S’, with proper length L'_0=x'_2-x'_1, and let L_0=x_2-x_1 be its length measured in S if we set t_1=t_2 for our “stick”. In this case we get:

x'_1=\gamma (x_1-vt_1) and x'_2=\gamma (x_2-vt_2)

and then

L'_0=x'_2-x'_1=\gamma ( x_2-x_1)

so the length measured in the S-frame of the rod at rest in S’ is

L_0=L'_0\sqrt{1-\beta^2}

i.e., again, the body in motion is “shorter” than the same body “at rest”. The only care needed is to identify whether the known quantity is the proper length or the contracted length, and then use the suitable expression to obtain the other one.
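Before turning to the light clock, here is a short Python sketch reproducing this bookkeeping numerically (units with c=1; the velocity and the rod length are arbitrary choices): the rod is at rest in S and its two ends are located at equal S’-times, which yields L' = L_0/\gamma.

```python
import math

c = 1.0                  # units with c = 1
v = 0.8                  # relative velocity of S' w.r.t. S (arbitrary)
gamma = 1.0 / math.sqrt(1.0 - v**2)

L0 = 2.0                 # proper length: rod at rest in S with ends x_1 = 0, x_2 = L0
x1, x2 = 0.0, L0

def to_Sprime(x, t):
    """Direct Lorentz transformation (c = 1)."""
    return gamma * (x - v * t), gamma * (t - v * x)

# Simultaneity in S': t' = gamma*(t - v*x) = 0  ->  sample each end at t = v*x
x1p, t1p = to_Sprime(x1, v * x1)
x2p, t2p = to_Sprime(x2, v * x2)

print(t1p, t2p)                  # 0.0 0.0 : both ends located at the same S'-time
print(x2p - x1p, L0 / gamma)     # both 1.2 : the moving rod is contracted
```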

There is an alternative proof of this result using what Einstein himself called a LIGHT CLOCK. The light clock is a nice Gedankenexperiment using frames and light signals between the S and S’ frames, where S’ is in motion relative to S. At t’=t=0 a light signal (a “flash”) is emitted in S’, where there is an object of length l' (as measured in the S’-frame).

[Figure: lightclockSprime]

On the other hand, in the S-frame we have the following sequence of events:

[Figure: SframeLightclock]

Now we will see the physical picture behind the two frames.

A) S’-frame. The light flash travels to the end of the object, where a mirror is placed, and comes back. The path traveled by the light equals 2l' during a time t', and thus the length observed in the S’-frame is:

l'=\dfrac{1}{2}ct'

B) S-frame. The light flash also starts at t=0, but the object is in motion, so on the outgoing leg the mirror runs away an extra distance vt_1 before the ray is reflected, while on the return leg the origin approaches the flash by vt_2 before we record its arrival. There are therefore two different path lengths related to the length l:

l_1=l+vt_1

l_2=l-vt_2

But, of course, l_1=ct_1 and l_2=ct_2, and then

(c-v)t_1=l and (c+v)t_2=l

The total running time of the light path in the S-frame will be:

t=t_1+t_2=\dfrac{l}{c-v}+\dfrac{l}{c+v}=\dfrac{l(c+v)+l(c-v)}{c^2-v^2}=\dfrac{2lc}{c^2-v^2}=\dfrac{2l}{c}\dfrac{1}{1-\frac{v^2}{c^2}}

or

t=\gamma^2\dfrac{2l}{c}

Therefore, the light clock in the S-frame measures

l=\dfrac{1}{\gamma^2}\dfrac{1}{2}ct

Dividing the results from A) and B), we get

\dfrac{l'}{\gamma l}=\dfrac{\gamma t'}{t}

But we know that time dilation implies \gamma t'=t, so we get at last

l=\dfrac{l'}{\gamma}

The main conclusion is the following:

\boxed{\mbox{Length is relative. Lengths measured in the direction of motion are shorter.}}

This phenomenon is known as LENGTH CONTRACTION in special relativity.

Of the whole set of inertial observers moving along a certain direction, an observer at rest relative to an object extended in that direction measures the greatest length for that object. This length is commonly called the PROPER LENGTH of the object. Lengths in TRANSVERSE (or orthogonal) directions to the motion are not subject to length contraction.

LOG#007. Time dilation.

Suppose two events happen in the S’-frame at the same point but at different times, E'_1(x',y',z',t'_1) and E'_2(x',y',z',t'_2). What is their temporal separation in the S-frame? According to the Lorentz transformations, it is:

c(t_2-t_1)=\gamma \left[ c(t'_2-t'_1)+\beta(x'-x')\right]

or equivalently

t_2-t_1=\Delta t=\gamma (t'_2-t'_1)=\gamma \Delta t'

i.e.

\boxed{\Delta t = \gamma \Delta t'}

Of course, in the parallel case, if two events happen in the S-frame at the same point but at different times, i.e., if E_1(x,y,z,t_1) and E_2(x,y,z,t_2), their temporal separation in the S’-frame, using the Lorentz transformations, will be:

c(t'_2-t'_1)=\gamma \left[ c(t_2-t_1)-\beta(x-x)\right]

Thus,

t'_2-t'_1=\Delta t' = \gamma (t_2-t_1) =\gamma \Delta t

i.e.

\boxed{\Delta t' = \gamma \Delta t}

The boxed equations are called the TIME DILATION.
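A quick numerical check of the boxed time dilation formula can be done in Python (units with c=1; the velocity and the proper time interval are arbitrary illustrative choices):

```python
import math

c = 1.0
v = 0.6                              # arbitrary relative velocity (units of c)
gamma = 1.0 / math.sqrt(1.0 - v**2)  # = 1.25

# Two ticks of a clock at rest in S': same place x', proper interval Delta t' = 1
xp, t1p, t2p = 0.0, 0.0, 1.0

# Inverse Lorentz transformation: t = gamma*(t' + v*x'/c^2)
t1 = gamma * (t1p + v * xp / c**2)
t2 = gamma * (t2p + v * xp / c**2)

print(t2 - t1, gamma * (t2p - t1p))  # both 1.25: Delta t = gamma * Delta t'
```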

We can use an alternative procedure to derive this deep result of relativity (a result with important phenomenology!): a LIGHT CLOCK device. In the S’-frame, we send light rays to a mirror and back as a “clock”. In the S-frame, relative to which S’ moves with velocity v, we have the following scheme for the “light clock”:

[Figure: clock2]

Thus, we have the Pythagorean relationship shown above. The tic-tac (the round trip of the light flash) lasts \Delta t=t in the S-frame and \Delta t'=t' in the S’-frame. Moreover, according to the S’-frame, the light flash travels a distance 2d (forward and back, reflected on a mirror). Therefore:

2d=ct'

Therefore, by the Pythagorean theorem in the triangle with the sides given in the above picture, we have

\dfrac{v^2t^2}{4}+c^2\dfrac{t'^2}{4}=\dfrac{c^2t^2}{4}

and then

c^2\dfrac{t'^2}{4}=\dfrac{c^2t^2-v^2t^2}{4}

so

t'^2=\dfrac{t^2}{\gamma^2 }

i.e., since we can choose initial tic-tacs in S and S’ equal to zero, we obtain t'=\Delta t' and \Delta t= t, and we also get

\Delta t = \gamma \Delta t' as before!

CONCLUSION:

\mbox{Time is relative. Time measurements of clocks (tic-tacs) in motion are longer.}

Indeed, of the whole set of inertial observers, an observer at rest relative to a process measures the shortest possible time for that process. It is called the PROPER TIME, and it is generally denoted by the Greek letter \tau.

CAUTION: It may seem that the above two formulas for time dilation contradict each other, but they do not. They are both OK. We must keep in mind that the time variable on the right-hand side of each equation means the “proper time” of a concrete process, i.e., the time measured by an observer at rest relative to that process. The left-hand side of each equation denotes the time measured by an observer “in motion” relative to that process. When S and S’ are in relative motion, there can only be ONE frame where the process is taken as “being at rest”. We have to select one of the equations; we can NOT use both at once!

LOG#006. Lorentz Transformations(II).

\boxed{ \begin{cases} x'_0=ct'=\gamma (ct - \mathbf{\beta} \cdot \mathbf{r}) \\ \mathbf{r'}=\mathbf{r}+(\gamma -1) \dfrac{(\mathbf{\beta}\cdot \mathbf{r})\mathbf{\beta}}{\beta^2} -\gamma \beta ct \\ \gamma = \dfrac{1}{\sqrt{1-\beta^2}}= \dfrac{1}{\sqrt{1-\beta_x^2-\beta_y^2-\beta_z^2}} \end{cases}}

\boxed{\left( \begin{array}{c} ct' \\ x' \\ y' \\ z' \end{array} \right) = \begin{pmatrix} \gamma & -\gamma \beta_x & -\gamma \beta_y & -\gamma \beta_z \\ -\gamma \beta_x & 1+(\gamma -1)\dfrac{\beta_{x}^{2}}{\beta^2} & (\gamma -1)\dfrac{\beta_x \beta_y}{\beta^2} & (\gamma -1)\dfrac{\beta_x \beta_z}{\beta^2} \\ -\gamma \beta_y & (\gamma -1)\dfrac{\beta_y \beta_x}{\beta^2} & 1+(\gamma -1)\dfrac{\beta_{y}^{2}}{\beta^2} & (\gamma -1)\dfrac{\beta_y \beta_z}{\beta^2} \\ -\gamma \beta_z & (\gamma -1)\dfrac{\beta_z \beta_x}{\beta^2} & (\gamma -1)\dfrac{\beta_z \beta_y}{\beta^2} & 1+(\gamma -1)\dfrac{\beta_{z}^{2}}{\beta^2} \end{pmatrix} \left( \begin{array}{c} ct\\ x\\ y\\ z\end{array}\right)}

These equations define the most general (direct) Lorentz transformations, and we see they are not those in the previous post! I mean, they are not the ones with the relative velocity along one particular axis, as we derived in the previous log. We will now derive these equations. How can we do it?

The most general Lorentz transformation involves the following scenario (a full D=3+1 motion):

1st) The space-time coordinates of an event E are described by one observer (and frame) A at rest at the origin of his own frame S. The observer B is at rest at the origin in a second frame S’. S and S’ have parallel axes.

2nd) The origin of the S and S’ frames coincide at t=t’=0.

3rd) B moves relative to A with a velocity given by the 3d vector \mathbf{v}=(v_x,v_y,v_z).

4th) The position vector of the event in the S-frame is \mathbf{r}=(x,y,z). It is decomposed into  “horizontal/vertical” or parallel/orthogonal pieces as follows

\mathbf{r}=\mathbf{r}_\parallel + \mathbf{r}_ \perp

The following transformation is suitable for the S’-frame, defining \beta=\mathbf{v}/c=(v_x/c,v_y/c,v_z/c):

ct'=x'_0=\gamma(ct-\beta r_\parallel)=\gamma (ct - \mathbf{\beta} \cdot \mathbf{r})

\mathbf{r'}_\parallel = \gamma (\mathbf{r}_\parallel - \mathbf{\beta}ct )

\mathbf{r'}_\perp=\mathbf{r}_\perp

where the dot represents the scalar product. Using elementary properties of the scalar product and of projections of vectors, the component of the position vector along the velocity is obtained by taking its scalar product with the normalized velocity vector \mathbf{\hat{v}}=\mathbf{v}/v; in the S-frame this projection is \mathbf{\hat{v}}\cdot \mathbf{r}. Therefore,

\mathbf{r}_\parallel = (\hat{\mathbf{v}}\cdot \mathbf{r})\hat{\mathbf{v}}

and thus the component of the position vector along the direction parallel to the velocity will be:

\mathbf{r}_\parallel = (\hat{\mathbf{v}}\cdot \mathbf{r})\hat{\mathbf{v}}= \dfrac{(\mathbf{v}\cdot \mathbf{r})\mathbf{v}}{v^2}

or

\mathbf{r}_\parallel = \dfrac{(\mathbf{\beta}\cdot \mathbf{r})\mathbf{\beta}}{\beta^2}

Then, since \mathbf{r}_\perp = \mathbf{r}-\mathbf{r}_\parallel, we have

\mathbf{r}_\perp = \mathbf{r}- \dfrac{(\mathbf{\beta}\cdot \mathbf{r})\mathbf{\beta}}{\beta^2}

Finally, we put together the vertical/horizontal (orthogonal/parallel) pieces of the general Lorentz transformations:

\mathbf{r'} =\mathbf{r'}_\parallel + \mathbf{r'}_ \perp = \gamma \left( \dfrac{(\mathbf{\beta}\cdot \mathbf{r})\mathbf{\beta}}{\beta^2} -\beta ct \right)+\mathbf{r}- \dfrac{(\mathbf{\beta}\cdot \mathbf{r})\mathbf{\beta}}{\beta^2}

Then, the general 4D=3d+1 Lorentz transformation (GLT) from S to S’ are defined through the equations:

\boxed{GLT(S\rightarrow S') \begin{cases} x'_0=ct'=\gamma (ct - \mathbf{\beta} \cdot \mathbf{r}) \\ \mathbf{r'}=\mathbf{r}+(\gamma -1) \dfrac{(\mathbf{\beta}\cdot \mathbf{r})\mathbf{\beta}}{\beta^2} -\gamma \beta ct \\ \gamma = \dfrac{1}{\sqrt{1-\beta^2}}= \dfrac{1}{\sqrt{1-\beta_x^2-\beta_y^2-\beta_z^2}} \end{cases}}

Q.E.D.

The inverse GLT (IGLT) will be:

\boxed{IGLT(S'\rightarrow S) \begin{cases} x_0=ct=\gamma (ct' + \mathbf{\beta} \cdot \mathbf{r}') \\ \mathbf{r}=\mathbf{r}'+(\gamma -1) \dfrac{(\mathbf{\beta}\cdot \mathbf{r}')\mathbf{\beta}}{\beta^2} +\gamma \beta ct' \\ \gamma = \dfrac{1}{\sqrt{1-\beta^2}}= \dfrac{1}{\sqrt{1-\beta_x^2-\beta_y^2-\beta_z^2}} \end{cases}}

Indeed, these transformations allow a trivial generalization to D=d+1, i.e., they are generalized to d spatial dimensions simply by allowing a d-dimensional velocity and beta parameter, while time remains 1d. Indeed, the Lorentz transformations form a group. A group is a mathematical gadget with certain “nice features” that physicists and mathematicians love. You can imagine the Lorentz group in D=d+1 dimensions as a generalization of the rotation group. The Lorentz group involves rotations around the spatial axes plus the so-called “boosts”, transformations that mix space and time coordinates. Indeed, a Lorentz transformation involving relative motion along one particular axis IS a (Lorentz) boost! That is, the simplest Lorentz transformations, like the ones in the previous posts, are “boosts”.

With the above transformations, the GLT can be easily written in components:

\boxed{\left( \begin{array}{c} ct' \\ x' \\ y' \\ z' \end{array} \right) = \begin{pmatrix} \gamma & -\gamma \beta_x & -\gamma \beta_y & -\gamma \beta_z \\ -\gamma \beta_x & 1+(\gamma -1)\dfrac{\beta_{x}^{2}}{\beta^2} & (\gamma -1)\dfrac{\beta_x \beta_y}{\beta^2} & (\gamma -1)\dfrac{\beta_x \beta_z}{\beta^2} \\ -\gamma \beta_y & (\gamma -1)\dfrac{\beta_y \beta_x}{\beta^2} & 1+(\gamma -1)\dfrac{\beta_{y}^{2}}{\beta^2} & (\gamma -1)\dfrac{\beta_y \beta_z}{\beta^2} \\ -\gamma \beta_z & (\gamma -1)\dfrac{\beta_z \beta_x}{\beta^2} & (\gamma -1)\dfrac{\beta_z \beta_y}{\beta^2} & 1+(\gamma -1)\dfrac{\beta_{z}^{2}}{\beta^2} \end{pmatrix} \left( \begin{array}{c} ct\\ x\\ y\\ z\end{array}\right)}

Q.E.D.

These transformations can be written in a symbolic way using matrix notation as \mathbb{X}'=\mathbb{L}\mathbb{X} or using tensor calculus:

x^{\mu'}=\Lambda^{\mu'}_{\;\nu} x^\nu

The inverse GLT (IGLT) will be in component way:

\boxed{\left( \begin{array}{c} ct \\ x \\ y \\ z \end{array} \right) = \begin{pmatrix} \gamma & \gamma \beta_x & \gamma \beta_y & \gamma \beta_z \\ \gamma \beta_x & 1+(\gamma -1)\dfrac{\beta_{x}^{2}}{\beta^2} & (\gamma -1)\dfrac{\beta_x \beta_y}{\beta^2} & (\gamma -1)\dfrac{\beta_x \beta_z}{\beta^2} \\ \gamma \beta_y & (\gamma -1)\dfrac{\beta_y \beta_x}{\beta^2} & 1+(\gamma -1)\dfrac{\beta_{y}^{2}}{\beta^2} & (\gamma -1)\dfrac{\beta_y \beta_z}{\beta^2} \\ \gamma \beta_z & (\gamma -1)\dfrac{\beta_z \beta_x}{\beta^2} & (\gamma -1)\dfrac{\beta_z \beta_y}{\beta^2} & 1+(\gamma -1)\dfrac{\beta_{z}^{2}}{\beta^2} \end{pmatrix} \left( \begin{array}{c} ct'\\ x'\\ y'\\ z'\end{array}\right)}

and they can be written as \mathbb{X}=\mathbb{L}^{-1}\mathbb{X'}, or using tensor notation

x^\rho=(\Lambda^{-1})^\rho_{\;\mu'} x^{\mu'}

in such a way that

x^{\mu'} = \Lambda^{\mu'}_{\; \nu} x^\nu \rightarrow (\Lambda^{-1})^{\rho}_{\; \mu'}x^{\mu'} = (\Lambda^{-1})^{\rho}_{\;\mu'}(\Lambda)^{\mu'}_{\;\nu} x^{\nu} = x^{\rho} = \delta ^{\rho}_{\; \nu}x^\nu

Thus, (\Lambda^{-1})^{\rho}_{\;\mu'}(\Lambda)^{\mu'}_{\;\nu} = \delta ^{\rho}_{\; \nu}

or equivalently \mathbb{L}^{-1}\mathbb{L}=\mathbb{L}\mathbb{L}^{-1}=\mathbb{I}.

\delta ^{\rho}_{\; \nu} is the “unity” tensor, also called Kronecker delta, meaning that its components are 1 if \rho = \nu and 0 otherwise (if \rho \neq \nu). The Kronecker delta is therefore the “unit” tensor with two indexes.
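As a sanity check of the boxed matrix and of the relation \mathbb{L}\mathbb{L}^{-1}=\mathbb{I}, here is a small Python (NumPy) sketch; the velocity components and the test event below are arbitrary illustrative choices:

```python
import numpy as np

def boost(bx, by, bz):
    """General Lorentz boost acting on (ct, x, y, z), as in the boxed matrix."""
    b2 = bx**2 + by**2 + bz**2
    g = 1.0 / np.sqrt(1.0 - b2)
    beta = np.array([bx, by, bz])
    L = np.empty((4, 4))
    L[0, 0] = g
    L[0, 1:] = L[1:, 0] = -g * beta
    L[1:, 1:] = np.eye(3) + (g - 1.0) * np.outer(beta, beta) / b2
    return L

L = boost(0.3, -0.2, 0.5)          # arbitrary beta components
Linv = boost(-0.3, 0.2, -0.5)      # the inverse boost reverses the velocity

print(np.allclose(L @ Linv, np.eye(4)))       # True: L L^{-1} = identity

X = np.array([2.0, 0.5, -1.0, 3.0])           # arbitrary event (ct, x, y, z)
Xp = L @ X
eta = np.diag([1.0, -1.0, -1.0, -1.0])        # Minkowski metric
print(np.isclose(X @ eta @ X, Xp @ eta @ Xp)) # True: the interval is invariant
```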

NOTATIONAL CAUTION: Be aware that some books and authors assume you know when you need the matrix \mathbb{L} or its inverse \mathbb{L}^{-1}. Thus, you will often read and see this

 x^{\mu'}=\Lambda^{\mu'}_{\;\nu} x^\nu \rightarrow x^\nu= \Lambda^\nu_{\;\mu'} x^{\mu'}

which involves a certain abuse of notation, since it implies that

\Lambda^\nu_{\;\mu'} = (\Lambda^{-1})^{\nu}_{\;\mu'}

and because we have  to be mathematically consistent, the following  relationship is required to hold

\Lambda^\nu_{\;\mu'}\Lambda^{\mu'}_{\;\nu}=1

or more precisely, taking care with the so-called free indexes

\Lambda^\rho_{\;\mu'} \Lambda^{\mu'}_{\;\sigma}=\delta^\rho_\sigma

as before.

 

LOG#005. Lorentz transformations(I).

[Figure: LorTransformations]

For physicists working with objects approaching the speed of light, e.g., dealing with electromagnetic waves, the use of special relativity is essential.

The special theory of relativity is based on just two postulates:

1st) Covariance or invariance of all the physical laws (Mechanics, Electromagnetism,…) for all the inertial observers ( i.e. those moving with constant velocity relative to each other).

It means that there is no preferred frame or observer; only “relative” motion is meaningful when speaking of motion with respect to a certain observer or frame. Unfortunately, this generated a misnomer and a misconception in “popular” Physics when talking about relativity (“Everything is relative”). What is relative then? The relative motion between inertial observers and its description using certain “coordinates” or “reference frames”. However, the true “relativity” theory introduces just about the opposite view: physical laws and principles are “invariant” and “universal” (not relative!). Einstein himself was aware of this, even though he contributed to the initial spreading of the name “special relativity”, understood as a generalization of Galilean invariance that also contains the electromagnetic phenomena derived from Maxwell’s equations.

2nd) The speed of light is independent of the source motion or the observers. Equivalently, the speed of light is constant everywhere in the Universe.

No matter how fast you run, the speed of light is universal and invariant. Massive particles can never move at the speed of light, and even for two beams of light approaching each other, the relativistic composition of their velocities does not exceed the speed of light. Thus, the usual rule for the addition of velocities is not exact; special relativity provides the new rule for adding velocities.

In this post, the first of a whole thread devoted to special relativity, I will review one of the easiest ways to derive the Lorentz transformations. There are many ways to “guess” them, but I think it is very important to keep the mathematics as simple as possible. And here simple means basic (undergraduate) algebra and some basic Physics concepts from electromagnetism, Galilean physics and the use of reference frames. Also, we will limit ourselves here to 1D motion along the x-direction.

Let me begin! We assume we have two different observers and frames, denoted by S and S’. The observer in S is at rest while the observer in S’ is moving at speed v with respect to S. Classical Physics laws are invariant under the galilean group of transformations:

x'=x-vt

We know that Maxwell equations for electromagnetic waves are not invariant under Galileo transformations, so we have to search for some deformation and generalization of the above galilean invariance. This general and “special” transformation will reduce to galilean transformations whenever the velocity is tiny compared with the speed of light (electromagnetic waves). Mathematically speaking, we are searching for transformations:

x'=\gamma (x-vt)

and

x=\gamma (x'+vt')

for the inverse transformation. Here \gamma=\gamma(c,v) is a function of the speed of light (denoted by c, and constant in every frame!) and of the relative velocity v of S’ with respect to S. The small-velocity limit, where special relativity must reduce to Galilean relativity, imposes the condition:

\displaystyle{\lim_{v \to 0} \gamma (c,v) =1}

On the other hand, according to the second postulate of special relativity, the speed of light is constant in every reference frame. Therefore, the distance a light beam (or wave packet) travels in each frame is:

x=ct in S, or equivalently x^2=c^2t^2

and

x'=ct' in S’, or equivalently x'^2=c^2t'^2

Then, subtracting both relations, the squared spatial coordinates of the light signal in S and S’ satisfy

x^2-x'^2=c^2(t^2-t'^2)

Squaring the modified galilean transformations, we obtain:

x'^2=\gamma ^2(x-vt)^2 \rightarrow x'^2=\gamma ^2 (x^2+v^2t^2-2xvt) \rightarrow x'^2-\gamma ^2x^2+2\gamma ^2xvt=\gamma ^2 v^2t^2

x^2=\gamma ^2 (x'+vt')^2 \rightarrow x^2-\gamma ^2x'^2-2\gamma ^2x'vt'=\gamma ^2v^2t'^2

The only “weird” terms in the last two equations are the mixed ones with “xvt” (and x’vt’). So we have to do some algebraic trickery to rewrite them. Fortunately for us, we do know that x'=\gamma(x-vt), so

x'=\gamma x -\gamma vt \rightarrow \gamma x'=\gamma ^2 x-\gamma ^2 vt \rightarrow \gamma xx'=\gamma ^2 x^2-\gamma ^2 xvt

and thus

2\gamma xx'=2 \gamma ^2 x^2-2\gamma ^2 xvt \rightarrow 2\gamma ^2 xvt =2\gamma ^2x^2-2\gamma xx'

In the same way,  we proceed with the inverse transformations:

x=\gamma x'+\gamma vt' \rightarrow \gamma x=\gamma ^2x'+\gamma ^2vt' \rightarrow \gamma xx'=\gamma ^2x'^2+\gamma ^2x'vt'

and thus

2\gamma xx'=2\gamma^2x'^2+2\gamma^2x'vt' \rightarrow 2\gamma^2x'vt'=2\gamma xx'-2\gamma^2x'^2

We got it! We can now substitute the mixed x-v-t and x’-v-t’ terms using the last expressions. In this way, we get the following equations:

x'^2=\gamma ^2(x-vt)^2 \rightarrow x'^2=\gamma ^2 (x^2+v^2t^2-2xvt) \rightarrow x'^2-\gamma ^2x^2+2\gamma ^2x^2-2\gamma xx'=\gamma ^2v^2t^2 \rightarrow x'^2+\gamma ^2x^2-2\gamma xx'=\gamma ^2v^2t^2

x^2=\gamma ^2(x'+vt')^2 \rightarrow x^2=\gamma ^2(x'^2+v^2t'^2+2x'vt') \rightarrow x^2-\gamma ^2x'^2+2\gamma ^2x'^2-2\gamma xx'=\gamma ^2v^2t'^2 \rightarrow x^2+\gamma ^2x'^2-2\gamma xx'=\gamma ^2v^2t'^2

And now, the final stage! We subtract the first equation from the second one:

x^2-x'^2+\gamma ^2(x'^2-x^2)=\gamma ^2v^2(t'^2-t^2) \rightarrow (x'^2-x^2)(\gamma ^2-1)= \gamma ^2v^2(t'^2-t^2)

But we know that x^2-x'^2=c^2(t^2-t'^2), and so

(x'^2-x^2)(\gamma ^2-1)= \gamma ^2v^2(t'^2-t^2) \rightarrow c^2(x'^2-x^2)(\gamma ^2-1)= \gamma ^2v^2(x'^2-x^2)

then

c^2(\gamma ^2-1)= \gamma ^2v^2 \rightarrow -c^2= -\gamma ^2c^2+\gamma ^2v^2 \rightarrow \gamma ^2=\dfrac{c^2}{c^2-v^2}

or, more commonly we write:

\gamma ^2=\dfrac{1}{1-\dfrac{v^2}{c^2}}

and therefore

\gamma =\dfrac{1}{\sqrt{1-\dfrac{v^2}{c^2}}}

Moreover, we usually define the beta (or boost) parameter to be

\beta = \dfrac{v}{c}

To obtain the time transformation we only remember that x'=ct' and x=ct for light signals, so then, for time we  obtain:

x'=\gamma (x-vt) \rightarrow t' =x' /c= \gamma (x/c-vt/c)=\gamma ( t- vx/c^2)

Finally, we put everything together to define the Lorentz transformations and their inverse for 1D motion along the x-axis:

x'=\gamma (x-vt)

y'=y

z'=z

t'=\gamma \left( t-\dfrac{vx}{c^2}\right)

and for the inverse transformations

x=\gamma (x'+vt')

y=y'

z=z'

t=\gamma \left( t'+\dfrac{vx'}{c^2}\right)
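A quick Python sanity check of these 1D transformations (units with c=1; the velocity and the event coordinates are arbitrary choices): applying the direct map and then the inverse one recovers the original coordinates, and the combination x^2-c^2t^2 is left unchanged.

```python
import math

c = 1.0
v = 0.9                                   # arbitrary relative velocity
gamma = 1.0 / math.sqrt(1.0 - (v / c)**2)

def direct(x, t):
    return gamma * (x - v * t), gamma * (t - v * x / c**2)

def inverse(xp, tp):
    return gamma * (xp + v * tp), gamma * (tp + v * xp / c**2)

x, t = 3.0, 2.0                           # an arbitrary event in S
xp, tp = direct(x, t)
x2, t2 = inverse(xp, tp)

print(math.isclose(x, x2), math.isclose(t, t2))            # True True: round trip
print(math.isclose(x**2 - (c*t)**2, xp**2 - (c*tp)**2))    # True: invariant interval
```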

ADDENDUM: THE EASIEST, FASTEST AND SIMPLEST DEDUCTION  of \gamma (that I do know).

If you don’t like those long calculations, there is a trick to simplify the “derivation” above.  The principle of Galilean relativity enlarged for electromagnetic phenomena implies the structure:

x'=\gamma (x-vt)

and

x=\gamma (x'+vt')

for the inverse.

Now, the second postulate of special relativity says that light signals travel in such a way that the speed of light in vacuum is constant, so t=x/c and t'=x'/c. Inserting these times in the last two equations:

x'=\gamma (1-v/c)x

and

x=\gamma (1+v/c)x'

Multiplying these two equations, we get:

x'x =\gamma ^2(1+v/c)(1-v/c)xx'.

If we consider any event beyond the initial tic-tac, i.e., if we suppose t\neq 0 and t'\neq 0, the product xx' will be different from zero, and we can cancel the factors on both sides to get what we know and expect:

\gamma^2(1-v^2/c^2)=1

i.e.

\gamma = \dfrac{1}{\sqrt{1-\dfrac{v^2}{c^2}}}

LOG#004. Feynmanity.

[Figure: feynman_cern2]

The dream of every theoretical physicist, perhaps the most ancient dream of every scientist, is to reduce the Universe ( or the Polyverse if you believe we live in a Polyverse, also called Multiverse by some quantum theorists) to a single set of principles and/or equations. Principles should be intuitive and meaningful, while equations should be as simple as possible but no simpler to describe every possible phenomenon in the Universe/Polyverse.

What is the most fundamental equation? What is the equation of everything? Does it exist? Indeed, this question was already formulated by Feynman himself in his wonderful Lectures on Physics! Long ago, Feynman gave us another example of his physical and intuitive mind facing the First Question in Physics (and no, the First Question is NOT “(…)Dr.Who?(…)”, even though many Doctors have faced it in different moments of Human History).

Today, we will travel through this old issue and the modest but yet intriguing and fascinating answer (perhaps too simple and general) that R.P. Feynman found.

Well, what is it? What is the equation of the Universe? Feynman's idea is indeed very simple: a nullity condition! I call this a Feynman nullity, or feynmanity (a portmanteau), for short. The Feynman equation for the Universe is a feynmanity:

\boxed{U=0}

Impressed? Indeed, it is very simple. What is the problem then? As Feynman himself said, the problem is really a question of “order” and a “relational” one. A question of what theoretical physicists call “unification”. No matter how you put equations together, when they are truly related they “mix” somehow through suitable mathematical structures. Gluing “different” pieces and objects is not easy. I mean, if you collect every equation and recast them as feynmanities, you will realize that there is no relation a priori between them. However, it can not be so in a truly unified theory. Think about electromagnetism. In 3 dimensions, we have 4 laws written in vectorial form, plus the gauge condition and electric charge conservation through a current. However, in 4D you realize that they are indeed simpler. The 4D viewpoint helps to understand electric and magnetic fields as the two sides of the same “coin” (the coin is a tensor). And thus, you can see the origin of the electric and magnetic fields through the Faraday-Maxwell tensor F_{\mu \nu }. Therefore, a higher dimensional picture simplifies equations (something that has been remarked by physicists like Michio Kaku or Edward Witten) and helps you to understand the electric and magnetic field origin from a second rank tensor on an equal footing.

You can take every equation describing the Universe and set it equal to zero. But of course, that does not explain the origin of the Universe (if any), quantum gravity (yet to be discovered) or whatever. However, the remarkable fact is that every important equation can be recast as a Feynmanity! Let me put some simple examples:

Example 1. The Euler equation in Mathematics. The most famous formula in complex analysis is a Feynmanity, e^{i\pi}+1=0, or e^{2\pi i}-1=0 if you prefer the constant \tau=2\pi.

Example 2. The Riemann hypothesis. The most important unsolved problem in Mathematics (and number theory; Physics?) is the solution of the equation \zeta (s)=0, where \zeta(s) is the celebrated Riemann zeta function in the complex variable s=\kappa + i \lambda, \kappa, \lambda \in \mathbb{R}. Trivial zeroes are placed on the real axis at s=-2n \forall n=1,2,3,...,\infty. The Riemann hypothesis is the statement that every non-trivial zero of the Riemann zeta function lies on the line parallel to the imaginary axis with real part equal to 1/2. That is, the Riemann hypothesis says that the feynmanity \zeta(s)=0 has non-trivial solutions iff s=1/2\pm i\lambda _n, \forall n=1,2,3,...,\infty, so that

\displaystyle{\lambda_{1}=14.134725, \lambda_{2}= 21.022040, \lambda_{3}=25.010858, \lambda _{4}=30.424876, \lambda_{5}=32.935062, ...}

I generally prefer to write the Riemann hypothesis in a more symmetrical and “projective” form. Non-trivial zeroes have the form s_n=\dfrac{1\pm i \gamma _n}{2} so that for me, non-trivial true zeroes are derived from projective-like operators \hat{P}_n=\dfrac{1\pm i\hat{\gamma} _n}{2}, \forall n=1,2,3,...,\infty. Thus

\gamma_1 =28.269450, \gamma_2= 42.044080, \gamma_3=50.021216, \gamma _4=60.849752, \gamma_5=65.870124,...
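Both examples can be probed numerically. The sketch below is only illustrative: it assumes the mpmath package is available for the zeta evaluation, and it uses the (truncated) zero ordinates quoted above, so the values come out small but not exactly zero.

```python
import cmath
from mpmath import zeta, mpc   # mpmath assumed to be installed

# Example 1: the Euler feynmanity e^{i*pi} + 1 = 0
print(abs(cmath.exp(1j * cmath.pi) + 1))        # ~1e-16 (zero up to rounding)

# Example 2: zeta(1/2 + i*lambda_n) at the first non-trivial zeros
for lam in (14.134725, 21.022040, 25.010858):
    print(abs(zeta(mpc(0.5, lam))))             # tiny values, limited by the 6-digit zeros
```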

Example 3. Maxwell equations in special relativity. Maxwell equations have been formulated in many different ways throughout the history of Physics. Using tensor calculus, they can be written as 2 equations:

\partial _\mu F^{\mu \nu}-j^\nu=0

and

\epsilon ^{\sigma \tau \mu \nu} \partial _\tau F_{\mu\nu}=\partial _\tau F_{\mu \nu}+ \partial _\nu F_{\tau \mu}+\partial_\mu F_{\nu \tau}=0

Using differential forms:

dF=0

and

d\star F-J=0

Using Clifford algebra (Clifford calculus/geometric algebra, although some people prefer to talk about the “Kähler form” of Maxwell equations) Maxwell equations are a single equation: \nabla F-J=0 where the geometric product is defined as \nabla F=\nabla \cdot F+ \nabla \wedge F.

Indeed, in the Lorentz gauge  \partial_\mu A^\mu=0, the Maxwell equations reduce to the spin one field equations:

\square ^2 A^\nu=0

where we defined

\square ^2=\square \cdot \square = \partial_\mu \partial ^\mu =\dfrac{\partial^2}{\partial x^i \partial x_i}-\dfrac{\partial ^2}{c^2\partial t^2}

Example 4. Yang-Mills equations. The non-abelian generalization of electromagnetism can be also described by 2 feynmanities:

The current equation for YM fields is (D^{\mu}F_{\mu \nu})^a-J_\nu^a=0

The Bianchi identities are (D _\tau F_{\mu \nu})^a+( D _\nu F_{\tau \mu})^a+(D_\mu F_{\nu \tau})^a=0

Example 5. Noether's theorems for rigid and local symmetries. Emmy Noether proved that when an r-parametric Lie group leaves the lagrangian quasi-invariant and the action invariant, a global conservation law (or first integral of motion) follows. It can be summarized as:

D_iJ^i=0 for suitable (generally differential) operators D^i,J^i depending on the particular lagrangian (or lagrangian density) and \forall i=1,...,r.

Moreover, she proved another theorem. The second Noether’s theorem applies to infinite-dimensional Lie groups. When the lagrangian is invariant (quasiinvariant is more precise) and the action is invariant under the infinite-dimensional Lie group parametrized by some set of arbitrary (gauge) functions ( gauge transformations), then some identities between the equations of motion follow. They are called Noether identities and take the form:

\dfrac{\delta S}{\delta \phi ^i}N^i_\alpha=0

where the gauge transformations are defined locally as

\delta \phi ^i= N^i_\alpha \epsilon ^\alpha

with N^i_\alpha certain differential operators depending on the fields and their derivatives up to a certain order. Noether's theorems are so general that they can easily be extended to groups more general than those of Lie type. For instance, Noether's theorems for supersymmetric theories (involving Lie “supergroups”) and many other more general transformations can easily be built. That is one of the reasons theoretical physicists love Noether's theorems. They are fully general.

Example 6. The Euler-Lagrange equations for a variational principle in Dynamics take the form \hat{E}(L)=0, where \hat{E} is the so-called Euler operator for the considered physical system and L is the lagrangian (if we have finitely many degrees of freedom, e.g., a particle or a system of particles) or a lagrangian “density” \mathcal{L} in the more general “field” theory framework (where we have infinitely many degrees of freedom). Even the classical (and quantum) theory of (super)strings follows from a lagrangian (or, more precisely, a lagrangian density). Classical actions for extended objects do exist, and so do their “lagrangians”. The quantum theory for p-branes, p=2,3,..., is not yet built but it surely exists, like M-theory, whatever it is.

Example 7.  The variational approach to Dynamics or Physics implies  a minimum ( or more generally a “stationary”) condition for the action. Then the feynmanity for the variational approach to Dynamics is simply \delta S=0. Every known fundamental force can be described through a variational principle.

Example 8. The Schrödinger equation in Quantum Mechanics, H\Psi-E\Psi=0, for a certain hamiltonian operator H. Note that the feynmanity is H=0 itself when special relativity is studied in the hamiltonian formalism. Even more, in Loop Quantum Gravity, one important unsolved problem is the solution of the general hamiltonian constraint for the gauge “Wilson-like” loop variables, \hat{H}=0.

Example 9. The Dirac equation (i\gamma ^\mu \partial_\mu - m) \Psi =0, describing free spin 1/2 fields. It can also easily be generalized to interacting fields and even to curved space-time backgrounds. The Dirac equation admits a natural extension, when the spinor describes a neutral particle that is its own antiparticle, through the Majorana equation

i\gamma^\mu\partial_\mu \Psi -m\Psi_c=0

Example 10. Klein-Gordon’s equation for spin 0 particles: (\square ^2 +m^2 )\phi=0.

Example 11. The Rarita-Schwinger spin 3/2 field equation: \gamma ^{\mu \nu \sigma}\partial_{\nu}\Psi_\sigma+m\gamma^{\mu\nu}\Psi_\nu=0. If m=0, and with the usual conventions for gamma matrices, it can also be written alternatively as

\gamma ^\mu (\partial _\mu \Psi_\nu -\partial_\nu\Psi_\mu)=0

Note that antisymmetric gamma matrices verify:

\gamma ^{\mu \nu}\partial_{\mu}\Psi_\nu=0

More generally, every local (and non-local) field theory equation for spin s can be written as a feynmanity, even for a theory which contains interacting fields of different spins (s=0,1/2,1,3/2,2,…). Thus, field equations have the general feynmanity structure (even with interactions and a potential energy U) \Lambda (\Psi)=0 (where I don't write the indices explicitly). I will not discuss here the quantum and classical consistency of higher spin field theories (like those existing in Vasiliev's theory), but field equations for arbitrary spin fields can be built!

Example 12. SUSY charges. Supersymmetry charges can be considered as operators that satisfy the conditions \hat{Q}^2=0 and \hat{Q}^{\dagger 2}=0. Note that Grassmann numbers, also called Grassmannian variables (or anticommuting c-numbers), are “numbers” satisfying \theta ^2=0 and \bar{\theta}^2=0.

Feynman's conjecture that everything in a fundamental theory can be recast as a feynmanity seems very general, perhaps even silly, but it is quite accurate for the current state of Physics, and in spite of the fact that the list of equations can seem unordered or unrelated, the simplicity of the general feynmanity (another of the relatively unknown, neverending Feynman contributions to Physics)

something =0

is so great that it will likely remain forever in the future of Physics. Mathematics is so elegant and general that the Feynmanity will survive further advances unless a Feynman “inequality” (that we could perhaps call an unfeynmanity?) turns out to be more important and fundamental than an identity. Of course, there are many important results in Physics, like the uncertainty principle or the second law of thermodynamics, that are not feynmanities (since they are inequalities).

Do you know more examples of important feynmanities?

Do you know any other fundamental physical laws or principles that can not be expressed as feynmanities, and then, they are important unfeynmanities?


 

 



LOG#003. Entropy.

[Figure: Boltzmann's grave and the entropy from his Statistical Mechanics]

“I propose to name the quantity S the entropy of the system, after the Greek word [τροπη trope], the transformation. I have deliberately chosen the word entropy to be as similar as possible to the word energy: the two quantities to be named by these words are so closely related in physical significance that a certain similarity in their names appears to be appropriate.”  Clausius (1865).

Entropy is one of the strangest and most wonderful concepts in Physics. It is essential for the whole of Thermodynamics and also for understanding thermal machines. It is essential for Statistical Mechanics and the atomic structure of molecules and fundamental particles. From the Microcosmos to the Macrocosmos, entropy is everywhere: in the kinetic theory of gases, in information theory (as we learned in the previous post), and also in the realm of General Relativity, where equations of state for relativistic and non-relativistic particles arise too. Even more, entropy arises in Black Hole Thermodynamics in a most mysterious form that nobody understands yet.

On the other hand, in Quantum Mechanics entropy arises in the (von Neumann) density matrix approach, the quantum incarnation of the classical version of entropy, ultimately related to the notion of quantum entanglement. I know of no other concept in Physics that appears in so many different branches of Physics. The true power of the concept of entropy is its generality.

There are generally three foundations for entropy, three roads to the meaning of entropy:

– Thermodynamical Entropy. In Thermodynamics, entropy arises after integrating out the heat with an integrating factor that is nothing but the inverse of the temperature. That is:

\boxed{dS=\dfrac{\delta Q}{T}\rightarrow \Delta S= \int_\gamma\dfrac{\delta Q}{T}=\dfrac{\Delta Q}{T}}

The studies of thermal machines, a logical consequence of the Industrial Revolution of the 19th century, created the first definition of entropy. Indeed, following Clausius, the entropy change \Delta S of a thermodynamic system absorbing a quantity of heat \Delta Q at absolute temperature T is simply the ratio between the two, as the above formula shows! Armed with this definition and concept, Clausius was able to recast Carnot's statement that steam engines cannot exceed a specific theoretical optimum efficiency into a much grander principle we know as the “2nd law of Thermodynamics” (sometimes called The Maximum Entropy, MAX-ENT, principle by other authors)

\boxed{\mbox{The entropy of the Universe tends to a maximum}}

The problem with this definition and this principle is that it leaves unanswered the most important question: what really is the meaning of entropy? Indeed, the answer to this question had to await the revival of the atomic theory of matter at the end of the 19th century.

– Statistical Entropy. Ludwig Boltzmann was the scientist who provided a fundamental theoretical basis for the concept of entropy. His key observation was that absolute temperature is nothing more than the average energy per molecular degree of freedom. This fact strongly implies that Clausius's ratio between absorbed energy and absolute temperature is nothing more than the number of molecular degrees of freedom. That is, Boltzmann's greatest idea was indeed very simply put into words:

\boxed{S=\mbox{Number of microscopical degrees of freedom}= N_{dof}}

We can see a difference with respect to the thermodynamical picture of entropy: Boltzmann was able to show that the number of degrees of freedom of a physical system can easily be linked to the number of microstates \Omega of that system. And it comes with a relatively simple expression from the mathematical viewpoint (using the 7th elementary arithmetical operation, beyond the better-known addition, subtraction, multiplication, division, powers, roots,…)

\boxed{S \propto \log \Omega}

The base of the logarithm is purely conventional. Generally, the natural base is used (or the binary base, see below).

Why does it work? Why is the number of degrees of freedom related to the logarithm of the total number of available microscopical states? Imagine a system with 2 possible states, i.e., one simple binary degree of freedom: a coin. Clone/copy it into N such systems. Then we have a system of N coins showing heads or tails. Each coin contributes one degree of freedom that can take two distinct values. So in total we have N (binary, i.e., head or tail) degrees of freedom. Simple counting tells us that each coin (each degree of freedom) contributes a factor of two to the total number of distinct states the system can be in. In other words, \Omega = 2^N. Taking the base-2 logarithm of both sides of this equation shows that the logarithm of the total number of states equals the number of degrees of freedom: \log_2 \Omega = N.

This argument can be made completely general. The key argument is that the total number of states  \Omega follows from multiplying together the number of states for each degree of freedom. By taking the logarithm of  \Omega, this product gets transformed into an addition of degrees of freedom. The result is an additive entropy definition: adding up the entropies of two independent subsystems provides us the entropy of the total system.
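A toy Python check of this counting argument (N is an arbitrary choice): the number of states of N binary degrees of freedom is 2^N, its base-2 logarithm is N, and the logarithmic count is additive for independent subsystems.

```python
from math import log2
from itertools import product

N = 10
states = list(product("HT", repeat=N))   # every head/tail configuration of N coins
omega = len(states)

print(omega == 2**N)                     # True: Omega = 2^N microstates
print(log2(omega) == N)                  # True: log2(Omega) = number of degrees of freedom

# Additivity for two independent subsystems of N1 and N2 coins
N1, N2 = 4, 6
print(log2(2**N1 * 2**N2) == log2(2**N1) + log2(2**N2))   # True: the counts add up
```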

– Information Entropy.
Time machine to the 20th century: in 1948, Claude Shannon, an electrical engineer at Bell Telephone Laboratories, managed to mathematically quantify the concept of “information”. The key result he derived is that describing the precise state of a system that can be in states labelled by the numbers 1,2,...,n, with probabilities p_1, p_2,...,p_n, requires a well-defined minimum number of bits. In fact, the best one can do is to assign \log_2 (1/p_i) bits to the event with state i. As a result, statistically speaking, the minimum number of bits one needs to be able to specify the system, regardless of its precise state, will be

\displaystyle{\mbox{Minimum number of bits} = \sum_{i=1}^{n}p_i\log_2 (1/p_i) = -\left(p_1\log_2 p_1+p_2\log_2 p_2+...+p_n\log_2 p_n\right)}

When applied to a system that can be in \Omega states, each with equal  probability p= 1/\Omega, we get that

\mbox{Minimum number of bits} = \log_2 \Omega

We got it. A full century after the thermodynamic and statistical research, we were led to the simple conclusion that the Boltzmann expression S = \log \Omega is nothing more than an alternative way to express:

S = \mbox{number of bits required to define some (sub)system}

Entropy is therefore a simple bit (or trit, cuatrit, pentit,…, p-it) counting of your system: the number of bits required to completely determine the actual microscopic configuration among the total number of microstates allowed. In these terms, the second law of thermodynamics tells us that closed systems tend to be characterized by a growing bit count. Does it work? Yes, it does, very well as far as we know… Even in quantum information theory you have an analogue with the density matrix. It even works in GR, and it strangely works in Black Hole Thermodynamics, where entropy is related to the area of the horizon and temperature to the surface gravity at the horizon, and where, mysteriously, BH entropy is proportional not to the volume, as one could expect from conventional thermodynamics (where entropy scales with the volume of the container), but to the area of the horizon. Incredible, isn't it? That scaling of Black Hole entropy with the area was the origin of the holographic principle. But that is far from where my current post wants to go today.
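Returning to the bit-counting picture, here is a tiny Python illustration (with a made-up probability distribution) of Shannon's minimum-bit formula and of its uniform limit \log_2\Omega:

```python
from math import log2

def shannon_bits(probs):
    """Average minimum number of bits needed to specify the state."""
    return sum(p * log2(1.0 / p) for p in probs if p > 0)

# A made-up, non-uniform distribution over 4 states
p = [0.5, 0.25, 0.125, 0.125]
print(shannon_bits(p))                       # 1.75 bits on average

# Uniform case: Omega equally likely states -> log2(Omega) bits
omega = 8
uniform = [1.0 / omega] * omega
print(shannon_bits(uniform), log2(omega))    # 3.0  3.0
```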

Indeed, there is a subtle difference between the statistical and the informational entropy: a minus sign in the definition. (Thermodynamical) entropy can be understood as “missing” information:

\boxed{Entropy = - Information}

or mathematically

S= - I. Or do you maybe prefer I+S=0?

That is, entropy is the same thing as information, except for a minus sign! So, if you add something to its opposite you get zero.

The question we naturally face in this entry is the following one: what is the most general mathematical formula/equation for “microscopic” entropy? Well, as with many other great problems in Physics, it depends on the axiomatics and on your assumptions! Let's follow Boltzmann back in the 19th century. He cleverly suggested a deep connection between thermodynamical entropy and the microscopical degrees of freedom of the considered system. He suggested that there was a connection between the entropy S of a thermodynamical system and the probability \Omega of a given thermodynamical state. How can the functional relationship between S and \Omega be found? Suppose we have S=f(\Omega). In addition, suppose that we have a system that can be divided into two pieces, with their respective entropies and probabilities S_1,S_2,\Omega_1,\Omega_2. If we assume that the entropy is additive, meaning that

S_\Omega=S_1(\Omega_1)+S_2(\Omega_2)

with the additional hypothesis that the subsystems are independent, i.e., \Omega=\Omega_1\Omega_2, then we can fix the functional form of the entropy in a very easy way: S(\Omega)=f(\Omega_1\Omega_2)=f(\Omega_1)+f(\Omega_2). Do you recognize this functional equation from your High School? Yes! Logarithms are involved in it. If you are not convinced, you can do the work with simple calculus, following the Fermi lectures on Thermodynamics. Let x=\Omega_1, y=\Omega_2

f(xy)=f(x)+f(y)

Now write y=1+\epsilon; then f(x+\epsilon x)=f(x)+f(1+\epsilon), where \epsilon is a tiny first-order infinitesimal quantity. Thus, Taylor expanding both sides and neglecting terms beyond first order in the infinitesimals, we get

f(x)+\epsilon x f'(x)=f(x)+f(1)+\epsilon f'(1)

For \epsilon=0 we obtain f(1)=0, and therefore xf'(x)=f'(1)=k, where k is a constant, nowadays called Boltzmann's constant. We integrate the differential equation:

f'(x)=k/x in order to obtain the celebrated Boltzmann equation for entropy: S=k\log \Omega. To be precise, \Omega is not a probability; it is the number of microstates compatible with the given thermodynamical state. To obtain the so-called Shannon-Gibbs-Boltzmann entropy, we weight each microstate with its probability p_i. The Shannon entropy functional form is then generally written as follows:

\displaystyle{\boxed{S=-k \sum_i p_i\log p_i}}

It reaches its maximum value when p_i=1/\Omega, i.e., when the probability distribution is uniform. There is a subtle issue related to the additive constant obtained from the above argument that is important in classical and quantum thermodynamics, but we will discuss that in the future. Now, we could be happy with this entropy functional, but indeed the real issue is that we derived it from some a priori axioms that may look natural, yet they are not the most general set of axioms. And so our fascinating trip continues here today! The previous considerations have been, more or less, formalized according to the so-called “Khinchin axioms” of information theory. That is, the Khinchin axioms are enough to derive the Shannon-Gibbs-Boltzmann entropy we wrote before. However, as happened with the axioms of euclidean geometry, we can modify our axioms in order to obtain more general “geometries”, here more general “statistical mechanics”. We are now going to explore some of the best-known generalizations of the Shannon entropy. In what follows, for simplicity, we set Boltzmann's constant to one (i.e. we work in a k=1 system of units). Is the above definition of entropy/information the only one that is interesting from the physical viewpoint? No; indeed, there has been increasing activity in “generalized entropies” in recent years. Note, however, that we should recover the basic and simpler entropy (that of Shannon-Gibbs-Boltzmann) in some limit. I will review here some of the most studied entropic functionals of the last decades.
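Before that, a quick numerical check of the boxed Shannon-Gibbs-Boltzmann formula (k = 1, with random trial distributions) confirms that it is maximal, and equal to \log\Omega, for the uniform distribution:

```python
import random
from math import log

def sgb_entropy(probs, k=1.0):
    """Shannon-Gibbs-Boltzmann entropy S = -k sum_i p_i log p_i."""
    return -k * sum(p * log(p) for p in probs if p > 0)

omega = 5
uniform = [1.0 / omega] * omega
print(sgb_entropy(uniform), log(omega))      # both ~1.609: S_max = log(Omega)

# Any other normalized distribution over the same states has lower entropy
for _ in range(3):
    w = [random.random() for _ in range(omega)]
    p = [x / sum(w) for x in w]
    print(sgb_entropy(p) <= log(omega))      # True
```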

The Rényi entropy.

It is a set of uniparametric entropies, now becoming more and more popular in works on entanglement and thermodynamics, with the following functional form:

\displaystyle{ \boxed{S_q^R=\dfrac{1}{1-q}\ln \sum_i p_{i}^{q}}}

where the sum extends over every microstate with non-zero probability p_i. It is quite easy to see that in the limit q\rightarrow 1 the Rényi entropy becomes the Shannon-Gibbs-Boltzmann entropy (it can be checked with a perturbative expansion around q=1+\epsilon or using L'Hôpital's rule).

The Tsallis entropy.


Tsallis entropies, also called q-entropies by some researchers, are the uniparametric family of entropies defined by:

\displaystyle{ \boxed{S_{q}^{T}=\dfrac{1}{1-q}\left( \sum_{i} p_{i}^{q}-1\right)}}.

Tsallis entropy is related to Rényi's entropies through a nice equation:

\boxed{S_q^T=\dfrac{1}{q-1}\left(1-e^{(1-q)S_q^R}\right)}

and again, taking the limit q\rightarrow 1, Tsallis entropies reproduce the Shannon-Gibbs-Boltzmann entropy. Why consider a Statistical Mechanics based on Tsallis entropy and not on Rényi's? Without entering into mathematical details, the properties of Tsallis entropy make it more suitable for a generalized Statistical Mechanics of complex systems (in particular, due to the concavity of Tsallis entropy), as the seminal work of C. Tsallis showed. Indeed, Tsallis entropies had been found, unnoticed by Tsallis, in another unexpected place: in an earlier paper, Havrda and Charvát introduced the so-called “structural \alpha entropy”, related to some cybernetic problems in computing.
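A quick numerical check (arbitrary distribution and entropic index) of the boxed relation between the Rényi and Tsallis families, and of their common q → 1 limit:

```python
from math import log, exp

p = [0.5, 0.3, 0.2]                  # a made-up probability distribution
q = 0.7                              # an arbitrary entropic index

renyi   = log(sum(pi**q for pi in p)) / (1.0 - q)
tsallis = (sum(pi**q for pi in p) - 1.0) / (1.0 - q)

# Boxed relation: S_q^T = (1 - exp((1-q) S_q^R)) / (q - 1)
print(abs(tsallis - (1.0 - exp((1.0 - q) * renyi)) / (q - 1.0)) < 1e-12)  # True

# Both families tend to the Shannon-Gibbs-Boltzmann entropy as q -> 1
shannon = -sum(pi * log(pi) for pi in p)
q = 1.0 + 1e-6
renyi_1   = log(sum(pi**q for pi in p)) / (1.0 - q)
tsallis_1 = (sum(pi**q for pi in p) - 1.0) / (1.0 - q)
print(abs(renyi_1 - shannon) < 1e-4, abs(tsallis_1 - shannon) < 1e-4)     # True True
```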

Interestingly, Tsallis entropies are non-additive, meaning that they satisfy a “pseudo-additivity” property:

\boxed{S_{q}^{\Omega}=S_q^{\Omega_1}+S_q^{\Omega_2}-(q-1)S_q^{\Omega_1}S_q^{\Omega_2}}

This means that a Statistical Mechanics based on the Tsallis entropy is itself non-additive: the entropies of independent subsystems do not simply add up. However, these are usually called “non-extensive” entropies. Why? The definition of extensivity is different, namely the entropy of a given system is extensive if, in the so-called thermodynamic limit N\rightarrow \infty, S\propto N, where N is the number of elements of the given thermodynamical system. Therefore, additivity depends only on the functional relation between the entropy and the probabilities, but extensivity depends not only on that but also on the nature of the correlations between the elements of the system. The entropic additivity test is quite trivial, but checking extensivity for a specific system can be very complicated. Indeed, Tsallis entropies can be additive for certain systems, and for some correlated systems they can become extensive, like in usual Thermodynamics/Statistical Mechanics. However, in the broader sense, they are generally non-additive and non-extensive. And it is the latter feature, the thermodynamical behaviour in the thermodynamic limit, from which the name “non-extensive” Thermodynamics arises.
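The pseudo-additivity property is easy to verify numerically for two independent (made-up) subsystems whose joint probabilities factorize:

```python
from math import isclose

def tsallis(probs, q):
    return (sum(p**q for p in probs) - 1.0) / (1.0 - q)

q = 1.8                                  # arbitrary entropic index
pA = [0.6, 0.4]                          # subsystem 1 (arbitrary)
pB = [0.5, 0.3, 0.2]                     # subsystem 2 (arbitrary)
pAB = [a * b for a in pA for b in pB]    # independent joint distribution

S1, S2, S12 = tsallis(pA, q), tsallis(pB, q), tsallis(pAB, q)
print(isclose(S12, S1 + S2 - (q - 1.0) * S1 * S2))   # True: pseudo-additivity holds
```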

Landsberg-Vedral entropy.

They are also called “normalized Tsallis entropies”. Their functional form is the uniparametric family of entropies:

\displaystyle{ \boxed{S_q^{LV} =\dfrac{1}{1-q} \left( 1-\dfrac{1}{\sum_i p_{i}^{q}}\right)}}

They are related to Tsallis entropy through the equation:

\displaystyle{ S_q^{LV}= \dfrac{S_q^T}{\sum_i p_i ^q}}

It explains their alternate name as “normalized” Tsallis entropies. They satisfy a modified “pseudoadditivity” property:

S_q^\Omega=S_q^{\Omega_1}+S_q^{\Omega_2}+(q-1)S_q^{\Omega_1}S_q^{\Omega_2}

That is, in the case of normalized Tsallis entropies the rôles of (q-1) and -(q-1) are exchanged, i.e., -(q-1) becomes (q-1) in the transition from the Tsallis to the Landsberg-Vedral entropy.

Abe entropy.

This kind of uniparametric entropy is very symmetric. It is also related to some issues in quantum groups and fractal (non-integer order) analysis. It is defined by the following entropic functional, symmetric under q\leftrightarrow 1/q:

\displaystyle{ \boxed{S_q^{Abe}=-\sum_i \dfrac{p_i^q-p_i^{q^{-1}}}{q-q^{-1}}}}

Abe entropy can be obtained from Tsallis entropy as follows:

\boxed{S_q^{Abe}=\dfrac{(q-1)S_q^T-(q^{-1}-1)S_{q^{-1}}^{T}}{q-q^{-1}}}

The Abe entropy is also concave from the mathematical viewpoint, like the Tsallis entropy. It has some kind of “duality” or mirror symmetry due to its invariance under swapping q and 1/q.

Kaniadakis entropy

Another well-known uniparametric entropic family is the Kaniadakis entropy or \kappa-entropy. Related to relativistic kinematics, it has the functional form:

\displaystyle{ \boxed{S_\kappa^{K}=-\sum_i \dfrac{p_i^{1+\kappa }-p_i^{1-\kappa}}{2\kappa}}}

In the limit \kappa \rightarrow 0 the Kaniadakis entropy becomes the Shannon entropy. Also, formally writing q=1+\kappa and \dfrac{1}{q}=1-\kappa, the Kaniadakis entropy is related to the Abe entropy. The Kaniadakis entropy, in addition to being concave, has further subtle properties, like being something called Lesche stable. See references below for details!

Sharma-Mittal entropies.

Finally, we end our tour along entropy functionals with a biparametric family of entropies called Sharma-Mittal entropies. They have the following definition:

\displaystyle{ \boxed{S_{\kappa,r}^{SM}=-\sum_i p_i^{r}\left( \dfrac{p_i^{1+\kappa}-p_i^{1-\kappa}}{2\kappa}\right)}}

It can be shown that this family contains many entropies as special cases. For instance, the Tsallis entropy is recovered if r=\kappa and q=1+2\kappa. The Kaniadakis entropy is obtained if we set r=0. The Abe entropy is the subcase with \kappa=\frac{1}{2}(q-q^{-1}) and r=\frac{1}{2}(q+q^{-1})-1. Isn't it wonderful? There is an alternative expression of the Sharma-Mittal entropy, with the following form:

\displaystyle{ \boxed{S_{r,q}^{SM}=\dfrac{1}{1-r}\left[\left(\sum_i p_i^q\right)^{\frac{1-r}{1-q}}-1\right]}}

In this functional form, the SM entropy recovers the Rényi entropy for r\rightarrow 1 and becomes the Tsallis entropy if r\rightarrow q. Finally, when both parameters approach 1, i.e., r,q\rightarrow 1, we recover the classical Shannon-Gibbs-Boltzmann entropy. It is left as a nice exercise for the reader to relate the above two SM entropy functional forms and to derive the Kaniadakis, Abe and Landsberg-Vedral entropies for particular values of r,q from the second definition of the SM entropy.
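As a numerical illustration of these reductions (the distribution and the index below are arbitrary choices), the second form of the SM entropy can be compared directly with the Rényi and Tsallis functionals:

```python
from math import log

def sharma_mittal(probs, r, q):
    A = sum(p**q for p in probs)
    return (A**((1.0 - r) / (1.0 - q)) - 1.0) / (1.0 - r)

def renyi(probs, q):
    return log(sum(p**q for p in probs)) / (1.0 - q)

def tsallis(probs, q):
    return (sum(p**q for p in probs) - 1.0) / (1.0 - q)

p = [0.4, 0.35, 0.25]
q = 0.6

# r -> q reproduces the Tsallis entropy; r -> 1 reproduces the Renyi entropy
print(abs(sharma_mittal(p, q + 1e-9, q) - tsallis(p, q)) < 1e-6)   # True
print(abs(sharma_mittal(p, 1.0 - 1e-9, q) - renyi(p, q)) < 1e-6)   # True
```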

However, entropy as a concept remains very mysterious. Indeed, it is not clear yet whether we have exhausted every possible functional form for entropy!

Non-extensive Statistical Mechanics and its applications are becoming more and more important and well known among theoretical physicists. It has a growing number of uses in High-Energy Physics, condensed matter, Quantum Information and other branches of Physics. The Nobel laureate Murray Gell-Mann has dedicated his last years of research to the world of non-extensive entropy. At least since his book The Quark and the Jaguar, Murray Gell-Mann has progressively moved into this fascinating topic. In parallel, this field has also produced some other interesting approaches to Statistical Mechanics, such as the so-called “superstatistics”. Superstatistics is a kind of superposition of statistics that was invented by the physicist Christian Beck.

The latest research on the foundations of entropy functionals is related to something called “group entropies”, the transformation group of superstatistics and the rôle of group transformations on non-extensive entropies. It provides feedback between different branches of knowledge: group theory, number theory, Statistical Mechanics, and Quantum Statistics… And a connection with the classical Riemann zeta function even arises!

WHERE DO I LEARN ABOUT THIS STUFF and MORE if I am interested in it? You can study these topics in the following references:

This entry is mainly based on the following article by Christian Beck:

1) Generalized information and entropy measures in physics by Christian Beck. http://arXiv.org/abs/0902.1235v2

If you get interested in Murray Gell-Mann’s works about superstatistics and its group of transformations, here is the place to begin:

2) Generalized entropies and the transformation group of superstatistics. Rudolf Hanel, Stefan Thurner, Murray Gell-Mann.

http://arxiv.org/abs/1103.0580

If you want to see a really nice paper on group entropies and zeta functions, you can read this one by P. Tempesta:

3) Group entropies, correlation laws and zeta functions. http://arxiv.org/abs/1105.1935

C. Tsallis himself has a nice bibliography related to non-extensive entropies on his web page:

4) tsallis.cat.cbpf.br/TEMUCO.pdf

The “Khinchin axioms” of information/entropy functionals can be found, for instance, here:

5) Mathematical Foundations of Information Theory, A. Y. Khinchin. Dover Pub.

Some questions to be answered by current and future scientists:

A) What is the most general entropy (entropy functional) that can be built from microscopic degrees of freedom? Are those degrees of freedom classical/quantum, or is that distinction irrelevant for the ultimate substrate of reality?

B) Is every fundamental interaction related to some kind of entropy? How and why?

C) If entropy is “information loss” or “information” (only a minus sign makes the difference), and Quantum Mechanics is ultimately about information (the modern interpretation of QM is based on this idea), is there some hidden relationship between mass-energy, information and entropy? Could it be used to build Relativity and QM from a common framework? Are QM and (General) Relativity then emergent, and likely the two sides of a more fundamental theory based on information only?


LOG#002. Information and noise.

digital_person

We live in the information era. Read more about this age here. Everything in your surroundings and environment is bound and related to some kind of “information processing”. Information can also be recorded and transmitted. Therefore, roughly speaking, information is something that is processed, stored and transmitted. Your computer is processing information right now, while you read these words. You also record and save your favourite pages and files on your computer. There are many tools to store digital information: HDs, CDs, DVDs, USBs,… And you can transmit that information to your buddies by e-mail, old-fashioned postcards and letters, MSN, phone,… You are even processing information with your brain and senses whenever you read this text. Thus, the idea of information is abstract and very general. The following diagram shows you how large and multidisciplinary information theory (IT) is:

IT

As a teenager, I enjoyed that old game in which someone whispers a message in your ear, and you transmit it to another person, that one to another, and so on. Today, you can see it at big scale on Twitter. Hey! The final message is generally very different from the original one! This simple example shows the other side of communication or information transmission: “noise” (the term “efficiency” is also used). The storage or transmission of information is generally not completely efficient: you can lose information. Roughly speaking, every amount of information carries some quantity of noise that depends on how you transmit it (you can include a noiseless transmission as a subtype of information process in which there is no lost information). Indeed, this is also why we age. Our DNA, which is continuously replicating itself thanks to the metabolism (ultimately possible thanks to sunlight), gets progressively corrupted by free radicals and different “chemicals” that make our cellular replication more and more inefficient. Doesn’t this remind you of something you know from High School? Yes! I am thinking about Thermodynamics. Indeed, the reason why Thermodynamics has been a main topic from the 19th century until now is simple: the quantity of energy is constant, but its quality is not. Then, we must be careful to build machines/engines that are energy-efficient with the available energy sources.

Before going into further details, you are likely wondering what information is! It is a set of symbols, signs or objects with some well-defined order. That is what information is. For instance, the word ORDER is giving you information. A random permutation of those letters, like ORRDE or OERRD, is generally meaningless. I said information was “something”, but I didn’t go any further! Well, here is where Mathematics and Physics appear. Don’t run far away! The beauty of Physics and Maths, or as I like to call them, Physmatics, is that concepts, intuitions and definitions, rigorously made, are enough to satisfy your general requirements. “Something” is a general object, or a set of objects with a certain order. It can be a certain DNA sequence coding how to produce a substance (e.g., a protein) our body needs. It can be a simple or complex message hidden in a highly advanced cryptographic code. It is whatever you are recording on your DVD (a new OS, a movie, your favourite music,…) or any other storage device. It can also be what your brain is learning how to do. That is “something”, or really whatever. You can say it is an obscure and weird definition. Really, it is! It can also be what electromagnetic waves transmit. Is it magic? Maybe! It has always seemed like magic to me how you can browse the internet thanks to your Wi-Fi network! Of course, it is not magic. It is Science. Digital or analogue information can be seen as large ordered strings of 1’s and 0’s, making “bits” of information. We will not discuss bits in this log. Future logs will…
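Just to give a first taste of those “bits”, here is a minimal Python sketch, assuming Shannon’s measure -\sum_i p_i\log_2 p_i applied to the symbol frequencies of a message; note it quantifies the statistical information per symbol, not the meaning, so a permutation of a word scores the same:

from collections import Counter
from math import log2

def bits_per_symbol(message):
    # Shannon information per symbol, in bits, from the empirical symbol frequencies
    counts = Counter(message)
    n = len(message)
    return -sum((c / n) * log2(c / n) for c in counts.values())

print(bits_per_symbol("0000000000"))   # 0.0 bits/symbol: no surprise at all
print(bits_per_symbol("0101010101"))   # 1.0 bit/symbol: two equally likely symbols
print(bits_per_symbol("ORDER"))        # about 1.92 bits/symbol for the five-letter word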

Now, we have to introduce the concepts through some general ideas we have mentioned and that we know from High School. Firstly, Thermodynamics. As everybody knows, and as you have experienced, energy cannot be completely turned into useful “work”. There is a quality in energy. Heat is the most degraded form of energy. When you turn on your car and burn fuel, you know that some of the energy is transformed into mechanical energy and a lot of energy is dissipated as heat into the atmosphere. I will not talk about the details of the different cycles engines can realize, but you can learn more about them in the references below. Symbolically, we can state that

\begin{pmatrix} AVAILABLE \\ENERGY\end{pmatrix}=\begin{pmatrix}TOTAL \;\;ENERGY \\SUPPLIED\end{pmatrix} - \begin{pmatrix}UNAVAILABLE \\ENERGY\end{pmatrix}

The great thing is that an analogous relation does exist in information theory! The relation is:

\boxed{\mbox{INFORMATION} = \mbox{SIGNAL} - \mbox{NOISE}}

Therefore, there is some subtle analogy and likely some deeper idea behind all this stuff. How do physicists play this game? It is easy. They invent a “thermodynamic potential”! A thermodynamic potential is a gadget (mathematically, a function) that relates a set of different thermodynamic variables. For all practical purposes, we will focus here on the so-called Gibbs “free energy”. It allows us to measure how useful a “chemical reaction” or “process” is. Moreover, it also gives a criterion of spontaneity for processes at constant pressure and temperature. But that is not important for the present discussion. Let’s define the Gibbs free energy G as follows:

G= H - TS

where H is called enthalpy, T is the temperature and S is the entropy. You can identify these terms with the previous concepts. Can you see the similarity between these letters and the energy and communication concepts above? Information is something like “free energy” (do you like freedom? Sure! You will love free energy!). Thus, noise is related to entropy and temperature, to randomness, i.e., to something that does not store “useful information”.

The Internet is also a source of information and noise. There are lots of good readings, but there is also spam. Spam is not really useful for you, is it? Recalling our thermodynamic analogy, since the first law of thermodynamics says that the “quantity of energy” is constant and the second law says something like “the quality of energy, in general, decreases”, we have to be aware of information/energy processing. You find that there are signals and noise out there. This is also important, for instance, in High Energy Physics or Particle Physics: you have to distinguish, in a collision process, which events are the “signal” and which belong to a generally big “background”.

We will learn more about information (or entropy) and noise in my next log entries. Hopefully, my blog and microblog will become signals, and not noise, in the whole web.

Where could you get more information? 😀 You have some good ideas and suggestions in the following references:

1) Many years ago, I found the analogy between Thermodynamics and Information in this cool book (easy to read, even for non-experts):

Applied Chaos Theory: A Paradigm for Complexity. Ali Bulent Cambel. Academic Press; 1st edition (November 19, 1992).

Unfortunately, in those times, when I was an undergraduate student, my teachers were not very interested in this subject. What a pity!

2) There are some good books on Thermodynamics. I love (and fortunately own) these jewels:

Concepts in Thermal Physics, by Stephen Blundell, OUP. 2009.

A really self-contained book on Thermodynamics, Statistical Physics and topics not included in standard books. I really like it very much. It includes some issues related to global warming and some interesting Mathematics. I enjoy how it introduces polylogarithms in order to handle closed formulae for the Quantum Statistics.

Thermodynamics and Statistical Mechanics. (Dover Books on Physics & Chemistry). Peter T. Landsberg.

A really old-fashioned and weird book. But it has some insights to make you think about the foundations of Thermodynamics.

Thermodynamics. Dover Pub. Enrico Fermi.

This really tiny book is delicious. I learned a lot of fun stuff from it. Basic, concise and completely original, like Fermi himself. Are you afraid of him? Me too! E. Fermi was a really exceptional physicist and lecturer. Don’t lose the opportunity to read his lectures on Thermodynamics.

Mere Thermodynamics. Don S. Lemons. Johns Hopkins University Press.

Another great little book if you really need a crash course on Thermodynamics.

Introduction to Modern Statistical Physics: A Set of Lectures. Zaitsev, R.O. URSS publishings.

I have read and learned some extra stuff from URSS books like this one. Russian books on Science are generally great and uncommon, and I enjoy some very good, poorly known books written by generally unknown Russian scientists. Of course, you surely know about the Landau and Lifshitz books, but there are many other Russian authors who deserve your attention.

3) Information Theory books. Classical information theory books for your curious minds are:

An Introduction to Information Theory: Symbols, Signals and Noise. John R. Pierce. Dover Pub., 2nd Revised ed., 1980.

A really nice and basic book about classical Information Theory.

An Introduction to Information Theory. Dover Books on Mathematics. F. M. Reza. A basic book for beginners.

The Mathematical Theory of Communication. Claude E. Shannon and W. Weaver. Univ. of Illinois Press.

A classical book by one of the fathers of information and communication theory.

Mathematical Foundations of Information Theory. Dover Books on Mathematics. A. Y. Khinchin.

A “must read” if you are interested in the mathematical foundations of IT.