Oct 26 2012

Thermo for Normals (part 25): The odds of being (energy) rich

In a gas such as the air you're breathing, not all of the molecules are flying around with the same speed. If they were, that would be pretty amazing. Nevertheless, most of the molecules are flying around at about the same speed. And the speed is related to the energy. We'd like to know what the average speed is, how many are going faster than that, and how many are going slower. To do that we first have to determine how much energy each of them has.

We already discussed that, for a gas at least, the average energy depends on the temperature. We obviously expect that the answer to both of the questions we just asked depends somehow on that.

Here's the answer.

Warning!

The probability that a molecule in a substance at temperature $T$ is in a state of energy $E$ is proportional to $e^{-E/kT}$.

That's it. And yes, that $k$ is the same $k$ as in the ideal gas law, $PV = NkT$. $k$ is called Boltzmann's constant.

By the way, instead of using joules, now that we've changed over to talking about the very small things, the unit of choice for energy is the "electron volt". 1 electron volt (abbreviated eV) is equal to a very tiny $1.6 \times 10^{-19}$ joules. But since the particles we're talking about tend to have even smaller energies than that, this makes it convenient. In fact, for a particle to have even 1 eV of energy would be very unusual. Anyhow, you can look up $k$ and put in $T = 300$ K (which is room temperature) to see what $kT$ is for room temperature. You'll find that $kT$ is about 1/39 eV, or about 0.025 eV, which is a handy number to know.

We can't immediately say what the probability of having energy $E$ is. That's because for now all we know is a proportionality, and not an equality. But we can immediately say how much more probable having 1 eV of energy is than 2 eV of energy at temperature $T$. It's

$$\frac{e^{-(1\,\mathrm{eV})/kT}}{e^{-(2\,\mathrm{eV})/kT}} = e^{(1\,\mathrm{eV})/kT}$$

At room temperature $kT$ is about 1/39 eV, thus having 1 eV is about $e^{39} \approx 10^{17}$ times more likely than having 2 eV. Wow, that's a lot more likely! Having significantly more energy is very rare in a gas at room temperature. We would think, then, that the average energy of a particle in a gas at room temperature is probably much lower than that.
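If you'd rather not do the arithmetic by hand, here is a minimal Python sketch of the same comparison. The names `K_B_EV_PER_K` and `boltzmann_ratio` are mine, chosen for illustration, and the only physics assumed is that the probability is proportional to $e^{-E/kT}$.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann's constant in eV per kelvin


def boltzmann_ratio(e_low_ev, e_high_ev, temperature_k):
    """How many times more likely the lower energy is than the higher one."""
    kt = K_B_EV_PER_K * temperature_k
    # Ratio of the two Boltzmann factors: exp(-E_low/kT) / exp(-E_high/kT)
    return math.exp((e_high_ev - e_low_ev) / kt)


kt_room = K_B_EV_PER_K * 300.0
print(f"kT at 300 K = {kt_room:.4f} eV (about 1/{1 / kt_room:.0f} eV)")
print(f"1 eV is {boltzmann_ratio(1.0, 2.0, 300.0):.1e} times more likely than 2 eV")
```

The unknown constant of proportionality cancels in the ratio, which is why this works before we've normalized anything.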

As always, to make a proportionality into an equation, we must find a constant. That is, the true probability of a molecule having energy $E$ is

$$P(E) = A\,e^{-E/kT}$$

How can we find $A$? Well, the probability of being at any energy has to be 1. The sum of all probabilities is 1. Now, in principle, a particle can have any energy, from 0 to infinity, and all numbers in between. But for the moment, let's just simplify things and assume that energy comes only in packets. This is for a technical reason that I'd rather not discuss right now. We'll let each packet be 0.001 eV. (This isn't as unrealistic as you might think. In the end, energy really does come in packets.) So the particles can have 0 eV, or 0.001 eV, or 0.002 eV, etc., but they can't have 0.0005 eV.

The sum of all probabilities can now be done pretty easily. I'll calculate it here, but feel free to skip.

Important!

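We have to sum all the Boltzmann factors, $e^{-E/kT}$, from $E = 0$ to infinity in steps of 0.001 eV. We can write

$$\sum_{n=0}^{\infty} e^{-nE_p/kT}$$

where $E_p$ is the energy of a packet, 0.001 eV. Believe it or not, this sum has a closed form, and very simple solution. It relies on the fact that this sum is of the form

$$\sum_{n=0}^{N} x^n$$

which we'll call $S_N$. The only difference is that $N$ goes to infinity at the end of the calculation. This sum is only less than infinity if $x$ is less than 1, but for $x = e^{-E_p/kT}$, of course, it always is. Now, note that

$$S_N - x\,S_N = 1 - x^{N+1}$$

Therefore

$$S_N = \frac{1 - x^{N+1}}{1 - x}$$

and as $N$ goes to infinity, $x^{N+1}$ is clearly zero. Thus the sum is just $1/(1-x)$. How simple is that?!

Now, the sum we want is

$$\sum_{n=0}^{\infty} P(nE_p) = A \sum_{n=0}^{\infty} e^{-nE_p/kT} = 1$$

and so

$$\frac{A}{1 - e^{-E_p/kT}} = 1$$

meaning that

$$A = 1 - e^{-E_p/kT}$$

and

$$P(E) = \left(1 - e^{-E_p/kT}\right) e^{-E/kT}$$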

Let's take a look at what this means. The most probable energy, surprisingly, is zero. However, this does not mean having zero energy is probable! It only means it's more probable than having other energies. At 300 K, it's about a 3.8% chance. The probability of having 0.001 eV is 3.7%. But the probability gets really low really fast. Having just 0.01 eV is 2.6% and 0.05 eV is just 0.5%.
Here, I'll plot it for you:
But look what happens as the temperature goes up. For something at 600 K, the probability of having no energy decreases a lot, down to 2%, and the probability of having higher energy goes up. Whereas the probability of having 0.02 eV at 300 K was basically zero, at 600 K it's not. You have a 1 in 1000 chance of having that much energy if the gas you're in is at 600 K.
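If you want to reproduce numbers like these yourself, here is a small Python sketch. It assumes the normalized packet probability $P(E) = (1 - e^{-E_p/kT})\,e^{-E/kT}$ with $E_p = 0.001$ eV, which is what you get by forcing the probabilities to sum to 1. The names `packet_probability`, `PACKET_EV`, and `K_B_EV_PER_K` are just illustrative.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann's constant in eV per kelvin
PACKET_EV = 0.001              # size of one energy packet, in eV


def packet_probability(n_packets, temperature_k, packet_ev=PACKET_EV):
    """Probability of holding exactly n_packets packets of energy."""
    x = math.exp(-packet_ev / (K_B_EV_PER_K * temperature_k))
    # (1 - x) is the normalization constant; x**n is the Boltzmann factor.
    return (1.0 - x) * x ** n_packets


for temperature_k in (300.0, 600.0):
    for n in (0, 1, 10, 50):
        p = packet_probability(n, temperature_k)
        print(f"T = {temperature_k:.0f} K, E = {n * PACKET_EV:.3f} eV: {100 * p:.2f}%")
```

Doubling the temperature roughly halves the chance of having zero energy, and that probability gets redistributed to the higher energies.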

An analogous relationship is a citizen's likelihood of having a certain amount of income. The first thing that matters is how rich the country you're in is. If the whole economy has a lot of wealth, then you stand a good chance of having more. But still, within any society, there are rich people and poor people. And there are always far more poor people than rich. So, here's the annual personal income in the US for 1994 and 2006:
This looks very much like the distribution for energy. In 1994 there were only roughly 7 trillion dollars to go around, so the probability of making only $2,500 in a year was decent. By 2006, the amount of dollars nearly doubled, even though the population only went up by about 10%. So the odds of making only $2,500 annually dropped, and the odds of making $50,000 improved. 2006 is kind of at a higher "money temperature" than 1994 was, simply because of the combined wealth of the country (the total number of dollars that existed). (For more info as to why the US is Boltzmann distributed, see this page.)

Now, what we usually care about are things like what percentage of people are poor (which is like asking how many particles have energy between 0 and 0.01 eV), or how many are rich (like how many particles have energy between 0.01 and 0.02 eV). That is, we want to know what proportion of particles there is within a range of energies. Fortunately, probabilities add. So if we want to know how many are going to have between 0.01 and 0.02 eV, we just have to take all the probabilities between those two numbers and add them together. And the answer is that at 300 K, 23% of the particles have energy between 0.01 and 0.02 eV.
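Adding up probabilities over a range is easy to sketch in Python. This assumes the same 0.001 eV packets and the normalized Boltzmann probability, and it counts both endpoints of the range, which is my own choice rather than something stated above. The function name `probability_between` is illustrative.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann's constant in eV per kelvin


def probability_between(e_min_ev, e_max_ev, temperature_k, packet_ev=0.001):
    """Add up the packet probabilities from e_min_ev to e_max_ev, endpoints included."""
    x = math.exp(-packet_ev / (K_B_EV_PER_K * temperature_k))
    n_min = round(e_min_ev / packet_ev)
    n_max = round(e_max_ev / packet_ev)
    # Each term is (normalization) * (Boltzmann factor for n packets).
    return sum((1.0 - x) * x ** n for n in range(n_min, n_max + 1))


share = probability_between(0.01, 0.02, 300.0)
print(f"{100 * share:.1f}% of particles have between 0.01 and 0.02 eV at 300 K")
```

The result should land close to the 23% quoted above; whether you count the endpoint at 0.02 eV shifts it by a percentage point or so.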

We also might care about some statistics. In the economy we might want the average income. In physics we might want to know the average energy. Well, we can do that pretty easily too. If 5 people make $20k and 10 people make $30k, then the average income is just (5 × 20 + 10 × 30)/15 ≈ 26.7 thousand. We didn't just average 20 and 30; we weighted each value by how many people made that much. Another way to say this is that the average is the sum of the percentage of people making that much times the amount they made, (5/15) × 20 + (10/15) × 30. So, the average energy of a particle in a system with temperature $T$ is the sum of $E \cdot P(E)$ for each $E$ from 0 to infinity.

Simple, right?

Of course, actually doing the sum is a bit challenging.

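Important!

$$\langle E \rangle = \sum_{n=0}^{\infty} nE_p\,P(nE_p) = \left(1 - e^{-\beta E_p}\right) \sum_{n=0}^{\infty} nE_p\,e^{-\beta nE_p}$$

where $\beta = 1/kT$. Then, we have

$$\langle E \rangle = -\left(1 - e^{-\beta E_p}\right) \frac{\partial}{\partial \beta} \sum_{n=0}^{\infty} e^{-\beta nE_p}$$

The latter sum was already done in the last calculation. It's $1/(1-e^{-\beta E_p})$. So,

$$\langle E \rangle = -\left(1 - e^{-\beta E_p}\right) \frac{\partial}{\partial \beta} \frac{1}{1 - e^{-\beta E_p}} = \frac{E_p\,e^{-\beta E_p}}{1 - e^{-\beta E_p}} = \frac{E_p}{e^{\beta E_p} - 1}$$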

At room temperature, the average energy is 0.0251 eV. That's actually very close to $kT$, which is 0.0256 eV. So if you want to know what the average energy is, all you have to do is calculate $kT$. (The error is from the fact that we divided energy into packets that were too big.)
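Here is a quick numerical check, as a sketch. It assumes the closed form $\langle E \rangle = E_p/(e^{E_p/kT} - 1)$, which is what the geometric-series result $1/(1-e^{-\beta E_p})$ gives once you take the derivative, and compares it against simply adding up $E \cdot P(E)$ term by term. The function names are mine.

```python
import math


def average_energy_closed_form(kt_ev, packet_ev=0.001):
    """Average energy from the closed form E_p / (exp(E_p / kT) - 1)."""
    return packet_ev / math.expm1(packet_ev / kt_ev)


def average_energy_brute_force(kt_ev, packet_ev=0.001, n_terms=5000):
    """Average energy by summing E * P(E) over the first n_terms packets."""
    x = math.exp(-packet_ev / kt_ev)
    norm = 1.0 - x
    return sum(n * packet_ev * norm * x ** n for n in range(n_terms))


kt = 1.0 / 39.0  # room temperature, in eV
print(f"closed form: {average_energy_closed_form(kt):.4f} eV")
print(f"brute force: {average_energy_brute_force(kt):.4f} eV")
print(f"kT:          {kt:.4f} eV")
print(f"tiny packets: {average_energy_closed_form(kt, packet_ev=1e-6):.4f} eV")
```

Shrinking the packet size pushes the average toward $kT$, which is the sense in which the packets above were "too big".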

Now I'm going to calculate the true average energy, and it won't include anything about the packets. However, the mathematics is more complicated, so all but the most technical normals should not think about reading this.

Important!

The reason I avoided allowing all energies was that the thing we get from doing the analysis isn't a probability, but a probability distribution. Probability distributions aren't probabilities! For one thing, they have dimension. For another, they only tell you what the probability is in a range from $E$ to $E + dE$. But for calculating the average, it's sort of cleaner to do it this way. Anyway, the probability is still proportional to $e^{-E/kT}$. The constant of proportionality, call it $A$, is, then,

$$A \int_0^{\infty} e^{-E/kT}\,dE = A\,kT = 1 \quad\Rightarrow\quad A = \frac{1}{kT}$$

Ergo,

$$p(E) = \frac{1}{kT}\,e^{-E/kT}$$

Notice that this actually has dimensions of 1/energy. That's why I didn't want to talk about it.

Calculating the average is a simple matter (with $\beta = 1/kT$ as before)

$$\langle E \rangle = \int_0^{\infty} E\,p(E)\,dE = \beta \int_0^{\infty} E\,e^{-\beta E}\,dE$$

Do the same trick with the derivative:

$$\langle E \rangle = -\beta \frac{\partial}{\partial \beta} \int_0^{\infty} e^{-\beta E}\,dE = -\beta \frac{\partial}{\partial \beta} \frac{1}{\beta} = \frac{1}{\beta} = kT$$

"Wait just a damn minute", you might say. "Are you trying to tell me that if I have a gas that's at an extremely high temperature, and I pick an atom at random and check its energy, the most likely outcome is that it has no energy?! It was standing still?!" After all, the biggest probability is clearly when $E = 0$. The answer, of course, is "no!" There is more to it than this, and we'll pick up here next time. But keep in mind that what we're saying is that the probability of having energy $E$ at temperature $T$ goes like $e^{-E/kT}$.

Oct 25 2012

The insane and inane in the Third Party debate

If you ever needed an argument for why the two party system in the US is a good idea, just watch this year's Third Party debate, which took place this past Tuesday. It included the "candidates" for President from the Green, Justice, Constitutional, and Libertarian parties (Jill Stein, Rocky Anderson, Virgil Goode, and Gary Johnson, respectively).

Getting right to the heart of the matter, Rocky Anderson on the very first question states that South Africa (!) was a model of democracy, since their very first election for president had 18 candidates. This wonderfully enlightened state of affairs would in principle allow a candidate to win with 6% of the vote, which, although a paltry figure, would be a miracle for any of these people to garner.

But this must be the position of each of the third party candidates: if you're a major party loser, all you can do is complain about the major parties. And that's all these people are: losers and aspiring election spoilers.

Indeed, the most prominent of these candidates is Gary Johnson, erstwhile candidate for president in the Republican party, former governor of New Mexico, who went nowhere despite inclusion in the GOP primary televised debates. He wasn't sufficiently different from Ron Paul, and Ron Paul was himself not a major factor. But Paul, to his credit, didn't hold a 1 year pity party in which he toured the country pretending he was affecting an election.

Not being sufficiently different from their major party counterparts is one big part of the problem. Of the four, two were mainly Democrats and two were mainly Republicans. Stein and Anderson take predominantly liberal positions about climate change, money in politics, Wall Street, and defense. Goode and Johnson take predominantly conservative positions about budget deficits, social programs, and federalism.

And then there's the point where each goes off the rails. The necessity of a primary election to choose the most electable candidate from a group then comes into relief.

First, all these candidates want to essentially end military spending, and the United States' military role in the world. This is popular in the extremes of both major parties. Extreme liberals want to give peace a chance (to the point where even preventing genocide is not a good enough reason) and extreme conservatives are isolationists who don't give a shit what happens in the world (and don't wanna pay for it). This position is, to be mild about it, not acceptable to the country or to any right-thinking person. First of all, it would by itself collapse the economy into a massive depression by abrupt huge cuts to the economy. It fails to realize any reality about destabilizing impacts of the nuclear arms race in developing nations, the need to prevent human rights atrocities, or the need to combat piracy. No person espousing this idea could possibly be the nominee in either major party, for the very good reason that it's insane.

All of the candidates also agree that the actions taken by the government to combat terror must be immediately rescinded (FISA, Patriot Act, etc.) if they skirt normal laws of jurisprudence, and they also want to end all drone usage. But the American people are smart enough to know that when our President (a constitutional lawyer) acts outside of this, it is not a path toward totalitarianism. Not when Bush did it, not when Obama did it, and not when the next president does it. Prosecuting the law is messy, and it's silly to indicate otherwise. Ending all drone usage, a technology that has decimated the murderers in Al Qaeda at virtually no American life lost, would be stupid.

Each pair of candidates goes to the extreme of their own party's tendencies. Stein wants free college for everyone and full public financing of all elections. Anderson is angry that 30 million people will still be uncovered under the Affordable Care Act (aka Obamacare) and wants all troops out of Afghanistan immediately without regard to the condition of that nation. Meanwhile, Goode and Johnson want to cut 1.3 trillion dollars from the federal budget in the first year, an act that would destroy the US economy and spark a global calamity. Goode wants to stop all immigration until the unemployment rate falls below 5%. Johnson wants a "fair" consumption tax, rather than a progressive income tax, which would pummel the poor in a way even Mitt Romney wouldn't possibly consider.

The candidates all agree that we should legalize drugs. Ok, fine, point to them.

But they also all agree that the two major parties don't have any real differences between them. And it's true, in a certain way, that two people with meaningful differences operating within the confines of reality look really similar compared to raving lunatics.

The video is embedded below. I don't recommend it.

Oct 17 2012

A terrific debate with little dishonesty

Clearly the president had a much better night than he did during the first debate, and he also had a much better night than did Mitt Romney. But more, and more importantly, the debate was just plain good. It was wide-ranging, featured legitimate differences in opinion, and not a lot of lying on either side (there was, of course, some). George Will, who has seen every presidential debate ever televised (so basically all of them in history, since such debates did not happen prior to 1960, not even on radio) opined that this was the best of all of them.

The first debate had no breadth. The issues were the Middle East, the Middle East, taxes, the Middle East, taxes, health care, and deficits. One of the debaters was disengaged, the moderator asked silly questions and even sillier follow-ups. China and Europe did not come up at all, nor did sequestration (half of the so-called fiscal cliff) to any real degree.

Tonight we talked about employment, energy, guns, education, contraception, health care, taxes, equal pay, immigration, Libya, China, and manufacturing. The moderator didn't have to goad the debaters to state their differences---they did so willingly and often.

Since everyone seems to be going with boxing analogies (probably my favorite sport that happens outside the Olympics), I may as well do the same. Obama's corner had a better strategy, but as a fighter he was also just more nimble.

His answer regarding gas prices was stinging, but only because Romney left an opening. Romney framed the discussion as if $1.80 gas was a good thing, when in fact the collapse of the oil market was precipitated by a huge catastrophic loss of aggregate demand. The president argued, both persuasively and correctly, that the gas price surge is concomitant with a recovering economy (indeed, gas was $4 before the crash), but also put in a good jab linking Romney to the same Bush policies that arguably caused that crash. If you're going to take a cheap shot, make sure you don't leave yourself open to such an obvious counterpunch.

Shortly after that, though, is where things started to really look bad for Romney. Obama has a weak spot, but somehow in going for it, Romney ended up on the mat. The president is vulnerable on the death of our ambassador in Benghazi, as his press secretary and other surrogates went around for weeks saying inaccurately that the attack was part of the protests about a Youtube video. It ought to have been perfectly obvious to Romney's coaches how to approach this issue: solemnly and with gravity, questioning competence but not intentions.

Instead, stupidly, he said the president was partying in Vegas while the ambassador's wife wept. But the president took responsibility right off (counteracting the baffling news from the day that Hillary Clinton was saying she was to blame), added that it was he who was at the airport when the coffins came in, and that he had called the attacks terrorism right off the bat. Romney then decided not to back off, but to swing wildly: staring at the president, he said he had done no such thing. Candy Crowley, who was quite familiar with the Rose Garden address that day, knew Obama had said "these acts of terror", and corrected Romney. That, plus the president taking legitimate umbrage at the "offensive" implication that the administration was playing politics, was enough to put Romney off his feet. It wasn't a knockout, but I'd say at least a 3 count.

Romney seemed ill at ease with the question regarding immigration. Given that his campaign seemed to be going for moderation, I thought that in that answer he would retract the abominable suggestions that we should starve immigrants until they leave (he calls it "self-deportation"), but he didn't. He gave a boilerplate Republican response. He didn't lose points, but he didn't gain them either.

Obama kept Romney from scoring several times. The president explained clearly and concisely why drilling leases on public land were down (because the oil companies weren't using them) and even recalled how Romney himself said that coal plants are dangerous (they are!). He also swatted down Romney's attempt to say that the rich will be taxed the same under his plan, which is bullshit.

Finally, the 47% came up. I'm a little disappointed that Romney didn't have a chance to respond. But the president highlighted how really horrible it was by several sympathetic examples of this demographic, and it was a good shot.

The president is still not effective at countering the claim that 0 net jobs have been gained in his administration (due to his coming in while job losses were humongous). You'd think he could do that by now. He also failed to mention the Tax Policy Center's grading of Romney's aforementioned bullshit plan.

Lies I noticed:

  • Romney did not endorse the Arizona "papers, please" law.
  • Obama did not have a supermajority in congress for very long (it's complicated)
  • Romney's claim of supporting Detroit's bankruptcy "which is what happened" is just not true. The government put $80B of TARP money into the companies to keep them liquid during Chapter 11, far different than Romney's plan.

That's not very many! Romney didn't repeat the lie that Obama is raiding Medicare, nor his lie about welfare-to-work being gutted, and no lies that his plan will cover pre-existing conditions. Obama didn't double-count savings for Medicare fee-for-service.

Oct 15 2012

Understanding polls (part 2)

In a previous installment, I showed the results of sampling a population to determine some fact: In that case, it was a hypothetical measurement of the number supporting Obama. By taking 50 "people" at random in the set and tabulating their responses, one gets a rough estimate of the actual proportion of the population who believe something. But how rough? If you run the poll many many times and then tabulate all the results you get, the picture looks something like this:

Sampling distribution polling 50 people, and running 1000 "polls".

The mean of all these poll results was equal to the actual population mean, which is the peak of this chart, at 53.9% in this hypothetical example. So if you were to take a lot of polls and average the result, you'd be damn close to the real value. But if you only take one poll, your result could be as low as 40 and as high as 70. In other words, if you take a poll of only 50 people, and you only do it once, you aren't at all confident of your result.

The chart above shows something which is obtusely called the "sampling distribution of the sample mean". It is a histogram of the results of many samples (polls), which calculate the mean of something (in this case, the mean proportion who want Obama to be president). The sampling distribution of the sample mean itself has a mean! But fortunately that mean is the same as the mean of the population distribution, which is what we really care about. However, not all aspects of the sampling distribution are the same as the population distribution.

Characterizing data

If I have a long list of data points, what are the sensible ways to describe this data? The most obvious is the mean. Suppose we poll 10 people on who they want to vote for and get the following results:

# Obama Romney
1 X
2 X
3 X
4 X
5 X
6 X
7 X
8 X
9 X
10 X

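We could let Obama = 1 and Romney = 0. In that case, we would calculate a mean that would represent the proportion who want Obama to be president. The mean, $\bar{x}$, is the sum of all the values divided by the number of values, $\bar{x} = \frac{1}{N}\sum_i x_i$.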

But there are also other times when you also want to know about the spread of the data. For instance, here is the distribution of percent change in US gross domestic product:

The mean looks to be something about +1 (it's actually +0.78), but we might want to also know how often and how much it varies from this. The natural thing would be to add up the differences of each data point from the mean, but if we did that then we'd end up with zero (by the definition of the mean, the differences above it exactly cancel the differences below it). Instead, we square the difference from the mean, average the squares, then take the square root at the end. The standard deviation, $\sigma$, of $N$ samples is then

$$\sigma = \sqrt{\frac{1}{N} \sum_i (x_i - \bar{x})^2}$$

where $\bar{x}$ is the mean, $x_i$ refers to the set of numbers in your data and $\sum$ means "add them all up". For the GDP growth, $\sigma$ works out to 0.99, meaning that most of the results fall within 0.99 of the mean, above or below it, which is clear from the graph itself. Another way of saying it is that the distribution has width that is roughly 2 (plus and minus 1), with just a few points lying outside of this range.

We can calculate the standard deviation for our example poll of 10 people above:

so taking the square root gives $\sigma$. Since the choice between Obama and Romney is just binary, this number has no obvious interpretation as a width of anything, but you can calculate it nonetheless.
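Here is a minimal Python sketch of both calculations. The vote list is a hypothetical 6-to-4 split that I made up for illustration (it is not necessarily the tally in the table above), and the standard deviation divides by $N$, which is an assumption on my part about which convention is meant.

```python
import math


def mean(values):
    """Add them all up and divide by how many there are."""
    return sum(values) / len(values)


def std_dev(values):
    """Square the differences from the mean, average them, take the square root."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


# Hypothetical poll of 10 people: Obama = 1, Romney = 0.
votes = [1] * 6 + [0] * 4

print(f"mean = {mean(votes):.2f}")
print(f"standard deviation = {std_dev(votes):.3f}")
# The raw differences from the mean cancel out, which is why we square them.
print(f"sum of differences = {sum(v - mean(votes) for v in votes):.3f}")
```

For 0/1 data like this, the standard deviation always comes out to $\sqrt{p(1-p)}$, where $p$ is the mean.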

Width of the sampling distribution

Let's go back to our original example, a poll of 50 people run 1000 times, which gave us the sampling distribution of the sample mean. The standard deviation for the population is not the same as the width of the sampling distribution! $\sigma$ for the population is about 49.8% (the way of calculating this is simple, but would require a slightly longer discussion to elucidate) but the sampling distribution is much narrower, at just about 7%.

The relationship of the population width and the sampling distribution width is given by the Central Limit Theorem. I'm not going to go into the details of this, but I will state what it says about this case. Suppose our population standard deviation is called $\sigma$ and the standard deviation of our sampling distribution is $\sigma_{\bar{x}}$. Then, for sample size $N$,

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}}$$

If you want to know the width of the sampling distribution, you take the actual population standard deviation and divide it by the square root of the number of people you sampled. In this case the result is 7 points, so that most of the "polls" show up within 7 points of 53.9% (7 in either direction).

This means that we can be much more confident if we take a larger sample size in our poll. Instead of polling 50 people, suppose we poll 200 people or 500 people, and run the poll 1000 times just like previously. Here are the 3 different distributions, which you can see getting narrower and taller as we increase the sample size:

By $N = 500$ the standard deviation of the sampling distribution is only 2.2. Meanwhile, the Central Limit Theorem predicts

$$\sigma_{\bar{x}} = \frac{49.8}{\sqrt{500}} \approx 2.2$$

in quite good agreement with the simulation.
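You can rerun this experiment yourself with a few lines of Python. This is a sketch of one way to do it, not the code behind the charts above: it assumes each simulated "person" independently favors Obama with probability 0.539, and the function name `simulate_polls` and the seed are arbitrary choices of mine.

```python
import math
import random
import statistics


def simulate_polls(true_share, sample_size, n_polls, rng):
    """Run n_polls polls of sample_size people; return each result in percent."""
    results = []
    for _ in range(n_polls):
        yes = sum(1 for _ in range(sample_size) if rng.random() < true_share)
        results.append(100.0 * yes / sample_size)
    return results


rng = random.Random(2012)  # fixed seed so the run is repeatable
true_share = 0.539
population_sd = 100.0 * math.sqrt(true_share * (1.0 - true_share))

for n in (50, 200, 500):
    polls = simulate_polls(true_share, n, 1000, rng)
    measured = statistics.pstdev(polls)
    predicted = population_sd / math.sqrt(n)
    print(f"N = {n}: simulated width {measured:.2f}, predicted {predicted:.2f}")
```

The simulated widths wobble a little from run to run, but they should track $\sigma/\sqrt{N}$ closely at every sample size.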

What "most" means

What does it mean that if you take a poll of 500 people that the poll result will "usually" be within 2.2 percentage points from the true value? The Central Limit Theorem, in addition to telling us the standard deviation of the sampling distribution, also tells us that the sampling distribution is normal. And for a normal distribution, 68% of all values fall within 1 standard deviation of the mean, 95% within 2 standard deviations, and over 99% within 3 standard deviations. If I take a poll of 500 people and get a result that, say, 56% favor Obama, there is less than a 1% chance that my result is more than 3 standard deviations (6.6 points) from the true value. So, we could say "there is a 99% probability that Obama gets between 49.4% and 62.6% of the vote".

A more realistic poll would include something like 2200 people (that's the number Gallup uses in its national tracking poll). Since their $\sigma_{\bar{x}}$ (the width of the sampling distribution) would only be about 1 point, they could say, instead, "there is a 99% probability that Obama gets between 53% and 59% of the vote". That's still a 6 point spread, but it's enough to say with confidence that the president will win.

Now, there's a slight issue with this that the reader may have noticed. In order to calculate the probability, one has to already know what the population standard deviation is! Since the only way to do that is to ask every person in the population who they're voting for, this value is always unknown. Instead, statisticians approximate the population standard deviation by the standard deviation that they measure in their poll. If the poll comes up 56 for Obama, the measured standard deviation is 49.6, slightly below the actual population standard deviation of 49.8. Because of the slight error in this, normally we do not say "probability" and switch to saying "confidence". So, we are "99% confident that Obama gets between 53% and 59% of the vote." This range is the confidence interval.
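The recipe above is short enough to sketch in Python. This assumes a simple yes/no poll, uses the poll's own standard deviation in place of the population's, and takes 3 standard deviations as the "99%" range, as in the text. The function name `confidence_interval` is just illustrative.

```python
import math


def confidence_interval(share_pct, sample_size, n_sigmas=3.0):
    """Return (low, high) in percentage points around a poll result."""
    p = share_pct / 100.0
    # Standard deviation measured in the poll itself (0/1 answers).
    measured_sd = 100.0 * math.sqrt(p * (1.0 - p))
    # Width of the sampling distribution, from the Central Limit Theorem.
    standard_error = measured_sd / math.sqrt(sample_size)
    half_width = n_sigmas * standard_error
    return share_pct - half_width, share_pct + half_width


for n in (500, 2200):
    low, high = confidence_interval(56.0, n)
    print(f"N = {n}: between {low:.1f}% and {high:.1f}%")
```

Notice that quadrupling the sample size only halves the width of the interval, since the width goes like one over the square root of $N$.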

All of this uncertainty in the poll is due to sampling, and it's inherent. The assumption is that you've taken a truly random sample, and in that case you have this much error just from mathematics of randomness. The best designed poll will still have this. However, by averaging a lot of polls together, as FiveThirtyEight or RealClearPolitics does, you help to reduce this uncertainty even further by increasing $N$.

Oct 12 2012

Biden bucks up the troops

Last week's Presidential candidate debate was an unequivocal rout for Barack Obama. The aftermath of tonight's Vice Presidential candidate debate makes that clear: partisans from both sides of the spectrum are claiming that they won. Last week, only one side was saying that.

Obama's losses in the past week have been steady and disheartening, erasing all of the gains made by the Democratic convention, the 47% remarks, and the stupid declarations of Romney regarding the Egyptian embassy on 9/11.

Fivethirtyeight's electoral vote projection taken today.

The Obama campaign has hemorrhaged 31.3 electoral votes since last week, when Romney decisively bested the president. In a majority of the national tracking polls, Romney is currently either tied or leading (though not in swing state polls).

But tonight Biden probably turned that around. No sign is clearer than what the criticisms of the other side were. After the first debate, the impotent rage of Democrats, when not directed at Obama himself, was at Romney's demeanor, his interrupting of moderator Jim Lehrer, his hatred of Big Bird. That is, things that were procedural but not pertinent---insubstantial. This time? Republicans whine that Biden kept interrupting Ryan, that he was laughing and mugging for the camera, and that the moderator (an excellent Martha Raddatz) was biased. Playing the ref never looks good, no matter who does it.

What does winning mean in this case? Conventional wisdom, which I do not disagree with, is that Biden needed a good and tenacious showing to prove to supporters that Obama "gets it", that he knows he needs to get his shit together and prepare properly for debates. The onus is still on him to pull this out in the final stretch, but it appears that the campaign knows that the winning arguments are going to be about the 47%, a vigorous debunking of the claim that Obama raided Medicare, and a full-throated announcement that Romney has no details because he has no positions.

Other thoughts:

  • A trending search-word during the debate on Google: conflating. Voters don't know that word?
  • I really hope this ends the discussion of the politics of Big Bird until at least the next midterm elections.
  • Martha Raddatz was outstanding, but maybe only in contrast. I'm not sure she's any better than a moderator ought to be.
  • Ryan started talking about contraception during a very specific answer that was supposed to be about abortion. I don't see these as being linked, but what do I know?
  • The photos released today of Paul Ryan working out were fucking weird.
