Lying with statistics: Can that really be done? And if so, how?

Slide 0

Hello and welcome to all of you. Today we will address whether and how you can lie with statistics. Perhaps the word “lie” is somewhat too harsh. The fact is, however, that it isn’t unusual to come across representations or interpretations of empirical data that may lead us to take a certain view. But let’s address the problem – as always – step by step.

Slide 1

And yes, statistics … it always comes down to handling statistics correctly. That is not entirely easy, because we can describe, emphasize, represent, and interpret things in many ways. Sometimes this is arbitrary, sometimes it misses the heart of the matter, and sometimes it can even be viewed as lying.

Let’s examine some examples. We’ll look at three areas that exemplify the topic.

Slide 2

We will begin with absolute and relative numbers. The main point here is that a meaningful evaluation of a situation needs the absolute numbers as well as the relative values.

Slide 3

The following example stems from an article by Gerd Gigerenzer and colleagues and goes back to a true incident. In 1995, a British authority issued a warning that a particular medicine would double the probability of blood clots in the lungs or legs.

That sounds alarming, no doubt.

Slide 4

What do you think? Shouldn’t we immediately take such a medicine off the market?

Slide 5

Let’s look closely at the related numbers:

Of every 7,000 persons who had not taken the medicine, one developed a blood clot. Of every 7,000 persons who had taken the medicine, two did.

Slide 6

Undoubtedly that’s an increase, but the relative numbers look far more alarming than the absolute ones. The relative difference is 100%; the absolute difference is just one additional person in every 7,000.
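To make the two readings concrete, here is a minimal sketch of the calculation. It only uses the figures quoted above (1 in 7,000 versus 2 in 7,000); the variable names are my own and purely illustrative.

```python
from fractions import Fraction

# Figures from the example: blood clots per 7,000 persons
risk_without = Fraction(1, 7000)  # had not taken the medicine
risk_with = Fraction(2, 7000)     # had taken the medicine

# Relative increase: change measured against the starting risk
relative_increase = (risk_with - risk_without) / risk_without

# Absolute increase: change measured against the whole group
absolute_increase = risk_with - risk_without

print(f"relative increase: {float(relative_increase):.0%}")
print(f"absolute increase: {absolute_increase} (one extra case per 7,000 persons)")
```

Both numbers describe the same data. The first one makes the headline, the second one tells you how many people are actually affected.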

Slide 7

Absolute and relative numbers can send different messages, so let’s take a closer look at this.

Let’s play with the numbers. To make it easy for us, let’s assume there are 1,000 people involved and there is an increase of 100%.

This results in the following increases:

1 person becomes 2 persons

10 persons become 20 persons

100 persons become 200 persons

If the absolute numbers are small, a doubling is hardly alarming. If they are large, the same doubling gives an entirely different impression.

Slide 8

Let’s play with the numbers again. Let’s assume 1,000 people again and this time consider an increase of 1 person.

This results in the following increases:

1 person becomes 2 persons, and that is 100%

10 persons become 11 persons, and that is 10%

100 persons become 101 persons, and that is just 1%

If the absolute numbers are small, this increase of one person seems very large, in fact disproportionately large.
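If you want to play with the numbers yourself, the two games from the last two slides can be sketched as follows. The helper `relative_change` and the group size of 1,000 are just the assumptions from the talk written down as code, not a fixed recipe.

```python
def relative_change(before, after):
    """Relative change as a fraction of the starting value."""
    return (after - before) / before

population = 1000  # assumed group size, as in the talk

# Game 1: the same relative increase (100%), different absolute increases
for before in (1, 10, 100):
    after = before * 2
    extra = after - before
    print(f"{before:>3} -> {after:>3}: {relative_change(before, after):.0%} relative, "
          f"+{extra} persons, i.e. {extra / population:.1%} of the group")

# Game 2: the same absolute increase (+1 person), different relative increases
for before in (1, 10, 100):
    after = before + 1
    print(f"{before:>3} -> {after:>3}: +1 person, "
          f"{relative_change(before, after):.0%} relative")
```

The first loop keeps the percentage fixed and lets the head count vary; the second loop does the opposite.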

Slide 9

We can also handle average values in different ways. In particular, different average values such as median and arithmetic mean can lead to very different statements.

We also want to illustrate this with an example.

Slide 10

According to a report on the website of Capital magazine, in 2020 employees in Germany earned a median income of 43,200 euros.

On the website de.statista.com, you find that – also in 2020 – employees in Germany earned 3,975 euros per month on average. Full-time employment was presumed.

Multiplying this monthly figure by 12.5 gives an annual income of approximately 49,700 euros.

Clearly, the two statements are not identical but rather differ by about 6,500 euros.
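For those who like to check the arithmetic, here is a small sketch. The three figures are the ones quoted in the talk; the variable names are mine.

```python
monthly_mean = 3975     # euros per month, as quoted from de.statista.com
months_factor = 12.5    # multiplier used in the talk
median_annual = 43200   # euros per year, as quoted from Capital

annual_from_monthly = monthly_mean * months_factor
gap = annual_from_monthly - median_annual

print(annual_from_monthly)  # 49687.5, rounded to 49,700 in the talk
print(gap)                  # 6487.5, i.e. about 6,500
```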

Slide 11

Where does this discrepancy come from? That’s very difficult to judge. For instance, it could be that the two sources did not both restrict themselves to full-time employment.

However, it’s plausible that the indicated value of 49,700 euros is the arithmetic mean of the salaries – and not the median, as is the case for the 43,200 euros.

Why?

Well, employees who earn a great deal drive the arithmetic mean higher, but not the median. In this case as well, an example will help you to understand this fact.

Slide 12

We’ll look around in Quitesmall City, a city with exactly 1,000 employed residents. It is unique in one way: The residents all earn the same income. Specifically, they earn €1,000 a year. That’s not opulent, but it makes the math much easier.

Now, Ms. Money Bags has decided to move to Quitesmall City. She has an annual salary of €1,000,000. And she brings about a shift in the arithmetic mean, thus in the average earnings determined using this method.

Since everyone earned the same amount, the arithmetic mean before was €1,000.

The arithmetic mean after she moves to the city is calculated as follows: 1,000 times €1,000, the earnings of the 1,000 previous residents, totals one million euros. Add to that the one million that Ms. Money Bags earns. One million plus one million equals two million, which divided by the new population of 1,001 results in approximately €1,998.

In contrast, the median is untouched; before and after it is €1,000.
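The Quitesmall City example can be reproduced in a few lines. This is a minimal sketch using Python’s standard `statistics` module; the list names are my own.

```python
from statistics import mean, median

# 1,000 residents, each earning 1,000 euros a year
incomes_before = [1000] * 1000

# Ms. Money Bags moves in with 1,000,000 euros a year
incomes_after = incomes_before + [1_000_000]

print("mean before:  ", mean(incomes_before))
print("mean after:   ", round(mean(incomes_after)))  # about 1,998
print("median before:", median(incomes_before))
print("median after: ", median(incomes_after))
```

A single very high income almost doubles the arithmetic mean, while the median does not move at all.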

Slide 13

The website gehalt.de takes both values into account for 2020, thus the arithmetic mean and the median. In addition, two other values are mentioned here, named Q1 and Q3, which are called quartiles.

Slide 14

It’s clear that “average value” here refers to the arithmetic mean.

The median is the value that sets the 50% mark. 50% of salaries lie below this value and 50% lie above it.

Q1 denotes the first quartile, the lower quarter so to speak: 25% of salaries lie below this value.

Q3 denotes the third quartile and thus the upper quarter: 75% of salaries lie below this value, thus 25% lie at or above this value.

For the sake of completeness: Three quartiles divide the data into four equal parts. The one missing here is called Q2 or the second quartile. It marks – not surprisingly – the 50% point and is therefore – you surely have realized this – the median.
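Here is a small sketch of how Q1, the median, and Q3 can be computed. The nine salaries are invented purely for illustration and are not the gehalt.de data. Note also that there are several common conventions for computing quartiles; the sketch simply uses the default of `statistics.quantiles`.

```python
from statistics import mean, median, quantiles

# Hypothetical annual salaries in euros (made up for this illustration)
salaries = [28000, 31000, 35000, 39000, 43000, 48000, 55000, 67000, 120000]

# n=4 returns the three cut points that split the data into four parts
q1, q2, q3 = quantiles(salaries, n=4)

print("Q1:    ", q1)
print("Q2:    ", q2)
print("Q3:    ", q3)
print("median:", median(salaries))
print("mean:  ", round(mean(salaries)))
```

In this made-up list the second quartile and the median coincide, and the one large salary pulls the arithmetic mean above the median.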

Slide 15

In closing, we want to deal with representations again.

We’ve already talked about this before: You can select representations so that you steer the message of the data in the desired direction (at least for not very attentive observers).

Do you remember the rice consumption in the various regions? Let’s look at this example again.

Slide 16

This is a bar chart for rice consumption per capita per year in kg. In Africa people eat just over 25 kg, in Latin America just over 29 kg, and in Asia and the Pacific region it is just under 85 kg per capita per year. You can very easily read the numbers here and recognize the proportions at a glance. Latin America is slightly ahead of Africa, while Asia and the Pacific region are ahead by about a factor of 3.

Slide 17

And you’re more or less familiar with this slide. If you mark the consumption value on both the x- and y-axes, you end up with a square. This turns the linear difference into a squared difference – at least it feels that way – meaning that the factor of 3 becomes a perceived factor of 9.
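The squaring effect is easy to check. In this sketch I assume the rounded values 25 kg for Africa and 85 kg for Asia and the Pacific region; the chart values are slightly different, so treat the results as approximate.

```python
# Rounded consumption values in kg per capita per year (assumed for illustration)
africa = 25
asia_pacific = 85

# Bar chart: the eye compares lengths
linear_ratio = asia_pacific / africa

# Squares: the eye compares areas, so the ratio is squared
area_ratio = linear_ratio ** 2

print(f"length ratio: {linear_ratio:.1f}")
print(f"area ratio:   {area_ratio:.1f}")
```

With the "about a factor of 3" from the talk, the same calculation gives the perceived factor of 9.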

As I said, we had this already, the bar chart as well. Let’s derive new representations from this.

Slide 18

Here is another bar chart. The numbers haven’t changed, as you can read on the y-axis. What impression does this make? It looks as if the lead of Asia and the Pacific region over Africa or Latin America were even larger than it really is. The cause is simple: The labeling on the y-axis starts at 20 and not at 0. The bars no longer correctly express the proportions.

Europe with a rice consumption of 4.6 kg per capita per year and North America with 12.5 kg would completely disappear here and therefore I didn’t even include them on the first chart.
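What the truncated axis does to the proportions can be sketched like this. The values for Africa, Latin America, and Asia and the Pacific region are rounded assumptions (25, 29, and 85 kg), and `bar_height` is a hypothetical helper, not part of any charting library.

```python
# Consumption in kg per capita per year; the first three values are rounded
consumption = {
    "Africa": 25,
    "Latin America": 29,
    "Asia and Pacific": 85,
    "North America": 12.5,
    "Europe": 4.6,
}

def bar_height(value, axis_start=0):
    """Visible bar height when the y-axis starts at axis_start."""
    return max(value - axis_start, 0)

for region, value in consumption.items():
    print(f"{region:<17} true: {bar_height(value):>5}  "
          f"axis from 20: {bar_height(value, axis_start=20):>5}")

true_ratio = bar_height(85) / bar_height(25)
shown_ratio = bar_height(85, 20) / bar_height(25, 20)
print(f"Asia and Pacific vs. Africa: true {true_ratio:.1f}x, shown {shown_ratio:.1f}x")
```

With the axis starting at 20, the visible bar for Asia and the Pacific region is many times taller than the one for Africa, and the bars for North America and Europe have no height left at all.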

Slide 19

It gets really scary if you mix the two representations. You see that the resulting impression has little to do with the facts. Nevertheless, you can still read the correct numbers on the y-axis. So the representation does not lie outright, but it is certainly manipulative. And believe me, representations like this are not uncommon in the media.

Slide 20

A brief summary: In today’s world, statistics have become the basis of numerous decisions.

However, statistics are often misunderstood or even intentionally represented deceptively. There’s only one option: You must critically question data, their meaningfulness, and their limitations as well as their representation (and no, that does not mean distrusting everything on principle).

Slide 21

Thank you for your attention. This was the 20th episode of our conversations about mathematics and at the same time the last episode at the advanced level. I hope I will see you again in the next episode. Then we’ll be discussing statistics and probability for pros.
