Chatting about stochastics: A couple of terms

Slide 0

Let’s talk about math again. Welcome to this new episode. Today we will address a couple of basic terms in statistics and see how you can integrate them in activity-based mathematics teaching. We will deal with simple data from an easy-to-conduct survey.

Let’s begin.

Slide 1

Imagine that you are designing a survey at school, one that a class can conduct and evaluate largely on its own. The goal of the survey is to answer the following question, which is initially posed in very general terms:

Are the students interested in the subject of mathematics? The result should be compared with the subject of English.

The method is technically suitable: A questionnaire is drafted that can then be properly evaluated.

Slide 2

What belongs on the questionnaire? Well, for example you could ask about grade level, age, gender, interest in these and other subjects, motivation to devote one’s free time to the respective subject areas, favorite subject, or scores (e.g., on the last exam).

For every single question, it is imperative to clarify two essential aspects in advance.

First, you must be certain about why you are asking a particular question. This aspect is not just preparatory instruction for hypothesis-driven work, but it also provides the opportunity to address handling data with respect to data privacy.

Second, you must consider the how. Do you ask for the age in years or in years and months? What degree of accuracy can you sensibly use? Will interest in a subject be depicted on a four- or five-point scale, for example? Naturally, the how is closely related to the why.

Slide 3

The data that are gathered here are quite varied. Data such as gender or favorite subject are referred to as qualitative data. They are coded using qualitative characteristics rather than numbers. Stated very simply (and in class that is often good enough), these are for the most part data that you cannot sensibly do calculations with.

Slide 4

Data such as grade level, age, or score on the last exam are quantitative data that are coded with numbers. Accordingly, you can arrange them in an order and for example determine the average age of the sample or calculate the average scores on exams.

But be careful: these values must be sensibly interpreted in the context of a survey. A better score on an exam is considered desirable, whereas a higher or lower age is not necessarily a quality characteristic and above all cannot be evaluated outside of a specific context.

Slide 5

Finally, what do we do with the interest in the subject or the motivation to devote one’s free time to the respective subject areas? These are also qualitative data if you ascertain a qualitative strength such as from very high to very low.

Slide 6

However, another distinction is important in this particular area, namely the distinction between ordinal and nominal data.

Age, grade level, and scores are examples of ordinal data because you can arrange them in a meaningful order. Gender and favorite subject are examples of nominal data because there is no meaningful order here.

However, you can also rank data such as interest and motivation. You do that by coding them with numbers, for instance, between 1 and 5 depending on the level of the strength.

In this process, though, keep in mind it is more difficult to distinguish high motivation from average motivation than to distinguish an exam score of 2 from an exam score of 3. It is important to know that the distances between each of the strengths are not necessarily equal.

Nevertheless, calculations are performed with such numbers and average values are determined, and that’s completely legitimate. It is all the more important to keep the weaknesses of the coding in mind when interpreting the data.

Slide 7

Let’s get to the terms. We assume that all preparations are complete and the survey has been conducted.

In our case, 882 students in grades 5 to 10 at the Marie Curie School have filled out a suitable questionnaire. The results confirm the guess: There are more students whose favorite subject is English than those who name mathematics. The other subjects as a whole end up in third place, perhaps somewhat surprisingly.

In absolute numbers, 441 students name English as their favorite subject, for 312 it is mathematics, and for 129 another subject. In this way, you easily arrive at an important basic term in statistics, namely the absolute frequency. In connection with the question of the favorite subject and the answer of “mathematics,” the absolute frequency is 312.

Slide 8

Now, 312 of 882 is certainly somewhat different from 312 of 1,000,000. What can we do with these absolute values? Well, it’s useful to relate them to the population. Therefore, we calculate the relative frequency as a quotient of the absolute frequency and the size of the population.

Next, we look for a good way to represent the results, which in this instance can be a pie chart.

The data have not changed, but in this way, they are easier to evaluate. Exactly half of the students named English as their favorite subject, and a good third selected mathematics. And you can see these values immediately on the pie chart.
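If you would like to check these shares yourself, a few lines of Python are enough. This is only a minimal sketch: the dictionary and its labels are illustrative choices, while the counts are the ones from the survey above.

```python
# Absolute frequencies of the favorite subject, taken from the survey.
absolute = {"English": 441, "Mathematics": 312, "Other": 129}

# Number of surveyed students (should be 882).
n = sum(absolute.values())

# Relative frequency = absolute frequency / number of surveyed students.
relative = {subject: count / n for subject, count in absolute.items()}

for subject, share in relative.items():
    print(f"{subject}: {share:.1%}")
```

The printed shares are 50.0% for English, 35.4% for mathematics, and 14.6% for the other subjects, which matches "exactly half" and "a good third".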

Slide 9

In another question, the students’ interest in the subjects of English and mathematics was ascertained. In this case, we used a tool called the Likert scale, which ranges from 5 (“very high interest”) to 1 (“absolutely no interest”). Here as well, you first see the absolute numbers, the raw data list – and we already used this term in an earlier episode.

Slide 10

But here too, the meaningfulness of the absolute frequencies is limited. Therefore, again we search for a suitable representation that’s meaningful at a glance. For instance, that could be a bar chart.

You can nicely enter the values for both subjects next to each other and in this way easily see the differences, at least qualitatively. English has peak values for high and average interest. Mathematics also peaks at average interest, but its bar for high interest is clearly lower.

Slide 11

With all due caution (we already mentioned this earlier), we determine an average strength and calculate the arithmetic mean. To do so, we weight each level of interest with the number of students who chose it, add up these products, and divide by the size of the sample, that is, the number of all surveyed students.

A total of 149 students selected very high interest in mathematics, so this value is entered into the calculation as 5 • 149. Altogether, we arrive at (5 • 149 + 4 • 206 + 3 • 256 + 2 • 174 + 1 • 97) : 882 and that’s 3.15 rounded.
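The same calculation can be sketched in Python. The function name weighted_mean and the dictionary layout are assumptions made for this illustration; the frequencies are the ones from the calculation above.

```python
# Interest in mathematics: Likert level -> number of students (from the text).
math_interest = {5: 149, 4: 206, 3: 256, 2: 174, 1: 97}


def weighted_mean(freq):
    """Arithmetic mean of a frequency table {value: count}."""
    # Each value enters the sum as often as it was chosen.
    total = sum(value * count for value, count in freq.items())
    n = sum(freq.values())
    return total / n


mean_math = weighted_mean(math_interest)
print(round(mean_math, 2))  # 3.15
```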

Slide 12

For the subject of English, the value is 3.48, thus – as was expected from a first glance at the data – higher. Too bad, because the expectations that we had before collecting the data were also confirmed.

Of course, we could ask ourselves whether the difference between the two average values is really meaningful – or stated with the technical term – significant.

We’ll put the answer off to a later episode. That goes into “probability and statistics for professionals” and isn’t part of lesson content until grades 9 to 12 anyway.

Slide 13

But let’s see what else might be interesting on a rather simple level.

We look at the data set again and notice that more than half of the students have at least high interest in the subject of English. We have

198 + 263 = 461 and that is greater than 882 : 2 = 441.

By the way, the relation “greater than or equal to” would be sufficient for the following considerations and therefore you see that here in the written version.

For mathematics, things look different, but all the same, more than half of the students have at least average interest. We have

149 + 206 + 256 = 611, which is greater than 882 : 2 = 441

but

149 + 206 = 355 and that is less than 882 : 2 = 441

Slide 14

As a result, we arrive at a new term:

We look at a data set sorted by size and at the value that, stated casually, lies in the middle, so that half of the data come before this value and half come after it. This value is called the median.

And what if there isn’t such a “middle” because we have an even number of values? No problem: then the median is the arithmetic mean of the two middle values.

In the example, the median for interest in mathematics = 3 and the median for interest in English = 4. Here as well, the subject of English is ahead, unfortunately.
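For the mathematics data, the median can be checked with a short Python sketch. The helper median_from_table is a hypothetical name used only here; it simply writes out every single answer, sorts the list, and picks the middle. The English data are left out because the text does not list all of their frequencies.

```python
def median_from_table(freq):
    """Median of a frequency table {value: count}."""
    # Expand the table into a sorted list of individual answers.
    values = sorted(v for v, count in freq.items() for _ in range(count))
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: there is exactly one middle value.
        return values[mid]
    # Even number of values: arithmetic mean of the two middle values.
    return (values[mid - 1] + values[mid]) / 2


# Interest in mathematics: Likert level -> number of students (from the text).
math_interest = {5: 149, 4: 206, 3: 256, 2: 174, 1: 97}
median_math = median_from_table(math_interest)
print(median_math)  # 3.0
```

With 882 answers there is no single middle value, so the sketch averages the 441st and 442nd answers. Both are 3, which gives the median of 3 mentioned above.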

Slide 15

Isn’t it superfluous to introduce another average value if it always results in the same thing anyway? Sure, if it were so. But it doesn’t have to be so, and additional data from the survey should show this.

This time we look at the scores on the last exam for 27 students in grade 8a and 29 students in grade 8b. Here you see the absolute numbers of the individual scores.

Scores are something special. They are used differently everywhere. Sometimes they go from 0 to 10, sometimes from 1 to 6, sometimes the small number is the best score, sometimes the large number, sometimes even letters are used. In the example, we consider scores from 0 to 5, where 5 is the best score.

Slide 16

If we calculate the arithmetic mean and the median, the arithmetic mean is the same in both classes at m = 2.7 (rounded). However, the median differs by one score: it is 2 in grade 8a and 3 in grade 8b.

Slide 17

Let’s look at this again on a bar chart. Quite clearly, the two classes have a very different frequency distribution. In grade 8a, barely any students have the rather bad scores of 0 and 1, and a score of 2 was assigned quite often. The better scores of 5, 4, and 3 can be seen too, but at a lower level.

In grade 8b, there is a peak for scores 4 and 3; otherwise, the entire score spectrum was used. This leads to the difference in the median despite an identical arithmetic mean.

Slide 18

Obviously, average values are thus not always sufficiently meaningful; rather, it also depends on how scattered the data are. Accordingly, we are interested in how much values deviate from an average value.

Let’s look at such a “measure of deviation”.

Slide 19

The arithmetic mean in both classes was 2.7. Let’s look at how much the individual measured values deviate from this arithmetic mean.

To do so, we simply find the differences. 5 – 2.7 = 2.3, 4 – 2.7 = 1.3, 3 – 2.7 = 0.3, etc.

Slide 20

And now we weight these differences. For grade 8a we calculate

2 • 2.3 + 4 • 1.3 + 6 • 0.3 + 14 • 0.7 + 1 • 1.7 + 0 • 2.7 = 23.1

Slide 21

And for grade 8b

2 • 2.3 + 8 • 1.3 + 8 • 0.3 + 5 • 0.7 + 3 • 1.7 + 3 • 2.7 = 34.1

Slide 22

We can calculate an average deviation from this if we divide by the number of students.

These simple calculations already provide a sense of the differences. In grade 8a, the measured scores deviate on average less strongly from the arithmetic mean than in grade 8b.
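Here is a minimal Python sketch of this average deviation. The frequencies are read off from the weights in the two sums above, the rounded mean of 2.7 is used just as in the text, and the name mean_abs_dev is an assumption made for this illustration.

```python
# Exam scores: score -> number of students, read off from the sums above.
scores_8a = {5: 2, 4: 4, 3: 6, 2: 14, 1: 1, 0: 0}
scores_8b = {5: 2, 4: 8, 3: 8, 2: 5, 1: 3, 0: 3}

MEAN = 2.7  # rounded arithmetic mean of both classes, as used in the text


def mean_abs_dev(freq, mean):
    """Average absolute deviation of a frequency table from a given mean."""
    n = sum(freq.values())
    # Weight each absolute difference with the number of students.
    total = sum(count * abs(score - mean) for score, count in freq.items())
    return total / n


mad_8a = mean_abs_dev(scores_8a, MEAN)
mad_8b = mean_abs_dev(scores_8b, MEAN)
print(round(mad_8a, 2), round(mad_8b, 2))  # 0.86 1.18
```

So the scores in grade 8a deviate by about 0.86 on average (23.1 : 27) and those in grade 8b by about 1.18 (34.1 : 29).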

Slide 23

So much for the principle; in practice, we do this somewhat differently. Let’s assume that we start with n numbers. We calculate the difference between an individual value and the arithmetic mean and first square this number. The reason is simple: squaring turns both positive and negative differences into non-negative numbers, so we do not have to bother with the sign. We do this for each measured value and add up the results across all n numbers.

Finally, you divide, although normally not by n, but by n-1. This number is called empirical variance. And if you take the square root of this, you arrive at the standard deviation.

In the example, we take 2.3² = 5.29; 1.3² = 1.69; 0.3² = 0.09; 0.7² = 0.49; 1.7² = 2.89; 2.7² = 7.29 as a starting point.

Slide 24

We now weight these squared differences with the frequencies, first for grade 8a. Two students had a 5 as the exam score, so 5.29 enters the sum with the factor 2. Four students had a 4, so 1.69 enters the sum with the factor 4. And so it continues down to the score of 0, which nobody had on the last exam, so 7.29 is multiplied by 0.

Added together, this results in 27.63, divided by 26 results in 1.063, and the square root of this is 1.03. Therefore, the standard deviation sigma = 1.03.

It goes the same for grade 8b and we end up with sigma = 1.44.
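Both values can be reproduced with a short Python sketch. It follows the steps described above (square the differences, weight them, divide by n-1, take the square root) and, like the text, works with the rounded mean of 2.7. The function name sample_std is an assumption made for this illustration.

```python
import math

# Exam scores: score -> number of students, as in the sums above.
scores_8a = {5: 2, 4: 4, 3: 6, 2: 14, 1: 1, 0: 0}
scores_8b = {5: 2, 4: 8, 3: 8, 2: 5, 1: 3, 0: 3}

MEAN = 2.7  # rounded arithmetic mean of both classes, as used in the text


def sample_std(freq, mean):
    """Standard deviation of a frequency table, dividing by n - 1."""
    n = sum(freq.values())
    # Sum of squared differences, each weighted with its frequency.
    squared = sum(count * (score - mean) ** 2 for score, count in freq.items())
    variance = squared / (n - 1)  # empirical variance
    return math.sqrt(variance)


sigma_8a = sample_std(scores_8a, MEAN)
sigma_8b = sample_std(scores_8b, MEAN)
print(round(sigma_8a, 2), round(sigma_8b, 2))  # 1.03 1.44
```

For grade 8b, the weighted sum of squares is 57.81, divided by 28 results in about 2.065, and the square root of this is about 1.44.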

Slide 25

Quite clearly, the measured values in grade 8b are far more scattered than in grade 8a. We can also establish this qualitatively.

We now have a reliable measure for this scattering with the standard deviation sigma.

Tip: Calculate such an example yourself. This will make it much clearer why this procedure makes sense. In particular, you see why large differences compared to the average value have a great impact and smaller differences are more easily ignored.

And yes, for sure we could do it differently. This procedure is not mandatory; there are certainly other measures for scattering. These measures would then treat differences in another way, that is, downplay or emphasize them differently. This latitude undoubtedly is one of the problems that some students have with statistics and probability.

Slide 26

That’s all for today. Perhaps this wasn’t a very easy episode. Many thanks for being here. I look forward to seeing you next time.
