Data science in society: introduction


Counting is a primordial tool. Depending on the context, the numbers 1 or 2 have a different impact than 11 or 12. Consider a situation involving hunger or danger.

Counting involves collecting information from the surroundings through sensors. For humans, the obvious sensor is the eye. Needless to say, we have other sensors as well, and so do other species.

The tool is so powerful that it can be used to understand patterns in our surroundings, i.e., nature. Even though we are part of nature, it is reasonable to place ourselves in a special position in order to study natural phenomena. Of course, in the end we must put ourselves back in the picture. For some phenomena, we humans play no role in the conclusions of the study, but in others we do. It all depends on the circumstances.

The act of counting has developed so much that we have created a language: mathematics. This language does more than count: it has successfully provided humans with a framework in which to study nature quantitatively.

That is why physics is written in the language of mathematics. But there is a difference. Mathematics does not need the full input of physics, and physics does not need the full input of mathematics. Physics, however, always needs nature as input. This difference has been fruitful: sometimes mathematics moves faster than physics, and sometimes the reverse.

What makes physics unique is that it describes natural phenomena using mathematics. So far, there is no other reliable tool to achieve this goal.

The current understanding of physical phenomena¹ is summarized in the following equations:

There are many symbols in both equations; if you are not familiar with them, do not worry. I just want you to know that they exist.

Still, try to look at them. If you do, you will notice that the symbol S[X] appears in both equations. The symbol S is known as the action, and the classical dynamics is derived from this mathematical object. There is also Z, known as the partition function; the quantum dynamics is obtained from this other mathematical object.

Notice that Z depends on S.
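For the curious, here is a hedged sketch of the standard textbook form of this relation. It assumes the usual path-integral conventions, and the symbols ħ (the reduced Planck constant) and 𝒟X (integration over all possible configurations X) are introduced here rather than taken from the text:

```latex
\begin{aligned}
\text{Classical:}\quad & \delta S[X] = 0
  && \text{(the realized history makes the action stationary)} \\
\text{Quantum:}\quad & Z = \int \mathcal{D}X \; e^{\, i S[X]/\hbar}
  && \text{(every configuration } X \text{ contributes, weighted by its action)}
\end{aligned}
```

Classically, the stationarity condition singles out one history; quantum mechanically, every configuration contributes to Z with a weight fixed by S, which is why Z depends on S.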

What is classical and what is quantum? To a first approximation, we can interpret classical dynamics as the description of physical phenomena at large scales and quantum dynamics as the description at tiny scales.

The fundamental difference between classical and quantum dynamics is that the latter relies on probabilities: the outcomes of an experiment at small scales carry an intrinsic uncertainty. Let us call this quantum uncertainty.

Let us dwell on the concepts of uncertainty and probability at the human scale.

Suppose that your task is to report the number of people around the world who wear glasses due to visual impairment. Moreover, you only have a couple of months to achieve this goal. How do you do it?

Notice that, in principle, the task is straightforward: you just have to count. In practice, however, you can imagine that it is extremely difficult, and basically impossible within the time you have to submit the report.

The information is out there, but you cannot access all of it. On the other hand, you need your job…

Instead of aiming to give the information for each person, you can group people and associate a number with each group. The groups can be thought of as age intervals, say 5-10, 10-15, 15-20, 20-30 and so on. The number associated with each group is then the probability of wearing glasses. For example, the probability that a person between 50 and 60 years of age wears glasses is 70% (I invented this number). This means that if we pick 10,000 people around the globe in that age interval, we expect 7,000 of them to wear glasses. For the moment, let us ignore how we actually arrive at this number and simply notice that, for the remaining 3,000 people, this assertion is not necessarily accurate.
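To make the grouping idea concrete, here is a minimal sketch, assuming we had a small survey sample: it estimates the probability for each age group as the fraction of sampled people in that group who wear glasses, then turns a probability into an expected count. The records, the extra 30-50 interval, and the function names are all hypothetical, invented for illustration; they are not real data or the author's method.

```python
from collections import defaultdict

# Hypothetical survey records: (age, wears_glasses). Invented for illustration only.
SAMPLE = [
    (52, True), (55, True), (58, False), (51, True), (59, True),
    (23, False), (27, True), (21, False), (25, False),
]

# Hypothetical age intervals [low, high), following the grouping idea in the text.
AGE_GROUPS = [(5, 10), (10, 15), (15, 20), (20, 30), (30, 50), (50, 60)]


def group_of(age):
    """Return the age interval containing `age`, or None if no interval contains it."""
    for low, high in AGE_GROUPS:
        if low <= age < high:
            return (low, high)
    return None


def probability_by_group(records):
    """Estimate P(wears glasses) per group as (people wearing glasses) / (group size)."""
    wearing = defaultdict(int)
    totals = defaultdict(int)
    for age, wears in records:
        group = group_of(age)
        if group is None:
            continue  # ignore ages outside the chosen intervals
        totals[group] += 1
        wearing[group] += int(wears)
    return {group: wearing[group] / totals[group] for group in totals}


def expected_count(probability, population):
    """Expected number of glasses wearers in a population of the given size."""
    return probability * population


probs = probability_by_group(SAMPLE)
print(probs[(50, 60)])                      # 4 of the 5 sampled people: 0.8
print(round(expected_count(0.7, 10_000)))   # the text's example: 7000
```

Note that the estimate from this tiny made-up sample (0.8) differs from the invented 70% in the text; with real data, the sample size would largely determine how much to trust such a number.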

This is the price to pay if we represent a collection of data with a single number.

We see that uncertainty emerges from a lack of full knowledge, and probability is a measure of it.
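A quick simulation can show what "probability as a measure of uncertainty" means for the glasses example. This is a sketch under a simplifying assumption the text does not make: each of the 10,000 people wears glasses independently with probability 0.7. The function name `simulate_count` is hypothetical.

```python
import random


def simulate_count(p, population, rng):
    """Count how many of `population` people wear glasses when each one
    independently does so with probability `p` (an assumed, simplified model)."""
    return sum(1 for _ in range(population) if rng.random() < p)


rng = random.Random(42)  # fixed seed so the run is reproducible
counts = [simulate_count(0.7, 10_000, rng) for _ in range(5)]
print(counts)  # values scatter around 7000 and are rarely exactly 7000

# Under this model, the typical spread of the count is sqrt(N * p * (1 - p)).
spread = (10_000 * 0.7 * 0.3) ** 0.5
print(round(spread, 1))  # 45.8
```

The 70% pins down the expected count, but neither the exact count in a given group of 10,000 nor which individuals wear glasses.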

What about quantum uncertainty? Imagine that you pick a single person and ask them on different occasions whether they have a visual impairment. The person gives a different answer each time you ask. You would become suspicious and think that the person is playing games with you.

What if the person is a quantum particle, such as an electron, and the question concerns the state of its spin²?

Then you would realize that it is not playing games. The answer from this single particle is sometimes yes and sometimes no, and it is not lying to you. Very strange! This is how quantum uncertainty appears in nature³. To be clear, if a real person did that, they would certainly be messing with you.

From now on we will only discuss uncertainties outside the quantum world and forget about the equations displayed above. The aim was to let you know that they actually exist and that nature, at its core, has this strange behaviour⁴.

Returning to the «real-world» problem of not losing your job, we see that probabilities helped you deliver some results. Recall, however, that some of the remaining 3,000 people may actually have a visual impairment. Assigning a probability misses details.
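One way to see which details a single probability hides: the same 70% can come from very different situations inside the group. The sketch below compares a uniform group with a hypothetical split into two subgroups with very different rates; the subgroup sizes and rates are invented for illustration, and `pooled_rate` is a hypothetical helper.

```python
def pooled_rate(subgroups):
    """Combine (size, rate) pairs into one rate for the whole group:
    total expected wearers divided by total number of people."""
    total = sum(size for size, _ in subgroups)
    wearing = sum(size * rate for size, rate in subgroups)
    return wearing / total


# Hypothetical 50-60 age group of 10,000 people; numbers invented for illustration.
uniform = [(10_000, 0.7)]             # everyone has the same 70% chance
split = [(5_000, 0.9), (5_000, 0.5)]  # two very different subgroups

print(pooled_rate(uniform), pooled_rate(split))  # both about 0.7
```

Both groups report the same single number, yet in the second one half of the people are far less likely to wear glasses than the 70% suggests.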

This is crucial. Suppose that instead of visual impairment we are dealing with a lethal virus, and the study is done in order to provide vaccines around the world. Details would then matter, but the counting problem still cannot be solved in practical terms.

Hence, we see that probability can be used as a tool for solving problems in society. Nevertheless, regarding humans as numbers can lead to oversimplifications, and these can have a negative impact on some people.

I’ll discuss this point in more detail in the next post of Data Science in society.

  1. The equations are known to be incomplete. There is still no complete description that unifies gravity and quantum mechanics, and it has been conjectured that there are more particles that we have not yet been able to detect. ↩︎
  2. The spin is an intrinsic property of particles in the quantum world. This is the only information you need in order to understand the point. ↩︎
  3. I recommend learning about Schrödinger’s cat if you want to know more about this type of uncertainty. See, for example, https://www.youtube.com/watch?v=UjaAxUO6-Uw ↩︎
  4. Data science for quantum phenomena deals with the statistics of statistics… ↩︎
