Bayesian Funk

Life in the 2020s started off rather rough, as I’m sure you are all aware. When COVID-19 struck, it seemed the world shut down. Questions abounded, regarding the nature of the illness, how it spreads, how deadly it is, and how to remain safe. In the face of immense uncertainty, various businesses closed, and schools were forced to haphazardly improvise online curricula. For me, this meant an early end to my freshman year of college.

Come Fall of 2020, however, my school was largely able to return to business as usual, save a few important changes. The obvious ones include masks and social distancing. Slightly more consequential were the updates made to the visitor policy and the cafeteria (which was severely crippled, quality-wise). Most stringent of all was the requirement that every student be tested for the virus once a week (more often for certain organizations, such as athletics and choir).

The policy was simple and appropriately extreme, all things considered. If a student’s test came back positive, they would be required to quarantine (either on or off of the campus) for 2 weeks. Anybody who had been in their vicinity for an extended period (roommates, etc) would then be required to isolate.

This didn’t bother me. After all, the global “going” was getting tough. It was only natural that we get tougher. Some of my friends (and some skeptics as well), however, expressed concern over the accuracy of the tests being used. Some didn’t trust the tests and felt that even if they did receive a negative, they would sooner trust their symptoms than the result. Others were the opposite; they were so distrustful of the tests that given a positive result, they would simply ignore it. These concerns were magnified with the introduction of the rapid tests, which gave results in as little as 15 minutes.

With the COVID era nearly (hopefully) over, I’m now interested in seeing exactly what the numbers say. If one were to test positive using one of these tests, what are the chances that they actually have the disease? Furthermore, what if a person received two tests with contradicting results?

In order to find these numbers, one must employ Bayes’ Theorem.

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Good ol’ Bayes

The above formula gives the probability of event A given that event B has occurred. It does so through three other probabilities: the probability of event B given event A (P(B|A)), the probability of event A (P(A)), and the probability of event B (P(B)). For our purposes, event A is “having the disease”, and event B is “testing positive”.

According to this site, the tests used have 97% sensitivity and 99% specificity. This may sound meaningless to some of you, so allow me to explain. A test’s sensitivity is the probability that the test returns a positive result for a person that actually has the virus. A test’s specificity, on the other hand, is the probability that the test comes back negative for a person who truly does not have the virus.

In other words, sensitivity is the proportion of correctly identified infected individuals to all infected individuals. Specificity is the proportion of correctly identified uninfected individuals to all uninfected individuals.
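To make the two definitions concrete, here is a minimal Python sketch. The counts are hypothetical, invented only so that the ratios come out to the 97% and 99% quoted above; they are not taken from any real validation study.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of infected people that the test correctly flags as positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of uninfected people that the test correctly reports as negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical study: 1,000 infected and 1,000 uninfected people (made-up counts).
sens = sensitivity(true_pos=970, false_neg=30)
spec = specificity(true_neg=990, false_pos=10)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```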

All that is left is information as to what percentage of the population (we’ll use that of the US) had the virus around 2020–2021. This site reports the number to be around 17% for the period of interest (January is around the dead center of the school year, after all).

With all the research out of the way, let’s do the math:

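$$P(\text{virus} \mid +) = \frac{P(+ \mid \text{virus})\,P(\text{virus})}{P(+)} = \frac{0.97 \times 0.17}{P(+)}$$

Plugging in the numbers that we know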

At this point in the calculation, we need to find the probability of a positive test result. This can be done by simply splitting the value into the probability of positive given the virus and the probability of positive given no virus. We take these values, multiply them by their respective probabilities (of having or not having the virus), and add them together:

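$$P(+) = P(+ \mid \text{virus})\,P(\text{virus}) + P(+ \mid \text{no virus})\,P(\text{no virus}) = 0.97 \times 0.17 + 0.01 \times 0.83 = 0.1732$$

$$P(\text{virus} \mid +) = \frac{0.1649}{0.1732} \approx 0.952$$

Finishing the calculation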

As you can see, given a single positive test, the probability of having the virus is about 95%. All things considered, this is rather high, certainly high enough (I think) for most people to believe confidently in the positive test result.
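The same arithmetic can be sketched in a few lines of Python. The function name and parameter names are my own choices for illustration, and the inputs are the 17% prior, 97% sensitivity, and 99% specificity used above.

```python
def posterior_given_positive(prior: float, sensitivity: float, specificity: float) -> float:
    """P(virus | positive test) via Bayes' Theorem."""
    true_pos = sensitivity * prior               # P(+ | virus) * P(virus)
    false_pos = (1 - specificity) * (1 - prior)  # P(+ | no virus) * P(no virus)
    return true_pos / (true_pos + false_pos)     # the denominator is P(+)


p_pos = posterior_given_positive(prior=0.17, sensitivity=0.97, specificity=0.99)
print(f"P(virus | +) = {p_pos:.3f}")  # prints P(virus | +) = 0.952
```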

What if we flip the script? What are the odds of having the virus given a negative test result?

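$$P(\text{virus} \mid -) = \frac{P(- \mid \text{virus})\,P(\text{virus})}{P(- \mid \text{virus})\,P(\text{virus}) + P(- \mid \text{no virus})\,P(\text{no virus})} = \frac{0.03 \times 0.17}{0.03 \times 0.17 + 0.99 \times 0.83} = \frac{0.0051}{0.8268} \approx 0.006$$

Note again that P(-) was broken down into two components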

It appears that a negative test means that you can rest assured. With a mere 0.6% chance of infection, it is highly unlikely that you have the virus given a negative test.
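Here is the negative-test version as a short Python sketch, again with names of my own choosing and the same inputs as before (17% prior, 97% sensitivity, 99% specificity).

```python
def posterior_given_negative(prior: float, sensitivity: float, specificity: float) -> float:
    """P(virus | negative test) via Bayes' Theorem."""
    false_neg = (1 - sensitivity) * prior      # P(- | virus) * P(virus)
    true_neg = specificity * (1 - prior)       # P(- | no virus) * P(no virus)
    return false_neg / (false_neg + true_neg)  # the denominator is P(-)


p_neg = posterior_given_negative(prior=0.17, sensitivity=0.97, specificity=0.99)
print(f"P(virus | -) = {p_neg:.4f}")  # prints P(virus | -) = 0.0062
```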

For fun, let’s find these probabilities for each other test on that site. I sped up this process using Python, and the code for the process (as well as everything else that follows) can be found on GitHub. Here are my results in a table:

Table of probabilities for each test.
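A loop like the following is one way such a table could be produced. This is only a sketch and not necessarily the code in the repository: the first entry uses the 97% / 99% figures from this article, while the second entry is a made-up placeholder to show how further tests would be added.

```python
PRIOR = 0.17  # share of the population with the virus, as used in this article

# name -> (sensitivity, specificity)
TESTS = {
    "article test": (0.97, 0.99),
    "hypothetical test": (0.90, 0.95),  # placeholder values, not a real product
}


def posteriors(prior: float, sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (P(virus | +), P(virus | -)) for one test."""
    p_pos = sensitivity * prior + (1 - specificity) * (1 - prior)  # P(+)
    p_neg = 1 - p_pos                                              # P(-)
    return sensitivity * prior / p_pos, (1 - sensitivity) * prior / p_neg


rows = [(name, *posteriors(PRIOR, sens, spec)) for name, (sens, spec) in TESTS.items()]

print(f"{'test':<20} {'P(virus|+)':>11} {'P(virus|-)':>11}")
for name, given_pos, given_neg in rows:
    print(f"{name:<20} {given_pos:>11.1%} {given_neg:>11.1%}")
```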

Here are those same results, but as pretty graphs:

Finally, let’s suppose that a given individual took two tests at the same time and that one test came back positive and the other negative.

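Here, we assume that the two test results are independent of each other once we know whether the person has the virus, meaning that for a person of a given status, the result of one test has no bearing on the result of the other. (Without that condition the results are not independent, since both depend on whether the person is infected.) This is necessary, as it allows us to decompose the probability of two events occurring at once into a product of two terms. Specifically:

$$P(A \cap B) = P(A)\,P(B)$$

Given that events A and B are independent. We apply this rule separately for people with the virus and for people without it, so that P(+, - | virus) = P(+ | virus) P(- | virus), and likewise for no virus.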

With this in mind, let’s calculate the probability in question.

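$$P(\text{virus} \mid +, -) = \frac{P(+ \mid \text{virus})\,P(- \mid \text{virus})\,P(\text{virus})}{P(+ \mid \text{virus})\,P(- \mid \text{virus})\,P(\text{virus}) + P(+ \mid \text{no virus})\,P(- \mid \text{no virus})\,P(\text{no virus})}$$

$$= \frac{0.97 \times 0.03 \times 0.17}{0.97 \times 0.03 \times 0.17 + 0.01 \times 0.99 \times 0.83} = \frac{0.004947}{0.013164} \approx 0.376$$

Note that for a person of a given status, P(+) + P(-) = 1, so P(-) = 1 - P(+). The independence rule is applied within each group (virus and no virus), and the denominator is again split into those two components.

The news is mixed. Given both a negative and a positive test, the chances of having the virus come to about 38%. That is far below the 95% that a single positive test gives, so the negative result pulls the estimate down a long way. It is still more than double the 17% we started with, though, so conflicting results do not simply cancel each other out.

One might think that the estimate lands below 50% because this test has a higher specificity than it does sensitivity. In reality, the higher specificity works in the other direction, since fewer false positives make a positive result stronger evidence than a negative one. The estimate stays under 50% because the prior probability of having the virus is relatively low (only 17%). Bayesians call this number the “prior”, referring to the probability of some event before the new information (from the test results, in this case) comes into consideration.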

To show that the number found above is driven by the prior, I took a look at the chance of having the virus given a positive and a negative test result for priors ranging from 10% (0.10) to 90% (0.90) and for every test in the above table. Essentially, this graph imagines what things would look like if 10% of all people had the virus, 90% of all people had the virus, and every percentage in between. These are the results:

As is clear, a higher prior leads to a higher probability of having the virus when given conflicting test results.
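For readers who want to reproduce the trend, here is a minimal Python sketch of the sweep for the 97% / 99% test only. It assumes the two results are independent given the person's true status (conditional independence), which is an assumption of this sketch, and the function name is my own.

```python
def posterior_conflicting(prior: float, sensitivity: float, specificity: float) -> float:
    """P(virus | one positive and one negative result).

    Assumes the two results are independent given the person's true status.
    """
    infected = sensitivity * (1 - sensitivity) * prior          # P(+, - | virus) * P(virus)
    uninfected = (1 - specificity) * specificity * (1 - prior)  # P(+, - | no virus) * P(no virus)
    return infected / (infected + uninfected)


priors = [k / 10 for k in range(1, 10)]  # 0.1, 0.2, ..., 0.9
results = [posterior_conflicting(p, sensitivity=0.97, specificity=0.99) for p in priors]

for p, r in zip(priors, results):
    print(f"prior = {p:.1f} -> P(virus | +, -) = {r:.3f}")
```

Each printed value is larger than the one before it, which is the upward trend described above.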

So that settles that!

Overall, the moral of the story is to trust tests. Chances are that the result given to you by the lab is reflective of your actual status as a carrier (or not) of the virus. If you do decide to get a second test (and receive conflicting information), refer to the above to know how to proceed. At that point, your chances of having the virus heavily depend on your prior probability (represented here as a percentage of the nation with the virus).
