Identifying System Malfunctions Using Bayes' Theorem

Ross Wilhite
Feb 23, 2021

An Intro to Probabilistic Robotics

Modern robotics engineers are increasingly interested in building fully autonomous systems. Technologies such as self-driving cars, drones, and industrial robots all need to read their surroundings and respond accordingly. To make sense of the highly variable environments they operate in, these systems rely on concepts such as neural networks, sensor fusion, and probabilistic robotics. To understand how an autonomous system's decision-making process works, an engineer must be familiar with a few concepts from probability theory and statistics. This article gives a basic overview of Bayes' theorem, a staple of probability theory, along with a real-world application to aid understanding.

Probability Theory Syntax

P(x) — In probability theory, you will see this notation very often. Similar to function notation in calculus, P(x) represents the probability of a variable A equaling x. If you wanted to represent the probability of A equaling n, you would write P(n).

P(x,y) — Represents the joint state of two variables. This is analogous to the boolean AND operator. To represent the probability that A = n and B = x, you would write P(n,x).

P(x|y) — This notation is the basis of Bayes' theorem. It is known as conditional probability, and it is read as "the probability of x, given y". This is notably different from P(x,y).
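As a quick illustration (my own sketch, not part of the original article), here is how these three quantities could be estimated from simulated data in Matlab, using two made-up dependent binary variables A and B:

% Estimate P(A), P(A,B), and P(A|B) from simulated samples.
% The probabilities 0.5, 0.8, and 0.3 are arbitrary, for illustration only.
N = 100000;
A = rand(N,1) < 0.5;                 % A occurs with probability 0.5
B = rand(N,1) < (0.8*A + 0.3*~A);    % B is more likely when A holds
PA   = mean(A)                       % P(A)    ~ 0.50
PAB  = mean(A & B)                   % P(A,B)  ~ 0.40
PAgB = mean(A(B))                    % P(A|B)  ~ 0.40 / 0.55 = 0.73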

Our System to Solve

In this example, we will imagine a theoretical robotic system. To keep things simple, it will have a single variable, though of course a real-world system would have many more. Let's imagine a sensor on a robot that reads its operating temperature in Celsius to determine whether it should activate its cooling cycle. To simplify the system even further, let's assume the temperatures follow a uniform distribution from 0 to 30. This means you are just as likely to read a temperature of 3 as a temperature of 29. While this is not a realistic distribution, it will help us better understand the theorem.

However, the manufacturer has warned us that there is a chance our sensor is faulty: recent QA in their factory determined that a small percentage of their sensors only return temperatures between 0 and 10.

In their testing, they estimated that around 0.05% of their sensors have this fault, and they recommend testing the sensor. To test ours, we have decided to gather a reading from the sensor once per hour. Every reading so far has come back between 0 and 10. However, our manager has told us to contact the sensor manufacturer only once we are 95% sure that the sensor is faulty, as replacing sensors on the system is a huge expense. How many times do we need to read the temperature before we can be 95% sure the sensor is broken?
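Before solving this analytically, here is a small Matlab sketch (mine, not from the original problem statement) of the two sensor behaviors described above:

% Simulate readings from a working sensor and from a faulty one.
healthy = 30 * rand(1000,1);   % working sensor: uniform on [0, 30]
faulty  = 10 * rand(1000,1);   % faulty sensor: only returns values in [0, 10]
mean(healthy < 10)             % about 1/3 of healthy readings fall below 10
mean(faulty  < 10)             % every faulty reading falls below 10

Note that any single reading below 10 is consistent with both behaviors, which is exactly why we need a probabilistic answer.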

Baye’s Theorem

This is an example of where Bayes' theorem comes in handy. By applying the equation recursively N times, we can find out how many readings we need to take before we are 95% sure that the sensor is broken.

Baye’s Theorem in discrete case

Baye’s theorem is used to discuss the probability of an event occurring, given specific evidence. There are many formulations and proofs online, if you want further information beyond application of the theorem. Later in this article, in the Conclusions section, I will come back to what Baye’s theorem is representing.

To use Baye’s theorem accurately, we need 3 things: a hypothesis, evidence, and a the probability of our evidence occurring. In this case, our hypothesis is the probability that the sensor is broken, our evidence is the temperature reading below 10, and the probability of the temperature reading below ten is 1/3, or about 33.3%.

Here are a few terms you will run into when reading or solving statistical inferences:

Prior — P(H)

This is our estimate of the hypothesis's probability prior to any observations. In the equation above, it is represented by P(H), and here it is our estimate of how often sensors leave the factory faulty.

Observation — E

This is the real-world measurement that tests our estimate. In this system, it is the temperature reading we receive, so P(E) represents how often we observe the reading E.

Posterior — P(H|E)

This is the value we are solving for: the probability (or, in some cases, the distribution of probabilities) that our hypothesis is true, given the real-world observations.

Generative Model — P(E|H)

In this case, our generative model is the probability of our observation, given the hypothesis.
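To make these terms concrete, here is a single Bayesian update in Matlab (a minimal sketch of my own, using the numbers from our sensor problem; the evidence term uses the theorem of total probability, introduced next):

prior      = 0.0005;    % P(H): the manufacturer's estimated fault rate
likelihood = 1;         % P(E|H): a faulty sensor always reads below 10
evidence   = likelihood * prior + (1/3) * (1 - prior);   % P(E)
posterior  = likelihood * prior / evidence               % P(H|E), about 0.0015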

There is another important theorem we will use here: the theorem of total probability. It states the following:

P(x) = ∑ P(x|y) · P(y)   (summed over every possible state y)

Theorem of total probability in the discrete case

This follows from the definition of conditional probability. The theorem states that the probability of x can be computed from the conditional probabilities of x given each possible state of y, where each state of y has a known probability of its own. Formulations of this theorem are also readily available online.

Substituting this for P(E) in Bayes' theorem, and swapping the more technical x and y for the comprehension-friendly E and H, we get the following equation, which we will use with the initial conditions:

Discrete Baye’s theorem with theorem of total probability

Initial conditions

Our initial conditions for this system are very important; these are the numbers we will plug into Bayes' theorem. We have the following variables:

A — Whether our system is faulty or not, so P(faulty) represents the probability our system is faulty.

T — The observed temperature. Since we do not know the real world temperature, we will only be using the observed temperature to solve this system.

B — Whether the observed temperature is between 0 and 10. So P(T<10) represents the probability that we observe a temperature below 10.

For ease of reading and mathematics, instead of percentages, I will be using decimals. In other words:

∑P(A) = 1 or ∫P(A)dA = 1 in a continuous system

Here are the initial conditions represented in an observation model:

              faulty        not faulty
T < 10        1.0000        0.3333
T ≥ 10        0.0000        0.6667

Initial conditions (observation model)

The columns represent the probabilities given A, the sensor state, and the rows represent the probabilities given B, the state of the temperature being below or above 10 degrees. Each cell represents the probability of that observation occurring, given the sensor state. A text representation of the bottom-right cell would be: P(T≥10 | not faulty) = 0.6667 (rounded to 4 decimal places). The cells in this table therefore give us the P(y|x) values we need to plug into Bayes' theorem.

Our other initial conditions are as follows:

P(faulty) = 0.0005 — This is the manufacturer's estimate of how likely a given sensor is to be faulty. It gives us P(x) in the initial state, but on every iteration we will replace it with the value Bayes' theorem gave us in the previous iteration.

P(not faulty) = 0.9995 — This is just 1 - P(faulty) and will be used in the denominator of our equation.
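With these numbers in hand, the theorem of total probability gives the overall chance of observing a reading below 10 in any given hour (a quick worked step, for reference):

P(T<10) = P(T<10|faulty) · P(faulty) + P(T<10|not faulty) · P(not faulty) = 1 × 0.0005 + (1/3) × 0.9995 ≈ 0.3337

This is exactly the denominator that the recursion below recomputes on every iteration, using the updated P(faulty).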

Substituting our variables into Baye’s theorem, we get the equation that we will need to recursively solve:

P(faulty | T<10) = [P(T<10 | faulty) · P(faulty)] / [P(T<10 | faulty) · P(faulty) + P(T<10 | not faulty) · P(not faulty)]

Probability that our sensor is faulty

Iterating the Conditions

To solve this, I put the equation into Matlab:

solutions = [];
step = 1;
Pfaulty = 0.0005;        % prior: the manufacturer's estimated fault rate
PTl10faulty = 1;         % P(T<10 | faulty)
PTl10notfaulty = 1/3;    % P(T<10 | not faulty)
% Iterate Bayes' theorem until we reach 95% certainty, capped at 100 readings
while Pfaulty < 0.95 && step <= 100
    % The posterior becomes the prior for the next reading
    Pfaulty = (PTl10faulty * Pfaulty) ...
        / ((PTl10faulty * Pfaulty) + (PTl10notfaulty * (1 - Pfaulty)));
    solutions(step) = Pfaulty;
    step = step + 1;
end
plot(solutions);

This code runs through iterations of the algorithm we derived above. When the loop stops, we have our result: solutions holds P(faulty) after each reading, and step - 1 equals N, the number of iterations needed to find the answer. We also get a plot of the P(faulty) data. After running it, we see that after 10 iterations we have a 96.73% certainty that the sensor is faulty.

Our certainty after each iteration

If we let the process continue, the certainty will keep approaching 1, but at this point we are past 95% certain, which should be enough to justify swapping out the sensor.
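As a side note (my own observation, not part of the original article): because P(T<10|faulty) = 1, each reading multiplies the odds of a fault by exactly 3, so N can also be found in closed form rather than by looping:

odds0 = 0.0005 / 0.9995;             % prior odds of a fault
% smallest integer N with odds0 * 3^N >= 0.95/0.05
N = ceil(log(19 / odds0) / log(3))   % N = 10, matching the loop above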

Conclusions

Now we can gain a complete understanding of why Bayes' theorem is useful. Most people's natural instinct for this problem would be to find out how likely it is to receive a reading of T<10 N times in a row, and subtract that value from 1. This does one thing right: it tells you when you have reached a 5% chance of receiving T<10 N times in a row, which can be a useful indication that something is wrong. However, this method misses a key variable: the likelihood that the sensor is broken may still be lower than the likelihood of receiving those readings by chance.

Here is a graph I made using a binomial distribution function in Matlab; it shows the likelihood of a working sensor producing a T<10 reading N times in a row:

Likelihood of reading T<10 N times in a row, plotted over N

You can see that after 3 readings, the likelihood is already below 5%. If we followed the binomial-distribution method, we would have declared the sensor faulty, when according to Bayes' theorem there was actually only a 1.33% chance it was. So why are we still so uncertain that the sensor is faulty, even when there is a 95% chance that a working sensor would not produce 3 consecutive readings of T<10?

Here is the key factor we are forgetting: the prior likelihood of a faulty sensor is 0.05%, which is still much lower than the 5% chance of those consecutive readings. While we are in an unlikely state, it is still far more unlikely that the sensor is broken. In fact, this is what Bayes' theorem measures: it essentially compares these two numbers and tells us which one we should trust more.

We can see solid evidence of this when we compare the two probabilities after 7 iterations. The prior probability of a broken sensor is still 0.05%, and the probability of 7 successive readings of T<10 from a working sensor is 0.0457%. This is the pivotal moment when the two probabilities are almost equal, and what does Bayes' theorem give us? About 50%.
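You can verify this pivotal moment directly (a small check of my own):

prior = 0.0005;                       % P(faulty)
pRun  = (1/3)^7;                      % P(7 readings of T<10 | not faulty), about 0.000457
post  = prior / (prior + pRun * (1 - prior))   % about 0.52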

So now we can see what Bayes' theorem is really measuring; it is just:

P(faulty | N readings of T<10) = P(faulty) / [P(faulty) + (1/3)^N · P(not faulty)]

The true measurement of Bayes' theorem revealed

This shows why Bayes' theorem is so useful: it dispels common statistical misconceptions by letting us compare two probabilities, rather than draw conclusions from a single one.

Other Applications

There are many other applications of this theorem, and if you ever choose to study probability theory you will see it used widely. It appears often in medicine, for problems such as estimating the probability that a patient has rheumatoid arthritis. In that example, we would compare the probability that a patient shows the symptoms of rheumatoid arthritis without actually having the ailment against the probability of having the ailment itself. It is also used in corporate finance. If you are interested in more applications of the theorem, I recommend searching online; there are numerous videos and articles showing further uses.
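As a sketch of the medical case, with entirely hypothetical numbers chosen only for illustration:

prevalence  = 0.01;   % hypothetical P(disease) in the population
pSymGiven   = 0.90;   % hypothetical P(symptoms | disease)
pSymWithout = 0.05;   % hypothetical P(symptoms | no disease)
pDisease = pSymGiven * prevalence / ...
    (pSymGiven * prevalence + pSymWithout * (1 - prevalence))
% about 0.15: even with symptoms, the disease remains fairly unlikely,
% because the prior prevalence is so low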


Ross Wilhite

Autonomous systems/robotics engineer currently studying at Tufts University. I have 3 years of experience in e-commerce backend development for data governance.