Let’s consider a data set that represents the temperatures of 12 different objects in a room. If 11 of the objects have temperatures within a few degrees of 70 degrees Fahrenheit (21 degrees Celsius), but the twelfth object, an oven, has a temperature of 300 degrees Fahrenheit (about 150 degrees Celsius), a cursory examination can tell you that the oven is a likely outlier.
Let’s continue with the example above. Here is our data set representing the temperatures of the 12 objects in the room: {71, 70, 73, 70, 70, 69, 70, 72, 71, 300, 71, 69}. If we order the values from lowest to highest, our new set of values is: {69, 69, 70, 70, 70, 70, 71, 71, 71, 72, 73, 300}.
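Ordering the data is the first step for both the median and the quartiles. A minimal Python sketch of this step (the variable names are illustrative, not part of the method):

```python
# Temperatures (degrees Fahrenheit) of the 12 objects in the room
temperatures = [71, 70, 73, 70, 70, 69, 70, 72, 71, 300, 71, 69]

# sorted() returns a new list in ascending order; the original list is unchanged
ordered = sorted(temperatures)
print(ordered)  # [69, 69, 70, 70, 70, 70, 71, 71, 71, 72, 73, 300]
```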
Don’t be confused by data sets with an even number of points: the average of the two middle points will often be a number that doesn’t appear in the data set itself, and that is OK. If the two middle points are the same number, their average is simply that number. In our example, we have 12 points, so the middle two are points 6 and 7 of the ordered set: 70 and 71, respectively. The median for our data set is therefore the average of these two points: (70 + 71) / 2 = 70.5.
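The median rule above, including the even-count case, can be written as a small Python function. This is a sketch; `median` is a hypothetical helper name. Python’s built-in `statistics.median` applies the same averaging rule when the number of points is even.

```python
def median(values):
    """Return the median: the middle point, or the average of the two middle points."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle point is the median
        return ordered[mid]
    # Even count: average points n/2 and n/2 + 1 (counting from 1)
    return (ordered[mid - 1] + ordered[mid]) / 2


temperatures = [71, 70, 73, 70, 70, 69, 70, 72, 71, 300, 71, 69]
print(median(temperatures))  # 70.5
```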
In our example, 6 points lie above the median and 6 points lie below it. This means that, to find the lower quartile (Q1), we need to average the two middle points of the bottom six points. Points 3 and 4 of the bottom six are both equal to 70, so their average is (70 + 70) / 2 = 70. Thus, 70 will be our value for Q1.
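A sketch of the lower-quartile step, assuming the split-at-the-median method used in this example. With an odd number of points, the code below leaves the median out of both halves; that is one common convention, not the only one. `median` and `lower_quartile` are hypothetical helper names.

```python
def median(values):
    """Median: the middle point, or the average of the two middle points."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def lower_quartile(values):
    """Q1: the median of the points that lie below the overall median."""
    ordered = sorted(values)
    # Keep the first n // 2 points: exactly the bottom half for an even count;
    # for an odd count this also drops the middle point (assumed convention).
    return median(ordered[: len(ordered) // 2])


temperatures = [71, 70, 73, 70, 70, 69, 70, 72, 71, 300, 71, 69]
print(lower_quartile(temperatures))  # 70.0
```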
Continuing with the example above, the two middle points of the 6 points above the median are 71 and 72. Averaging these two points gives (71 + 72) / 2 = 71.5, so 71.5 will be our value for Q3.
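The upper quartile mirrors the lower one: take the median of the points above the overall median. This sketch assumes the same split-at-the-median convention (middle point excluded when the count is odd); `upper_quartile` is a hypothetical name.

```python
def median(values):
    """Median: the middle point, or the average of the two middle points."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def upper_quartile(values):
    """Q3: the median of the points that lie above the overall median."""
    ordered = sorted(values)
    # Skip the first (n + 1) // 2 points: for an even count this leaves the top half;
    # for an odd count it also skips the middle point (assumed convention).
    return median(ordered[(len(ordered) + 1) // 2 :])


temperatures = [71, 70, 73, 70, 70, 69, 70, 72, 71, 300, 71, 69]
print(upper_quartile(temperatures))  # 71.5
```

Library routines such as Python’s `statistics.quantiles` use interpolation rules and can return slightly different quartiles for the same data, so check which convention a tool uses before comparing results with a hand calculation.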
In our example, our values for Q1 and Q3 are 70 and 71.5, respectively. To find the interquartile range, we subtract Q1 from Q3: 71.5 - 70 = 1.5. Note that this works even if Q1, Q3, or both are negative numbers. For example, if our Q1 value were -70, our interquartile range would be 71.5 - (-70) = 141.5, which is correct.
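The interquartile range is a single subtraction, and the sign handling described above comes for free. A minimal sketch (`interquartile_range` is a hypothetical name):

```python
def interquartile_range(q1, q3):
    """Spread of the middle half of the data: Q3 minus Q1."""
    return q3 - q1


print(interquartile_range(70, 71.5))   # 1.5
print(interquartile_range(-70, 71.5))  # 141.5 (a negative Q1 is handled correctly)
```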
In our example, the interquartile range is (71.5 - 70), or 1.5. Multiplying this by 1.5 yields 2.25. We add this number to Q3 and subtract it from Q1 to find the boundaries of the inner fences: 71.5 + 2.25 = 73.75 and 70 - 2.25 = 67.75. Thus, the boundaries of our inner fence are 67.75 and 73.75. In our data set, only the temperature of the oven, 300 degrees, lies outside this range and thus may be a mild outlier. However, we have yet to determine whether this temperature is a major outlier, so let’s not draw any conclusions until we do so.
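The inner-fence calculation can be sketched in Python as follows. It uses Q1 = 70 and Q3 = 71.5 from the example; `fences` is a hypothetical helper that takes the multiplier `k` as a parameter so the same function also serves for the outer fences.

```python
def fences(q1, q3, k):
    """Return (lower, upper) boundaries: Q1 - k * IQR and Q3 + k * IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


temperatures = [71, 70, 73, 70, 70, 69, 70, 72, 71, 300, 71, 69]
q1, q3 = 70, 71.5  # quartiles from the example

low, high = fences(q1, q3, 1.5)  # inner fences use k = 1.5
print(low, high)  # 67.75 73.75

# Points outside the inner fences are at least mild outliers
outside = [t for t in temperatures if t < low or t > high]
print(outside)  # [300]
```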
In our example, multiplying the interquartile range by 3 yields (1.5 * 3), or 4.5. We find the boundaries of the outer fence in the same fashion as before: 71.5 + 4.5 = 76 and 70 - 4.5 = 65.5. The boundaries of our outer fence are therefore 65.5 and 76. Any data points that lie outside the outer fences are considered major outliers. In this example, the oven temperature, 300 degrees, lies well outside the outer fences, so it’s definitely a major outlier.
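Putting both fences together gives a simple classifier: outside the outer fences means a major outlier, outside only the inner fences means a mild outlier. This is a sketch assuming the 1.5 and 3 multipliers used in this example; `classify` is a hypothetical name.

```python
def classify(value, q1, q3):
    """Label a value using inner (1.5 * IQR) and outer (3 * IQR) fences."""
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3 * iqr, q3 + 3 * iqr
    if value < outer_low or value > outer_high:
        return "major outlier"
    if value < inner_low or value > inner_high:
        return "mild outlier"
    return "not an outlier"


q1, q3 = 70, 71.5  # quartiles from the example
iqr = q3 - q1
print(q1 - 3 * iqr, q3 + 3 * iqr)  # 65.5 76.0 (outer fences)
print(classify(300, q1, q3))       # major outlier
print(classify(73, q1, q3))        # not an outlier
```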
Another criterion to consider is whether outliers significantly affect the mean (average) of a data set in a way that skews it or makes it misleading. This is especially important if you intend to draw conclusions from the mean of your data set. Let’s assess our example. Since it’s highly unlikely that the oven reached 300 degrees through some unforeseen natural cause, we can conclude with near-certainty that it was accidentally left on, producing the anomalously high temperature reading. Also, if we don’t omit the outlier, the mean of our data set is (69 + 69 + 70 + 70 + 70 + 70 + 71 + 71 + 71 + 72 + 73 + 300)/12 = 89.67 degrees, while the mean if we do omit the outlier is (69 + 69 + 70 + 70 + 70 + 70 + 71 + 71 + 71 + 72 + 73)/11 = 70.55 degrees. Since the outlier can be attributed to human error, and because it’s inaccurate to say that this room’s average temperature was almost 90 degrees, we should omit the outlier.
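The two means can be checked with a few lines of Python. This sketch simply drops the 300-degree oven reading identified as a major outlier; in practice you would filter on the fence boundaries rather than a hard-coded value.

```python
temperatures = [71, 70, 73, 70, 70, 69, 70, 72, 71, 300, 71, 69]

with_outlier = sum(temperatures) / len(temperatures)

# Drop the oven reading (the major outlier found above)
kept = [t for t in temperatures if t != 300]
without_outlier = sum(kept) / len(kept)

print(round(with_outlier, 2), round(without_outlier, 2))  # 89.67 70.55
```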
For instance, let’s say that we’re testing experimental drugs designed to increase the size of fish in a fish farm. We’ll use our old data set ({71, 70, 73, 70, 70, 69, 70, 72, 71, 300, 71, 69}), except that this time each point represents the mass of a fish (in grams) after being treated from birth with a different experimental drug. In other words, the first drug gave one fish a mass of 71 grams, the second drug gave a different fish a mass of 70 grams, and so on. In this situation, 300 is still a big outlier, but we shouldn’t omit it because, assuming it’s not due to an error, it represents a significant success in our experiment. The drug that yielded a 300-gram fish worked better than all the other drugs, so this point is actually the most important one in our data set, rather than the least.