Let’s dig deeper into the measures of central tendency (mean, median, mode) and when each one is best to use.
1. Mean (Average)
Formula: $\text{Mean} = \frac{\text{Sum of all values}}{\text{Number of values}}$
Best when:
- Data is numeric and symmetric (not too many extreme values).
- You want a balance point of all values.
Example (quality context):
- Average cycle time of a machine over 30 runs.
- If times are [10, 12, 11, 13, 12], mean = 11.6 → good summary.
Caution:
- Sensitive to outliers (extreme values).
- If one run took 60 seconds due to a jam, mean would shoot up → not representative.
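To see how much a single jam moves the mean, here is a minimal sketch using Python's standard `statistics` module. It assumes the 60-second run is a sixth observation added to the five cycle times above; the original does not say whether it replaces a run or is added to them.

```python
from statistics import mean

# Cycle times in seconds, taken from the example above.
cycle_times = [10, 12, 11, 13, 12]
normal_mean = mean(cycle_times)
print(normal_mean)  # 11.6

# Assumption: the jammed run is one extra observation in the same sample.
with_jam = cycle_times + [60]
jam_mean = mean(with_jam)
print(round(jam_mean, 2))  # 19.67
```

One bad run out of six lifts the mean from 11.6 to about 19.7 seconds, even though five of the six runs took 13 seconds or less.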
2. Median (Middle Value)
Definition: Middle value when data is sorted.
- If odd number of observations → exact middle.
- If even → average of two middle values.
Best when:
- Data is skewed (not symmetric).
- Outliers exist.
- You care about the “typical” value, not influenced by extremes.
Example (salary case):
- Salaries of 7 engineers: [20k, 22k, 23k, 25k, 28k, 30k, 150k]
- Mean ≈ 42.6k (298k ÷ 7, pulled up by the outlier).
- Median = 25k (better reflection of most people).
In quality:
- Median defect count per shift (if some rare shifts have extremely high numbers).
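The salary figures can be checked with a short sketch, again with the standard `statistics` module. The last line drops the top earner only to illustrate the even-count rule from the definition; that six-person list is not part of the original example.

```python
from statistics import mean, median

# Salaries in thousands, from the example above.
salaries_k = [20, 22, 23, 25, 28, 30, 150]

salary_mean = mean(salaries_k)
salary_median = median(salaries_k)
print(round(salary_mean, 1))  # 42.6
print(salary_median)          # 25

# Even number of observations: the median is the average of the two middle values.
# Dropping the 150k salary is only an illustration, not part of the original data.
even_median = median(salaries_k[:-1])
print(even_median)  # 24.0, the average of 23 and 25
```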
3. Mode (Most Frequent Value)
Definition: Value that occurs most often.
- Can be more than one mode (bimodal, multimodal).
- Sometimes no mode (if all values unique).
Best when:
- Data is categorical (like colors, defect types).
- You want the most common value.
- Useful in marketing, manufacturing, and reliability analysis.
Examples:
- Most common defect type = “scratch” (mode).
- Most common customer complaint category.
- Most popular car color in a survey.
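A small sketch of finding the mode for categorical data. The defect log below is invented purely for illustration; `multimode` (Python 3.8+) returns every value tied for most frequent, which covers the bimodal and no-mode cases mentioned above.

```python
from collections import Counter
from statistics import multimode

# Hypothetical inspection log; labels and counts are made up for illustration.
defects = ["scratch", "dent", "scratch", "crack", "scratch", "dent"]

defect_modes = multimode(defects)
print(defect_modes)                     # ['scratch']
print(Counter(defects).most_common(2))  # [('scratch', 3), ('dent', 2)]

# Two values tied for most frequent: a bimodal sample.
bimodal = multimode([1, 1, 2, 2, 3])
print(bimodal)  # [1, 2]

# All values unique: every value is returned, which signals "no mode".
unique_values = [50, 55, 60]
no_mode = len(multimode(unique_values)) == len(unique_values)
print(no_mode)  # True
```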
🌟 Quick Decision Guide
- Use Mean → if data is numeric, roughly symmetric, no big outliers.
- Use Median → if data is skewed, or has extreme outliers.
- Use Mode → if data is categorical, or you want the most frequent case.
✅ To check your understanding:
Suppose you are analyzing delivery times (in minutes) of spare parts from suppliers:
[2, 3, 3, 4, 4, 4, 5, 6, 15]
👉 Which measure (Mean, Median, or Mode) would you choose to report the “typical” delivery time here, and why?
The answer is:
- The mean here would be pulled up because of the outlier (15 minutes).
- The mode (4 minutes) tells us the most frequent case, but it doesn’t represent the overall “center.”
- The median (4 minutes) is the best choice because it resists the influence of that extreme value.
So in practice:
- If you report median delivery time = 4 minutes, suppliers and managers get a fair idea of what usually happens.
- If you reported mean = 5.1 minutes, it would look worse than reality because of one bad case.
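If you want to verify these numbers yourself, here is a minimal sketch. Removing the 15-minute delivery at the end is an extra step, added only to show how differently the mean and the median react to that one value.

```python
from statistics import mean, median, mode

# Delivery times in minutes, from the exercise above.
delivery_minutes = [2, 3, 3, 4, 4, 4, 5, 6, 15]

delivery_mean = mean(delivery_minutes)
delivery_median = median(delivery_minutes)
delivery_mode = mode(delivery_minutes)
print(round(delivery_mean, 1))  # 5.1
print(delivery_median)          # 4
print(delivery_mode)            # 4

# Illustration only: drop the 15-minute outlier and compare.
without_outlier = delivery_minutes[:-1]
trimmed_mean = mean(without_outlier)
trimmed_median = median(without_outlier)
print(trimmed_mean)    # 3.875
print(trimmed_median)  # 4.0
```

The mean moves by more than a minute when the outlier is removed, while the median stays at 4.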
✅ Summary Tip: Think of it like this
- Mean = Balance point (good for symmetric data).
- Median = Middle person in line (good when data is skewed).
- Mode = Most common item in a basket (good for categories).
Now, want to try applying this to another example?
👉 Take the data I gave earlier for daily production units:
[50, 55, 60, 65, 70, 75, 80]
- Mean = 65 (sum = 455 ÷ 7 days = 65)
- Median = 65 (middle value since 65 is the 4th number when sorted)
- Mode = None (all values occur only once)
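This is a nice example where mean and median are equal. Here the values are evenly spaced around 65, so the data is perfectly symmetric. (Equal mean and median alone does not guarantee symmetry, but symmetric data always has them equal.)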
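The same three measures can be checked in code. This is a sketch: the "no mode" test (every value tied) and the mirror-image symmetry check are one simple way to express those ideas, not the only one.

```python
from statistics import mean, median, multimode

# Daily production units, from the example above.
units = [50, 55, 60, 65, 70, 75, 80]

units_mean = mean(units)
units_median = median(units)
print(units_mean)    # 65
print(units_median)  # 65

# Every value occurs once, so all values tie and there is no single mode.
has_no_mode = len(multimode(units)) == len(units)
print(has_no_mode)  # True

# Symmetry check: deviations below the center mirror the deviations above it.
center = units_median
is_symmetric = sorted(x - center for x in units) == sorted(center - x for x in units)
print(is_symmetric)  # True
```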
Now that you’ve got the “center” of data, the next natural step is to see how spread out the data is. Because two processes can have the same average but very different consistency.
For example:
- Process A daily output: [65, 65, 65, 65, 65, 65, 65] → Mean = 65, SD = 0 (very consistent).
- Process B daily output: [10, 20, 30, 100, 120, 140, 150] → Mean ≈ 81, but outputs are all over the place (high spread). The means are not identical here, but the contrast in consistency is the point.
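A quick way to put numbers on that contrast is the standard deviation. The sketch below uses the sample standard deviation (`statistics.stdev`); that choice is an assumption, since the text above does not say which version of SD is meant.

```python
from statistics import mean, stdev

process_a = [65, 65, 65, 65, 65, 65, 65]
process_b = [10, 20, 30, 100, 120, 140, 150]

# Assumption: sample standard deviation (n - 1 in the denominator).
sd_a = stdev(process_a)
sd_b = stdev(process_b)

print(mean(process_a), sd_a)                      # 65 0.0
print(round(mean(process_b), 1), round(sd_b, 1))  # mean is about 81.4, SD is roughly 60
```

Process A has no spread at all, while Process B's standard deviation is of the same order as its mean.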
👉 Question for you:
Looking back at your production data [50, 55, 60, 65, 70, 75, 80], do you think this process looks consistent or highly variable? Why?
Here’s the twist: this process is actually not highly variable; it’s fairly consistent. Let me explain why:
- The data increases steadily by +5 each day.
- Range = 80 − 50 = 30 units → a bit under half of the mean (65), so the spread is moderate rather than extreme.
- Standard deviation (we’ll calculate next) is small relative to the mean.
So even though the numbers are spread out, they’re spread in an orderly and predictable way, not scattered randomly. That’s a sign of stability.
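As a rough check on the points above, this sketch computes the range, the day-to-day steps, and the standard deviation relative to the mean (the coefficient of variation). It assumes the sample standard deviation; the text has not yet said which version will be used.

```python
from statistics import mean, stdev

# Daily production units, from the example above.
units = [50, 55, 60, 65, 70, 75, 80]

data_range = max(units) - min(units)
print(data_range)  # 30

# Day-to-day change: the same +5 step every day.
steps = [later - earlier for earlier, later in zip(units, units[1:])]
print(steps)  # [5, 5, 5, 5, 5, 5]

# Assumption: sample standard deviation (n - 1 in the denominator).
sd = stdev(units)
cv = sd / mean(units)  # coefficient of variation: SD as a fraction of the mean
print(round(sd, 1))  # 10.8
print(round(cv, 2))  # 0.17
```

A standard deviation of about 17% of the mean, with identical steps from one day to the next, matches the description of values that are spread out in an orderly way.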