1. What is Statistics?
- Statistics = the science of collecting, analyzing, and interpreting data.
- In data science, statistics helps us make sense of data and draw conclusions about a population from a sample.
2. Population vs Sample
- Population = the entire group you want to study.
  Example: all quality engineers in India.
- Sample = a small subset taken from the population to study.
  Example: 200 quality engineers surveyed from different companies.

We usually study the sample, because studying the whole population is costly or impossible.
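
To make the population/sample idea concrete, here is a minimal sketch in Python. The population is simulated (hypothetical years of experience for 50,000 engineers, not real survey data), and the variable names are illustrative assumptions. It draws a simple random sample of 200 and compares the sample mean with the population mean.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: years of experience for 50,000 quality engineers.
# These values are simulated for illustration only.
population = [random.uniform(0, 30) for _ in range(50_000)]

# Draw a simple random sample of 200 engineers, without replacement.
sample = random.sample(population, k=200)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"Population mean: {population_mean:.2f}")
print(f"Sample mean:     {sample_mean:.2f}")
print(f"Difference:      {abs(population_mean - sample_mean):.2f}")
```

The sample mean will usually land close to the population mean, which is why a well-drawn sample can stand in for a population that is too large to measure in full.
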
3. Types of Data
- Qualitative (categorical): descriptive data that places items into groups
  - Example: gender (male/female), defect type (scratch, dent, crack)
- Quantitative (numerical): numbers that can be counted or measured
  - Discrete (countable): number of defects, number of cars sold
  - Continuous (measurable): weight, temperature, length
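
As a rough illustration of the categories above, the sketch below guesses a data type from the Python types of the values. The function name `data_type` and the rule itself are assumptions made for this example. It is only a heuristic: numeric codes such as postal codes are stored as numbers but are still qualitative, so the final call always depends on what the values mean.

```python
def data_type(values):
    """Rough heuristic: guess the statistical data type from Python types."""
    # Text labels are treated as qualitative (categorical) data.
    if all(isinstance(v, str) for v in values):
        return "qualitative"
    # Whole-number counts are treated as discrete.
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "quantitative (discrete)"
    # Any remaining all-numeric column is treated as continuous.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "quantitative (continuous)"
    return "mixed or unknown"


print(data_type(["scratch", "dent", "crack"]))  # defect type
print(data_type([3, 0, 1, 2]))                  # number of defects
print(data_type([72.5, 68.1, 70.0]))            # weight in kg
```
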
4. Scales of Measurement
The scale of measurement tells us which kinds of analysis are meaningful for a variable. There are four levels:
- Nominal: categories only, with no order (e.g., blood group: A, B, O)
- Ordinal: categories with an order, but the gaps between them are not necessarily equal (e.g., ranking: 1st, 2nd, 3rd)
- Interval: numeric scale with equal gaps, but no true zero (e.g., temperature in °C)
- Ratio: numeric scale with a true zero (e.g., weight, length, salary)
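
One way to see how the scale limits the analysis is to list, for each common summary, the lowest scale at which it is meaningful. The sketch below encodes the usual textbook rule (mode from nominal up, median from ordinal up, mean from interval up, ratios only on a ratio scale). The names `Scale`, `MIN_SCALE`, and `meaningful_summaries` are illustrative assumptions, not part of any library.

```python
from enum import IntEnum


class Scale(IntEnum):
    """The four scales, ordered from least to most informative."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Lowest scale at which each summary is meaningful.
MIN_SCALE = {
    "mode": Scale.NOMINAL,
    "median": Scale.ORDINAL,
    "mean": Scale.INTERVAL,
    "ratio of two values": Scale.RATIO,
}


def meaningful_summaries(scale):
    """Return the summaries that make sense for data on the given scale."""
    return [name for name, minimum in MIN_SCALE.items() if scale >= minimum]


for scale in Scale:
    print(f"{scale.name:<8} -> {', '.join(meaningful_summaries(scale))}")
```

Each level keeps everything allowed at the levels below it and adds one more operation, which is why the scales are often described as a hierarchy.
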
Quick Check for You:
Suppose we are collecting the following data:
- The brand of car someone drives (Toyota, Honda, Ford)
- The temperature of an engine in °C
- The ranking of students in a class (1st, 2nd, 3rd)
- The mileage (km/liter) of a vehicle
Can you tell which scale of measurement (nominal, ordinal, interval, ratio) each one belongs to?
Answers:
- Car brand → Nominal (just categories, no order)
- Engine temperature → Interval (differences matter, but 0 °C doesn't mean "no temperature")
- Student ranking → Ordinal (order matters, but the gap between 1st and 2nd is not necessarily equal to the gap between 2nd and 3rd)
- Mileage → Ratio (true zero, so ratios make sense: 20 km/l is double 10 km/l)
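
The difference between the interval and ratio answers can be checked with a little arithmetic. The sketch below uses made-up readings (10 °C and 20 °C, 10 km/l and 20 km/l) and a hypothetical helper `celsius_to_kelvin`. It shows that the "ratio" of two Celsius readings changes when the same temperatures are expressed in kelvins, a scale with a true zero, while the mileage ratio has no such problem.

```python
def celsius_to_kelvin(celsius):
    """Convert degrees Celsius to kelvins (a scale with a true zero)."""
    return celsius + 273.15


# Two illustrative engine temperatures in degrees Celsius.
temp_a_c, temp_b_c = 10.0, 20.0

# Naive ratio on the Celsius scale: looks like "twice as hot".
celsius_ratio = temp_b_c / temp_a_c

# Same two temperatures in kelvins: the ratio is only about 1.035.
kelvin_ratio = celsius_to_kelvin(temp_b_c) / celsius_to_kelvin(temp_a_c)

# Mileage has a true zero, so this ratio is meaningful as it stands.
mileage_ratio = 20.0 / 10.0

print(f"Celsius ratio: {celsius_ratio:.3f}")
print(f"Kelvin ratio:  {kelvin_ratio:.3f}")
print(f"Mileage ratio: {mileage_ratio:.3f}")
```

Because the Celsius zero point is arbitrary, the ratio of two Celsius readings has no physical meaning. Differences do survive the change of units: the gap between the two readings is 10 degrees on both scales.
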