Many intro stats courses start out with an overview of levels of measurement, sometimes called types of measurement scales. There will be some PowerPoint slides, all of a sudden the meaning of 0 will become a confusing matter of grave importance, you might have a quiz, and then the whole sordid affair will fade into the background, occasionally being dredged up when talking about assumptions.
But wait a minute, this stuff is, like, sort of important. There is a reason this is the topic many stats courses cover first. Math in the K-12 arena tends to use numbers. You might be thinking, yeah, duh, I mean, it is math. But the thing is, when you move into the realm of stats and measurement, you shift from viewing a number as just a number to viewing it as something a bit more fuzzy. Welcome to the real world, where things can get a little messy.
So let's begin with levels of measurement. In science (yes, even social science) data (information) is collected. This data is obtained by using some sort of scale. That data can be analyzed and might even result in meaningful conclusions, given that you understand what you have collected and how to analyze your unique special snowflake data.
The type of scale will help you determine which analyses will give you meaningful results.
So let's discuss levels of measurement and the implications.
Nominal
Nominal scales have two or more categories. The data produced by these scales are unique in that different values indicate different classifications, but those differences do not have an implied order.
It does not make sense to calculate the average value of a variable that is scored on a nominal scale. However, you can perform calculations that are based on frequencies (counting how many subjects responded in a given way).
Consider the following question:
Which flavor of ice cream is your favorite?
- Vanilla
- Chocolate
- Strawberry
- Rocky Road
The dataset (collection of individual responses) may contain the numbers 1-4 to indicate the response to this question. However, those numbers don't really mean much. Someone who answered 4 did not score higher than someone who marked 1; those two people just answered differently (even if Vanilla is lame and Rocky Road will always reign supreme).
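To make the frequency idea concrete, here is a minimal sketch in Python. The coding scheme (1 = Vanilla through 4 = Rocky Road) and the ten responses are made up for illustration; they are not real survey data.

```python
from collections import Counter

# Hypothetical coding scheme and responses, invented for illustration.
FLAVORS = {1: "Vanilla", 2: "Chocolate", 3: "Strawberry", 4: "Rocky Road"}
responses = [4, 1, 2, 4, 3, 4, 2, 1, 4, 2]

# Frequencies: count how many respondents chose each category.
counts = Counter(FLAVORS[code] for code in responses)

# The mode (most frequent category) is a meaningful summary for nominal data.
most_common_flavor, most_common_count = counts.most_common(1)[0]

# The mean of the codes can be computed, but it does not name any flavor.
mean_code = sum(responses) / len(responses)

print(counts)
print("Most common:", most_common_flavor, most_common_count)
print("Mean of codes (not meaningful):", mean_code)
```

The counts and the most common flavor would stay the same if we relabeled the codes (say, 1 = Rocky Road), while the mean of the codes would change. That is a decent gut check that the mean is not telling us anything about ice cream.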
Other commonly used examples of variables measured with Nominal scales are: race, political affiliation, gender, major, and religion.
Ordinal
The values produced by ordinal scales have an implied order but there is not necessarily an equal distance between values. Ordinal scales often occur when people or things are being ranked.
One of the most common examples of an ordinal scale is how a person places in a race (1st, 2nd, 3rd, etc.). A person who comes in first in a marathon completed the marathon faster than the person who came in second. However, we cannot say that the time difference between the 1st place finisher and the 2nd place finisher is equal to the time difference between the 2nd place finisher and the 3rd place finisher.
Would it make sense to take an average of an ordinal variable? Let's stick with the example of a marathon runner. If a runner wanted to track their performance across various marathons, would they want to look at their average place (which could have a value of 5.5) or at their average time? I would argue that an average place does not have a clear meaning. If I were in the habit of running marathons (full disclosure: I'm most certainly not), I would be concerned with my average time and how my new times compare to my old ones, rather than how I ranked against a changing group of competitors.
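Here is a small sketch of that comparison in Python. The places and times are invented for one hypothetical runner, and the median is offered as one common order-based summary for ranked data, not as the only option.

```python
from statistics import mean, median

# Hypothetical results for one runner across four marathons.
places = [3, 12, 5, 2]                     # finishing place (ordinal)
times_min = [215.0, 241.5, 228.0, 209.5]   # finishing time in minutes

# The mean place can be computed, but the gaps between places are not equal,
# so it is hard to say what the result refers to.
mean_place = mean(places)

# The median relies only on the order of the values.
median_place = median(places)

# Minutes are equally spaced units, so an average time is interpretable.
average_time = mean(times_min)

print("Mean place:", mean_place)
print("Median place:", median_place)
print("Average time (min):", average_time)
```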
Other common examples of Ordinal scales are: class rank and items that use a Likert scale (strongly disagree... strongly agree).
Interval
Interval scales, like ordinal scales, contain values with a meaningful order. Interval scales also have equal distances between the values. Interval scales, however, do not have an absolute zero point.
A common example of an interval scale is temperature when measured in degrees Fahrenheit or degrees Celsius. The difference between 0° Fahrenheit and 10° Fahrenheit is the same as the difference between -10° Fahrenheit and 0° Fahrenheit.
Calculating the average temperature for a given month would give us a meaningful result. However, we cannot say that 40° Fahrenheit is twice as hot as 20° Fahrenheit. That is because a value of 0° Fahrenheit is not an absolute 0 point. There can be negative values. Think about this:
If I claimed that 20° was 2 times as cold as 40° then, applying the same reasoning, -20° would be -2 times as cold as 40°. That does not make much sense.
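Another way to see the problem is to convert the same two temperatures to Celsius. The sketch below uses the standard conversion formula; the function name is just something I picked for the example.

```python
def fahrenheit_to_celsius(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

cold_f, warm_f = 20.0, 40.0
cold_c = fahrenheit_to_celsius(cold_f)
warm_c = fahrenheit_to_celsius(warm_f)

# The "ratio" depends entirely on which unit we happen to use.
ratio_f = warm_f / cold_f
ratio_c = warm_c / cold_c

# Differences behave: equal gaps in Fahrenheit are still equal gaps in Celsius.
gap_low = fahrenheit_to_celsius(0) - fahrenheit_to_celsius(-10)
gap_high = fahrenheit_to_celsius(10) - fahrenheit_to_celsius(0)

print("Ratio in Fahrenheit:", ratio_f)
print("Ratio in Celsius:", ratio_c)
print("Gaps in Celsius:", gap_low, gap_high)
```

The ratio is 2 in Fahrenheit and negative in Celsius, even though nothing about the weather changed. The two 10° gaps, on the other hand, come out the same size in either unit.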
Other examples of interval scales are: shoe size and women's pants size (0, 2, 4...).
Ratio
Ratio scales have values with a meaningful order, equal distances between points, and an absolute 0 point. The absolute 0 point indicates that none of the variable being measured is present.
Common examples of variables measured with Ratio scales include height (inches or centimeters), weight (pounds), income (dollars), and age (years).
As the name implies, Ratio scales mean that ratio calculations can be meaningfully applied. Sally can be twice as tall as Billy.
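As a quick check, the ratio should hold no matter which unit we use. A minimal sketch, with made-up heights for Sally and Billy:

```python
import math

CM_PER_INCH = 2.54

# Hypothetical heights in inches.
sally_in, billy_in = 60.0, 30.0

ratio_in = sally_in / billy_in
ratio_cm = (sally_in * CM_PER_INCH) / (billy_in * CM_PER_INCH)

# Because 0 inches and 0 centimeters both mean "no height at all",
# changing units rescales the values but leaves the ratio alone.
print("Ratio in inches:", ratio_in)
print("Ratio in centimeters:", ratio_cm)
print("Same ratio:", math.isclose(ratio_in, ratio_cm))
```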
Choosing a Level of Measurement
So at this point you may be feeling pretty comfortable with levels of measurement. That's great! Sometimes things can get a little fuzzy, however. For example, think about two possible questions we could include in a survey:
Question A: Please enter your yearly income: _________
Question B: Please indicate your income level:
- Under $25,000
- Between $25,000 and $50,000
- Between $50,000 and $100,000
- Above $100,000
The responses to question A would provide us with data on a Ratio scale. Question A has a 0 point of $0 and the value of a single dollar is a standard metric, giving us equal distance between points.
The responses to question B, however, would give us data on an Ordinal scale. Income is lowest for response option 1 and highest for response option 4. Note that the difference in income between the response options is not constant.
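To see what gets lost when going from Question A to Question B, here is a rough sketch that maps a dollar amount to a response option. The function name and the example incomes are hypothetical, and since $50,000 appears in two of the options as written, putting it in option 2 is my assumption.

```python
def income_bracket(income):
    """Map a yearly income in dollars to a Question B response option (1-4).

    Assumption: an income of exactly $50,000 goes in option 2.
    """
    if income < 25_000:
        return 1
    if income <= 50_000:
        return 2
    if income <= 100_000:
        return 3
    return 4

# Hypothetical Question A responses.
incomes = [26_000, 49_000, 100_001, 1_000_000]
brackets = [income_bracket(x) for x in incomes]

print(brackets)
```

The first two respondents land in the same option despite a $23,000 gap, and so do the last two despite a gap of nearly $900,000. The order of the options is preserved, but the distances are gone.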
Why would someone choose to use Question B instead of Question A? Well, Question B may result in less user error. When respondents are asked to fill in a blank, $10,000 can easily turn into $100,000 by mistake. Or maybe the researcher just wants a general idea of the income spread of survey respondents, to ensure that the sample reflects the population of interest, and is not interested in doing any in-depth analysis of income.
When considering what type of data you will collect, it is important to determine what questions you want to answer with your data. The more clearly defined your questions are, the easier it will be to design a study and analysis plan.
Shades of Gray
This next section is generally beyond the level of an introductory course, but it may be worthwhile to read and ponder if you are thinking of pursuing a career in the social sciences, or if you just love learning/measurement/procrastinating.
While levels of measurement may seem clear cut at this stage, things can get wonky, especially in the behavioral sciences. For example, think about a scale measuring depression. Picture a simple, 10-item scale where a person marks either "Agree" or "Disagree" for each item. Items may be similar to "I have felt sad in the past week." and "I have considered killing myself in the past week." People could endorse anywhere from 0 items to all 10.
Since it is possible for people to endorse 0 items, you may initially think this is a Ratio scale. But does endorsing 0 items indicate a complete lack of depression? Maybe, but it would be pretty impressive if we covered all possible indicators of depression in only 10 items.
Say we give up on an absolute 0 point. Each point indicates an endorsement of a question, so it would be reasonable to think we have equal distance between points. So we have an interval scale, right? This would mean that each question should carry an equal weight of depression. Is feeling sad equal to thinking about suicide? Things are getting tricky.
Ok, so maybe we don't have a 0 point, and maybe we don't have equal distances. But a score of 6 is definitely greater than a score of 5, and so on. Or is it? If our questions may indicate different amounts (or severity) of depression, then how can we order people based on a simple count of the number of items endorsed? What if one person endorsed 6 seemingly less serious items and another person endorsed 4 seemingly very serious items?
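A toy sketch of that last scenario is below. Everything in it is hypothetical: the severity weights are numbers I made up purely to show that the ordering can flip, and they do not come from any real depression instrument.

```python
# Hypothetical severity weights for the 10 items (1 = milder, 5 = more severe).
# These are invented for illustration, not taken from any real scale.
weights = [1, 1, 1, 1, 1, 1, 3, 4, 5, 5]

# 1 = "Agree" (endorsed), 0 = "Disagree".
person_a = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # endorses the 6 milder items
person_b = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]  # endorses the 4 more severe items

def count_score(answers):
    """Simple count of endorsed items."""
    return sum(answers)

def weighted_score(answers, item_weights):
    """Sum of the (hypothetical) weights of the endorsed items."""
    return sum(a * w for a, w in zip(answers, item_weights))

print("Counts:", count_score(person_a), count_score(person_b))
print("Weighted:", weighted_score(person_a, weights),
      weighted_score(person_b, weights))
```

By simple count, person A scores higher (6 versus 4). Under these made-up weights, person B scores higher (17 versus 6). Which ordering is "right" depends on weights we do not actually know, which is exactly the problem.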
Questions related to measurement can be confusing. In the social sciences we rely heavily on surveys to give us insights into human thoughts and behaviors. It is important to remember that things are not always as straightforward as they seem, or as we wish them to be. However, these gray areas can be intriguing. If we knew everything, then what would be left to argue about during department happy hours?