graph TD; A(Variable) --> B(Qualitative); A(Variable) --> C(Quantitative); C --> D(Discrete); C --> E(Continuous);
4 Variable Classification
A variable is a property that can vary between different units in a population. Understanding the classification of variables is fundamental to selecting appropriate statistical methods and interpreting results accurately. The nature of a variable will for example determine:
The choice of measures of central tendency and dispersion
The choice of correlation measures
The selection of appropriate graphical representations
4.1 Types of Variables
To facilitate analysis, we focus on measurable variables, which are broadly categorized into quantitative and qualitative types. There are further sub-classes for each of these main classes, as shown in Figure 4.1 and exemplified further in the following.
4.1.1 Quantitative (Numerical)
These variables are represented by numbers and can be measured. Depending on the type of numbers a variable takes, it can be classified as discrete or continuous.
Discrete variables take specific, distinct values and cannot be subdivided (natural, integer, or rational numbers). Think of these as variables holding countable values. Examples include number of children in a family, number of goals ina football match, and number of sales transactions per day.
Continuous variables can take any value within a given range of values within an interval and can be infinitely divided (real numbers). Examples include hegiht, weight, stock price and distance.
4.1.2 Qualitative (Categorical) Variables
These variables represent data that can be divided into distinct groups or categories. These values do not have a natural numerical order (except for ordinal variables) and must be coded into numerical values for statistical analysis. Examples include political affiliation, blood type, movie genre and social media platform.
A special case of qualitative variables that take only two possible values, so called dichotomous or binary variable. Examples include smoking status (Smoker/Non-smoker) and COVID-19 test result (positive/negative).
4.2 Levels of Measurement
The characteristics of collected data allow us to classify them into four different measurement levels. These levels determine the statistical operations that can be performed. Table 4.1 summarizes the different measurement levels described in the following.
Nominal scale categorizes data without any inherent order. For example, there is no inherent ordering in the variable eye color (blue, brown, green), gender or nationality. You can only distinguish between the values nominal variables take.
Ordinal scale is when data can be ranked in a meaningful order, but differences between values are not necessarily equal or meaningful, for example when looking at the variable fruit preference ranking or customer satisfaction levels (low, medium, high). In other words, interval lengths between one variable value and another are not of the same length.
Interval scale is similar to the ordinal scale but with equal intervals between values. However, it lacks a true zero point. Examples include Temperature in Celsius and calendar years (the year 2000 is 100 years after 1900, but the year 0 does not represent the “beginning of time”).
Finally, ratio scale has all of the above properties but also a natural zero point, allowing meaningful calculations of differences and ratios. Examples here include salary, distance traveled, height, and weight.
Distinguish | Rank | Equal Step Length | Absolute Zero Point | Example | |
---|---|---|---|---|---|
Nominal | Yes | No | No | No | gender, city, religion |
Ordinal | Yes | Yes | No | No | grades, preference, customer satisfaction ratings |
Interval | Yes | Yes | Yes | No | temperature in Celsius. credit scores, calender years |
Ratio | Yes | Yes | Yes | Yes | length, weight, time, salary |
4.3 Types of Numbers
Understanding the nature of numbers is crucial when classifying variables because it directly influences statistical analysis, measurement accuracy, and the interpretation of data. The classification of numbers into natural, whole, integer, rational, irrational, and real numbers helps in determining which mathematical operations and statistical techniques are valid for a given data set:
- Natural Numbers: \(0, 1, 2, 3, \ldots\)
- Integers: \(\dots , -3, -2, -1, 0, 1, 2, 3, \ldots\)
- Rational Numbers: Numbers that can be expressed as a fraction \(\frac{a}{b}\) , where \(a\) and \(b\) are integers. Examples include:
- \(-14 = \frac{-14}{1}\)
- \(\frac{3}{4} = 0.75\)
- \(\frac{2}{7} = 0.285714285714 \dots\)
- Real Numbers: Non-repeating decimal numbers, such as:
- \(\pi = 3.14159265358979 \dots\)
For example you cannot calculate an average zip code (nominal) or say that “a temperature of 20°C is twice as hot as 10°C” (interval), but you can say “a person earning €50,000 earns twice as much as someone earning €25,000” (ratio). An another example, you cannot consider the mean of a categorical variable like “favorite color,” but you can analyze the frequency distribution.