4 Variable Classification

A variable is a property that can vary between different units in a population. Understanding the classification of variables is fundamental to selecting appropriate statistical methods and interpreting results accurately. The nature of a variable will for example determine:

The choice of measures of central tendency and dispersion
The choice of correlation measures
The selection of appropriate graphical representations

4.1 Types of Variables

To facilitate analysis, we focus on measurable variables, which are broadly categorized into quantitative and qualitative types. There are further sub-classes for each of these main classes, as shown in Figure 4.1 and exemplified further in the following.

graph TD;
    A(Variable) --> B(Qualitative);
    A(Variable) --> C(Quantitative);
    C --> D(Discrete);
    C --> E(Continuous);

Figure 4.1: Variable classification.

4.1.1 Quantitative (Numerical)

These variables are represented by numbers and can be measured. Depending on the type of numbers a variable takes, it can be classified as discrete or continuous.

Discrete variables take specific, distinct values and cannot be subdivided (natural, integer, or rational numbers). Think of these as variables holding countable values. Examples include number of children in a family, number of goals ina football match, and number of sales transactions per day.

Continuous variables can take any value within a given range of values within an interval and can be infinitely divided (real numbers). Examples include hegiht, weight, stock price and distance.

4.1.2 Qualitative (Categorical) Variables

These variables represent data that can be divided into distinct groups or categories. These values do not have a natural numerical order (except for ordinal variables) and must be coded into numerical values for statistical analysis. Examples include political affiliation, blood type, movie genre and social media platform.

A special case of qualitative variables that take only two possible values, so called dichotomous or binary variable. Examples include smoking status (Smoker/Non-smoker) and COVID-19 test result (positive/negative).

4.2 Levels of Measurement

The characteristics of collected data allow us to classify them into four different measurement levels. These levels determine the statistical operations that can be performed. Table 4.1 summarizes the different measurement levels described in the following.

Nominal scale categorizes data without any inherent order. For example, there is no inherent ordering in the variable eye color (blue, brown, green), gender or nationality. You can only distinguish between the values nominal variables take.

Ordinal scale is when data can be ranked in a meaningful order, but differences between values are not necessarily equal or meaningful, for example when looking at the variable fruit preference ranking or customer satisfaction levels (low, medium, high). In other words, interval lengths between one variable value and another are not of the same length.

Interval scale is similar to the ordinal scale but with equal intervals between values. However, it lacks a true zero point. Examples include Temperature in Celsius and calendar years (the year 2000 is 100 years after 1900, but the year 0 does not represent the “beginning of time”).

Finally, ratio scale has all of the above properties but also a natural zero point, allowing meaningful calculations of differences and ratios. Examples here include salary, distance traveled, height, and weight.

Table 4.1: Summary of measurement levels.

Summary of measurement levels.
	Distinguish	Rank	Equal Step Length	Absolute Zero Point	Example
Nominal	Yes	No	No	No	gender, city, religion
Ordinal	Yes	Yes	No	No	grades, preference, customer satisfaction ratings
Interval	Yes	Yes	Yes	No	temperature in Celsius. credit scores, calender years
Ratio	Yes	Yes	Yes	Yes	length, weight, time, salary

4.3 Types of Numbers

Understanding the nature of numbers is crucial when classifying variables because it directly influences statistical analysis, measurement accuracy, and the interpretation of data. The classification of numbers into natural, whole, integer, rational, irrational, and real numbers helps in determining which mathematical operations and statistical techniques are valid for a given data set:

Natural Numbers: \(0, 1, 2, 3, \ldots\)
Integers: \(\dots , -3, -2, -1, 0, 1, 2, 3, \ldots\)
Rational Numbers: Numbers that can be expressed as a fraction \(\frac{a}{b}\) , where \(a\) and \(b\) are integers. Examples include:
- \(-14 = \frac{-14}{1}\)
- \(\frac{3}{4} = 0.75\)
- \(\frac{2}{7} = 0.285714285714 \dots\)
Real Numbers: Non-repeating decimal numbers, such as:
- \(\pi = 3.14159265358979 \dots\)

For example you cannot calculate an average zip code (nominal) or say that “a temperature of 20°C is twice as hot as 10°C” (interval), but you can say “a person earning €50,000 earns twice as much as someone earning €25,000” (ratio). An another example, you cannot consider the mean of a categorical variable like “favorite color,” but you can analyze the frequency distribution.