Types of Data in Statistics — Quantitative vs Qualitative



Understanding data types is the first and most critical step in any statistical analysis. The type of data you have determines which statistical methods are valid, which visualizations are appropriate, and what conclusions you can draw.


The Data Type Hierarchy

ALL DATA
├── Qualitative (Categorical)
│   ├── Nominal  (categories with no order)
│   └── Ordinal  (categories with meaningful order)
│
└── Quantitative (Numerical)
    ├── Discrete (countable, whole numbers)
    └── Continuous (measurable, any value in a range)
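Read top to bottom, the hierarchy is a decision procedure with at most two yes/no questions. The sketch below encodes it. The function `data_type` and its parameter names are hypothetical, and answering the questions (do the numbers have mathematical meaning? does order matter? can the values be counted?) still takes human judgment.

```python
def data_type(numeric_meaning: bool, ordered: bool = False, countable: bool = False) -> str:
    """Walk the data-type hierarchy (hypothetical helper, for illustration only)."""
    if not numeric_meaning:
        # Qualitative branch: does the order of the categories carry information?
        return "Qualitative / Ordinal" if ordered else "Qualitative / Nominal"
    # Quantitative branch: can the possible values be listed and counted?
    return "Quantitative / Discrete" if countable else "Quantitative / Continuous"


print(data_type(numeric_meaning=False))                 # eye color -> Qualitative / Nominal
print(data_type(numeric_meaning=False, ordered=True))   # education level -> Qualitative / Ordinal
print(data_type(numeric_meaning=True, countable=True))  # number of children -> Quantitative / Discrete
print(data_type(numeric_meaning=True))                  # height -> Quantitative / Continuous
```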

Qualitative (Categorical) Data

Qualitative data represents categories or groups — things that are described rather than measured numerically.

Nominal Data

Categories with no natural order. You can only determine equality or inequality.

Examples:

  • Eye color: brown, blue, green, hazel
  • Blood type: A, B, AB, O
  • Country of birth
  • Product category: electronics, clothing, food
  • Survey responses: Yes / No

Valid operations: Count, mode, chi-square test
Invalid operations: Mean, median, subtraction
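In pandas, a nominal variable maps naturally onto an unordered `Categorical`. The sketch below uses made-up eye-color data: counts, the mode and equality tests work, while an order comparison such as `<` raises `TypeError`, because an unordered categorical has no notion of "less than".

```python
import pandas as pd

# Hypothetical eye-color data stored as an unordered (nominal) categorical
eyes = pd.Series(["brown", "blue", "brown", "green", "hazel", "brown"], dtype="category")

print(eyes.value_counts())       # valid: frequency of each category
print(eyes.mode()[0])            # valid: the mode, "brown"
print((eyes == "blue").sum())    # valid: equality test, 1 match

# Invalid: "<" has no meaning without an order, and pandas refuses it
try:
    _ = eyes < "green"
    ordering_allowed = True
except TypeError as err:
    ordering_allowed = False
    print("TypeError:", err)
```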

Ordinal Data

Categories with a meaningful order, but the gaps between categories are not necessarily equal.

Examples:

  • Education level: high school < bachelor's < master's < PhD
  • Customer satisfaction: Poor < Fair < Good < Excellent
  • Military rank: Private < Corporal < Sergeant < Captain
  • Star ratings: ★ < ★★ < ★★★ < ★★★★ < ★★★★★

Valid operations: Ordering, median, percentiles, Spearman correlation
Invalid operations: Arithmetic mean (controversial), subtraction (intervals unknown)

Key distinction: "Excellent" ranks above "Good", but is the step from "Good" to "Excellent" the same size as the step from "Fair" to "Good"? Ordinal scales can't tell us.
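Declaring the order with `ordered=True` lets pandas sort and compare an ordinal variable. A median can then be taken over the category positions. `ordinal_median` below is a hypothetical helper that returns the lower median, so the answer is always an actual category rather than a point halfway between two labels. The ratings are invented.

```python
import pandas as pd

levels = ["Poor", "Fair", "Good", "Excellent"]
ratings = pd.Series(
    pd.Categorical(["Good", "Poor", "Excellent", "Good", "Fair"],
                   categories=levels, ordered=True)
)

# Order-based operations are allowed on an ordered categorical
print(ratings.min(), ratings.max())   # Poor Excellent
print((ratings >= "Good").sum())      # 3 responses at "Good" or better

def ordinal_median(s: pd.Series) -> str:
    """Lower median of an ordered categorical (hypothetical helper)."""
    codes = sorted(s.dropna().cat.codes)      # positions of each value in the category order
    middle = codes[(len(codes) - 1) // 2]     # lower median when n is even
    return s.cat.categories[middle]

print(ordinal_median(ratings))        # Good
```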


Quantitative (Numerical) Data

Quantitative data represents measured or counted quantities — numbers that have mathematical meaning.

Discrete Data

Can only take specific, countable values — usually whole numbers. There are gaps between possible values.

Examples:

  • Number of children in a family (0, 1, 2, 3, ... — not 1.7)
  • Number of cars in a parking lot
  • Number of defects in a product
  • Shoe sizes (not all whole numbers, but still discrete: 8, 8.5, 9, ...)
  • Number of goals scored in a soccer match

Valid operations: All arithmetic (including mean and variance) and counts; common probability models are the Poisson and binomial distributions
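Count models put probability only on 0, 1, 2, .... A minimal standard-library sketch of the Poisson probability mass function, P(X = k) = λ^k e^(-λ) / k!, follows; the goal counts are invented and λ is estimated by their sample mean.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson count with mean lam; only k = 0, 1, 2, ... are possible."""
    if not isinstance(k, int) or k < 0:
        raise ValueError("a count must be a non-negative integer")
    return lam ** k * math.exp(-lam) / math.factorial(k)

goals = [0, 1, 2, 1, 3, 0, 2, 1, 1, 4]   # invented goals-per-match data
lam = sum(goals) / len(goals)            # estimate the rate by the sample mean: 1.5

for k in range(6):
    print(f"P({k} goals) = {poisson_pmf(k, lam):.3f}")
```

Calling `poisson_pmf(1.7, lam)` raises `ValueError`: 1.7 goals is not a possible value, so it has no probability at all.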

Continuous Data

Can take any value within a range, including fractions and decimals. Limited only by measurement precision.

Examples:

  • Height (1.753847... meters)
  • Temperature (23.7°C)
  • Time to complete a task
  • Weight, blood pressure, distance
  • Stock prices

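Valid operations: All arithmetic; continuous probability models such as the normal distribution, whose density is the derivative of the CDF and whose probabilities are integrals of that density over an interval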
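For continuous data, probabilities are areas under a density curve, so only intervals get non-zero probability. A sketch using Python's `statistics.NormalDist`; the height model (mean 170 cm, standard deviation 10 cm) is an assumption for illustration.

```python
from statistics import NormalDist

# Assumed model for illustration: adult height ~ Normal(mean=170 cm, sd=10 cm)
height = NormalDist(mu=170, sigma=10)

# Probability of an interval = area under the density = difference of CDF values
p_160_180 = height.cdf(180) - height.cdf(160)
print(f"P(160 cm < height < 180 cm) = {p_160_180:.4f}")   # 0.6827, i.e. within 1 sd

# pdf() returns a density (probability per cm), not the probability of exactly 170 cm
print(f"density at 170 cm = {height.pdf(170):.4f} per cm")
```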


Interval vs Ratio (A Deeper Cut)

Within quantitative data, we can further distinguish:

| Feature | Interval | Ratio |
|---|---|---|
| Equal intervals | ✅ Yes | ✅ Yes |
| True zero (zero = absence) | ❌ No | ✅ Yes |
| Meaningful ratios | ❌ No | ✅ Yes |
| Example | Temperature (°C), IQ | Height, weight, income |

Interval example: 0°C is not "no temperature", so 40°C is not twice as hot as 20°C in the thermodynamic sense. Temperature in kelvin is a ratio scale because 0 K is absolute zero.

Ratio example: A person who weighs 80 kg is genuinely twice as heavy as someone who weighs 40 kg.
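The arithmetic makes the difference concrete. The same two temperatures give a different "ratio" in every unit whose zero point is arbitrary, which is why ratios are meaningless on an interval scale; only the kelvin ratio, measured from absolute zero, is physically meaningful. Weight, a ratio scale, keeps its ratio under a change of units. The conversion helpers and the 2.20462 lb/kg factor are just for illustration.

```python
def c_to_f(c: float) -> float:
    """Celsius -> Fahrenheit (another interval scale)."""
    return c * 9 / 5 + 32

def c_to_k(c: float) -> float:
    """Celsius -> kelvin (a ratio scale: 0 K is absolute zero)."""
    return c + 273.15

hot, mild = 40.0, 20.0
print("Celsius ratio:   ", hot / mild)                    # 2.0
print("Fahrenheit ratio:", c_to_f(hot) / c_to_f(mild))    # 104 / 68, about 1.53
print("Kelvin ratio:    ", c_to_k(hot) / c_to_k(mild))    # 313.15 / 293.15, about 1.07

# Weight is a ratio scale: the ratio survives a change of units
LB_PER_KG = 2.20462
print("Weight ratio (kg):", 80 / 40)
print("Weight ratio (lb):", (80 * LB_PER_KG) / (40 * LB_PER_KG))
```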


Why Data Types Matter for Statistics

| Analysis goal | Nominal | Ordinal | Discrete/Continuous |
|---|---|---|---|
| Central tendency | Mode | Mode, Median | Mean, Median, Mode |
| Spread | Frequency | IQR | Std Dev, Variance |
| Correlation | Cramér's V | Spearman ρ | Pearson r |
| Group comparison | Chi-square | Kruskal-Wallis | ANOVA, t-test |
| Regression | Dummy variables | Ordinal logistic | Linear regression |
| Visualization | Bar chart | Bar/box | Histogram, scatter |
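The central-tendency row of this table can be expressed as a small dispatch on measurement level. This is a sketch with a hypothetical `central_tendency` helper built on Python's `statistics` module; ordinal values are assumed to be already coded as ranks, and `median_low` is used so the result is always an observed category.

```python
from statistics import mean, median, median_low, mode

def central_tendency(values, level: str) -> dict:
    """Summaries permitted for each measurement level (hypothetical helper)."""
    if level == "nominal":
        return {"mode": mode(values)}
    if level == "ordinal":
        # Assumes values are rank codes (e.g. 1 = Poor ... 4 = Excellent);
        # median_low always returns an observed value, never a midpoint
        return {"mode": mode(values), "median": median_low(values)}
    if level in ("discrete", "continuous"):
        return {"mean": mean(values), "median": median(values), "mode": mode(values)}
    raise ValueError(f"unknown measurement level: {level!r}")


print(central_tendency(["A", "O", "O", "B"], "nominal"))   # {'mode': 'O'}
print(central_tendency([1, 3, 3, 4, 2, 3], "ordinal"))    # {'mode': 3, 'median': 3}
print(central_tendency([2, 0, 1, 3, 1], "discrete"))      # {'mean': 1.4, 'median': 1, 'mode': 1}
```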

Python: Identifying and Working with Data Types

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load a rich dataset
df = sns.load_dataset('tips')
print("Dataset shape:", df.shape)
print("\nData types (pandas dtypes):")
print(df.dtypes)

Output:

Dataset shape: (244, 7)

Data types (pandas dtypes):
total_bill     float64   ← Continuous quantitative
tip            float64   ← Continuous quantitative
sex           category   ← Nominal qualitative
smoker        category   ← Nominal qualitative
day           category   ← Ordinal qualitative (Thur < Fri < Sat < Sun), stored unordered
time          category   ← Nominal qualitative
size             int64   ← Discrete quantitative
dtype: object

# --- Statistical summaries differ by type ---

print("\n=== Quantitative Variables ===")
print(df[['total_bill', 'tip', 'size']].describe())

print("\n=== Qualitative Variables ===")
for col in ['sex', 'smoker', 'day', 'time']:
    print(f"\n{col} — value counts:")
    print(df[col].value_counts())
    print(f"Mode: {df[col].mode()[0]}")

# --- Visualizations appropriate to each type ---
fig, axes = plt.subplots(2, 3, figsize=(14, 8))

# Continuous: histogram
axes[0, 0].hist(df['total_bill'], bins=20, color='steelblue', edgecolor='black', alpha=0.7)
axes[0, 0].set_title('Total Bill (Continuous)\n→ Histogram')
axes[0, 0].set_xlabel('Amount ($)')

# Discrete: bar chart
size_counts = df['size'].value_counts().sort_index()
axes[0, 1].bar(size_counts.index, size_counts.values, color='coral', edgecolor='black')
axes[0, 1].set_title('Party Size (Discrete)\n→ Bar Chart')
axes[0, 1].set_xlabel('Size')

# Nominal: pie chart
sex_counts = df['sex'].value_counts()
axes[0, 2].pie(sex_counts.values, labels=sex_counts.index, autopct='%1.1f%%', startangle=90)
axes[0, 2].set_title('Sex (Nominal)\n→ Pie Chart')

# Ordinal: ordered bar
day_order = ['Thur', 'Fri', 'Sat', 'Sun']
day_counts = df['day'].value_counts().reindex(day_order)
axes[1, 0].bar(day_counts.index, day_counts.values, color='mediumseagreen', edgecolor='black')
axes[1, 0].set_title('Day (Ordinal)\n→ Ordered Bar Chart')

# Continuous: box plot by category
df.boxplot(column='tip', by='day', ax=axes[1, 1])
axes[1, 1].set_title('Tip by Day\n→ Box Plot')

# Scatter: two continuous
axes[1, 2].scatter(df['total_bill'], df['tip'], alpha=0.5, color='purple')
axes[1, 2].set_title('Bill vs Tip (Continuous × Continuous)\n→ Scatter Plot')
axes[1, 2].set_xlabel('Total Bill ($)')
axes[1, 2].set_ylabel('Tip ($)')

plt.tight_layout()
plt.savefig('data_types_visualization.png', dpi=150)
plt.show()

Data Type Classification in Practice

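import pandas as pd
import seaborn as sns
from pandas.api.types import (
    is_bool_dtype, is_float_dtype, is_integer_dtype, is_object_dtype, is_string_dtype,
)

def classify_variable(series: pd.Series, nunique_threshold: int = 15) -> str:
    """Suggest a statistical data type for a pandas Series from its dtype.

    This is a heuristic: integer-coded IDs (e.g. ZIP codes) or Likert items
    still need a human to reclassify them.
    """
    dtype = series.dtype
    nunique = series.nunique()

    # Check categoricals first: only categoricals created with ordered=True carry an order
    if isinstance(dtype, pd.CategoricalDtype):
        return 'Ordinal Categorical' if dtype.ordered else 'Nominal Categorical'
    if is_bool_dtype(dtype):
        return 'Nominal (Binary)'
    if is_object_dtype(dtype) or is_string_dtype(dtype):
        return 'Nominal Categorical (text)'
    if is_integer_dtype(dtype):
        if nunique <= nunique_threshold:
            return f'Discrete Quantitative ({nunique} unique values)'
        return 'Discrete Quantitative (high cardinality)'
    if is_float_dtype(dtype):
        return 'Continuous Quantitative'
    return f'Unknown ({dtype})'

# Apply to the tips dataset
df = sns.load_dataset('tips')
# seaborn stores 'day' as an unordered categorical; declare its order explicitly
df['day'] = df['day'].cat.as_ordered()

print("Variable Classification:")
print("-" * 50)
for col in df.columns:
    classification = classify_variable(df[col])
    print(f"{col:<15} → {classification}")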

Common Mistakes

❌ Treating Ordinal as Interval

Averaging Likert-scale responses (1–5) as if they were interval data is common, but it assumes the response options are equally spaced, which an ordinal scale does not guarantee. The difference between "Strongly Agree" and "Agree" may not equal the difference between "Neutral" and "Disagree."
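One way to see the problem: any order-preserving set of codes is equally valid for an ordinal scale, and a conclusion drawn from the mean can flip when the codes change, while one drawn from the median cannot. The responses and the alternative coding below are invented for illustration.

```python
from statistics import mean, median

# Invented 1-5 Likert responses for two groups
group_a = [4, 4, 4, 4, 4]
group_b = [1, 1, 5, 5, 5]

# An alternative coding that preserves the order (5 is still the top answer)
recode = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}

def higher(stat, a, b) -> str:
    """Name the group with the larger value of a summary statistic."""
    sa, sb = stat(a), stat(b)
    return "A" if sa > sb else "B" if sb > sa else "tie"

for label, code in [("1-5 codes", lambda x: x), ("recoded", recode.get)]:
    a = [code(x) for x in group_a]
    b = [code(x) for x in group_b]
    print(f"{label:<10} higher mean: {higher(mean, a, b)}   higher median: {higher(median, a, b)}")
```

With the usual 1–5 codes group A has the higher mean and group B the higher median; after recoding, B wins on both. The median comparison did not depend on the arbitrary spacing of the codes, but the mean comparison did.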

❌ Zip Codes as Quantitative

ZIP code 90210 is not "40,210 more" than ZIP code 50000; the subtraction is meaningless. A ZIP code is a nominal identifier that happens to be written in digits.
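A practical consequence: if pandas infers the type, a ZIP column becomes integers, and ZIP codes with a leading zero are silently changed. A sketch with made-up data, reading the same CSV text twice with `pandas.read_csv`, once with default type inference and once with `dtype={"zip": str}`.

```python
import io

import pandas as pd

csv_text = "customer,zip\nalice,02134\nbob,90210\n"   # made-up data

as_numbers = pd.read_csv(io.StringIO(csv_text))                      # zip inferred as int64
as_labels = pd.read_csv(io.StringIO(csv_text), dtype={"zip": str})   # zip kept as text

print(as_numbers["zip"].tolist())   # [2134, 90210]: the leading zero is lost
print(as_labels["zip"].tolist())    # ['02134', '90210']
```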

❌ Treating Discrete Data as Continuous (or vice versa)

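Modeling the number of children with a continuous distribution such as the normal assigns probability to impossible values like 1.7 children, or even negative counts. Use a count model such as the Poisson or negative binomial instead.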
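A quick check of what a continuous model implies for counts. The sketch assumes a mean of 1.7 children and matches a normal distribution to the mean and variance a Poisson count would have (both equal to λ). The normal then assigns probability to negative numbers of children and to stretches between integers, while a Poisson count starts at P(X = 0) = e^(-λ) with nothing below it.

```python
import math
from statistics import NormalDist

lam = 1.7  # assumed average number of children per family

# Normal model matched to the mean and variance of a Poisson(lam) count
normal = NormalDist(mu=lam, sigma=math.sqrt(lam))

# Probability the normal model gives to an impossible negative count
p_negative = normal.cdf(0)

# Probability it gives to (1.2, 1.8), an interval that contains no whole number
p_between = normal.cdf(1.8) - normal.cdf(1.2)

# Under a Poisson model the smallest possible count is 0, with P(X = 0) = exp(-lam)
p_zero_poisson = math.exp(-lam)

print(f"normal  P(X < 0)         = {p_negative:.3f}")
print(f"normal  P(1.2 < X < 1.8) = {p_between:.3f}")
print(f"poisson P(X = 0)         = {p_zero_poisson:.3f}")
```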


Practice Exercises

Exercise 1: Classify each variable:

  • a) Temperature in Fahrenheit
  • b) Movie genre (Action, Comedy, Drama)
  • c) Customer age
  • d) Job satisfaction rating (1 = Very Unsatisfied, 5 = Very Satisfied)
  • e) Number of siblings

Exercise 2: For each variable in the iris dataset, identify the type and choose the most appropriate visualization.

import seaborn as sns
iris = sns.load_dataset('iris')
print(iris.dtypes)
# Your classifications and visualizations here

Solution:

# sepal_length: float64 → Continuous → histogram or box plot
# sepal_width: float64 → Continuous → histogram or box plot
# petal_length: float64 → Continuous → histogram or box plot
# petal_width: float64 → Continuous → histogram or box plot
# species: object/category → Nominal → bar chart

import seaborn as sns
import matplotlib.pyplot as plt

iris = sns.load_dataset('iris')
fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Continuous: petal_length distribution by species
for species in iris['species'].unique():
    subset = iris[iris['species'] == species]['petal_length']
    axes[0].hist(subset, bins=15, alpha=0.6, label=species)
axes[0].set_title('Petal Length by Species\n(Continuous, grouped)')
axes[0].legend()

# Nominal: species counts
iris['species'].value_counts().plot(kind='bar', ax=axes[1], color='coral', edgecolor='black')
axes[1].set_title('Species Count\n(Nominal)')
axes[1].tick_params(rotation=0)

plt.tight_layout()
plt.show()

Key Takeaways

  1. Data type determines your entire analysis pipeline — identify types before doing anything else.
  2. Qualitative data describes categories — nominal has no order, ordinal has meaningful order.
  3. Quantitative data measures or counts: discrete data takes separate, countable values; continuous data takes any value in a range.
  4. The interval/ratio distinction matters for ratio calculations and distributional choices.
  5. Pandas dtypes approximate statistical types but require human judgment to classify correctly.
  6. Wrong type → wrong method → wrong conclusion — this is not just academic pedantry.
