Experimental Design

Principles of Experimental Design

Experimental design provides structured approaches to data collection that enable valid causal inference. Unlike observational studies, in which the investigator does not control which units receive which treatment, experiments actively manipulate treatments and observe their effects. The design determines what can be concluded from the experiment.

Three key principles guide experimental design: randomization, replication, and blocking. Randomization controls for confounding by ensuring treatment groups are comparable on average. Replication provides estimates of variability and improves precision. Blocking accounts for known sources of variation to reduce error.

Understanding experimental design enables efficient studies that answer questions of interest while managing resource constraints. Good design prevents problems that cannot be fixed in analysis.

Completely Randomized Designs

The simplest experimental design assigns treatments completely at random to experimental units. No blocking or matching is used.

Random Assignment

Random assignment ensures that treatment groups are comparable on average before treatment. Differences that emerge afterward, beyond what chance alone would produce, can then be attributed to the treatment rather than to pre-existing differences.

Simple randomization assigns each unit to a treatment independently, typically with equal probability. This is straightforward but can produce unequal group sizes or imbalance in unit characteristics by chance, especially with small samples.

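Randomization can use random number tables, computer-generated random numbers, or other chance mechanisms. The key is that assignment probabilities are known and set by the chance mechanism rather than by the experimenter's judgment; equal probabilities are the usual choice.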
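As an illustration of a computer-generated chance mechanism, here is a minimal sketch that assigns treatments by randomly permuting a balanced list of labels. Note that this is permutation-based assignment with fixed group sizes, not the independent "simple randomization" described above; the function name, unit labels, and seed are hypothetical.

```python
import random
from collections import Counter

def assign_completely_randomized(units, treatments, seed=None):
    """Assign treatments to units at random with (near-)equal group sizes."""
    rng = random.Random(seed)  # seeded only so the example is reproducible
    # One label per unit, cycling through the treatments to keep groups balanced.
    labels = [treatments[i % len(treatments)] for i in range(len(units))]
    rng.shuffle(labels)  # the random permutation is the chance mechanism
    return dict(zip(units, labels))

units = [f"unit_{i}" for i in range(1, 13)]  # 12 hypothetical experimental units
assignment = assign_completely_randomized(units, ["A", "B", "C"], seed=42)
group_sizes = Counter(assignment.values())
print(group_sizes)
```

Fixing the group sizes in advance avoids the chance imbalance in group size that independent assignment can produce in small samples.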

Analysis

Analysis of variance (ANOVA) tests for treatment differences. The design creates a single factor with multiple levels (treatments). The null hypothesis states all treatment means are equal.
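The F statistic behind this test can be sketched in a few lines of plain Python. The data below are made up for illustration, and the function name is hypothetical; in practice a statistics package would also supply the p-value.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of treatment groups."""
    all_obs = [y for g in groups for y in g]
    grand_mean = mean(all_obs)
    # Between-treatment sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-treatment (error) sum of squares: observations around their group mean.
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    df_between = len(groups) - 1
    df_within = len(all_obs) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Three hypothetical treatments with three units each.
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f_stat, df_between, df_within = one_way_anova(groups)
print(f_stat, df_between, df_within)
```

A large F relative to the F distribution with (df_between, df_within) degrees of freedom is evidence against the null hypothesis of equal treatment means.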

Post-hoc comparisons identify specific group differences when the overall test is significant. Procedures such as Tukey's HSD and the Bonferroni correction control the family-wise error rate across the set of comparisons.
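The Bonferroni correction is the simplest of these to show: with m comparisons, each is tested at level alpha/m. The sketch below only enumerates the pairwise comparisons and computes the adjusted level; the treatment labels and function name are hypothetical.

```python
from itertools import combinations

def bonferroni_pairs(treatments, alpha=0.05):
    """List all pairwise comparisons and the Bonferroni per-comparison level."""
    pairs = list(combinations(treatments, 2))  # k treatments give k(k-1)/2 pairs
    per_test_alpha = alpha / len(pairs)
    return pairs, per_test_alpha

pairs, per_test_alpha = bonferroni_pairs(["A", "B", "C", "D"])
print(len(pairs), per_test_alpha)
```

With four treatments there are six pairwise comparisons, so each comparison is judged against a much stricter threshold than the nominal 0.05.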

Confidence intervals for differences between treatment means quantify effects with uncertainty.

Randomized Block Designs

Blocking groups similar experimental units together, reducing variability within blocks. This increases precision for detecting treatment effects.

Block Formation

Blocks should be homogeneous internally and different externally. Similar experimental units go in the same block; different units go in different blocks.

Common blocking factors include time (blocks are time periods), space (blocks are locations), or characteristics (blocks are similar subjects).

In a randomized complete block design, every treatment appears exactly once in each block, so the number of blocks equals the number of replications of each treatment.

Analysis

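The ANOVA for randomized block designs includes blocks as well as treatments. Variation between blocks is removed from the error term, so treatment effects are tested against the residual (block-by-treatment) variability rather than against all of the variability within treatment groups. The test of the block effect itself is usually of secondary interest.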
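To make the decomposition concrete, here is a minimal sketch for a randomized complete block design with one observation per block-treatment cell. The data table and function name are hypothetical.

```python
from statistics import mean

def rcbd_anova(table):
    """table[b][t] is the response for treatment t in block b (one per cell)."""
    n_blocks, n_trt = len(table), len(table[0])
    grand = mean([y for row in table for y in row])
    block_means = [mean(row) for row in table]
    trt_means = [mean([row[t] for row in table]) for t in range(n_trt)]
    ss_blocks = n_trt * sum((m - grand) ** 2 for m in block_means)
    ss_trt = n_blocks * sum((m - grand) ** 2 for m in trt_means)
    ss_total = sum((y - grand) ** 2 for row in table for y in row)
    # What remains after removing block and treatment variation is the error.
    ss_error = ss_total - ss_blocks - ss_trt
    df_trt = n_trt - 1
    df_error = (n_blocks - 1) * (n_trt - 1)
    f_trt = (ss_trt / df_trt) / (ss_error / df_error)
    return {"ss_blocks": ss_blocks, "ss_trt": ss_trt,
            "ss_error": ss_error, "f_trt": f_trt}

# Rows are blocks, columns are treatments (made-up data).
table = [[1, 2, 6],
         [4, 6, 8],
         [7, 10, 10]]
result = rcbd_anova(table)
print(result)
```

In this made-up table most of the total variation is between blocks. A one-way analysis that ignored blocks would leave that variation in the error term and give a much smaller F for treatments.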

If blocks represent a random sample from a larger population, the block effect should be treated as random. This affects expected mean squares and hypothesis tests.

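Interaction between the blocking factor and treatment can be estimated separately from error only when treatments are replicated within blocks; with one observation per block-treatment cell, the interaction is confounded with the error term. A substantial block-by-treatment interaction complicates interpretation, because treatment effects then differ from block to block.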

Factorial Designs

Factorial designs examine multiple factors simultaneously, allowing examination of both main effects and interactions.

Main Effects and Interactions

Main effects show how changing one factor affects the outcome, averaging over levels of other factors. They capture the average effect of each factor.

Interactions show how the effect of one factor depends on the level of another factor. Significant interaction means simple effects differ across levels.

Understanding both main effects and interactions provides a complete picture of how factors combine to affect outcomes.
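A small numerical sketch may help separate the three ideas. The cell means below are invented for a hypothetical 2×2 experiment, with keys given as (level of A, level of B). The interaction is reported here as the difference between simple effects; some texts on two-level designs halve this quantity.

```python
# Hypothetical cell means for a 2x2 factorial: (level of A, level of B) -> mean.
cell_means = {
    ("low", "low"): 20.0, ("high", "low"): 30.0,
    ("low", "high"): 25.0, ("high", "high"): 45.0,
}

# Simple effects of A: the effect of A at each fixed level of B.
effect_a_at_b_low = cell_means[("high", "low")] - cell_means[("low", "low")]
effect_a_at_b_high = cell_means[("high", "high")] - cell_means[("low", "high")]

# Simple effects of B: the effect of B at each fixed level of A.
effect_b_at_a_low = cell_means[("low", "high")] - cell_means[("low", "low")]
effect_b_at_a_high = cell_means[("high", "high")] - cell_means[("high", "low")]

# Main effects average the simple effects over the other factor.
main_effect_a = (effect_a_at_b_low + effect_a_at_b_high) / 2
main_effect_b = (effect_b_at_a_low + effect_b_at_a_high) / 2

# Interaction: how much the effect of A changes between levels of B.
interaction = effect_a_at_b_high - effect_a_at_b_low

print(main_effect_a, main_effect_b, interaction)
```

Here the effect of A is twice as large at the high level of B as at the low level, so the main effect of A alone would understate one simple effect and overstate the other.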

Design Structure

A factorial design with two factors, A at a levels and B at b levels, has a×b treatment combinations. More generally, k factors each at b levels give b^k combinations. Each combination is replicated some number of times.

Two-level factorial designs are common and can screen many factors with relatively few runs. The term "2^k factorial" refers to k factors, each at 2 levels, giving 2^k treatment combinations.

Full factorial designs include all possible combinations. Fractional factorial designs include only a subset, sacrificing ability to estimate some interactions but reducing run size.
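The difference between a full and a fractional design can be sketched by enumerating runs in coded units (-1 for the low level, +1 for the high level). The half fraction below uses the defining relation I = ABC, which is one common choice and an assumption of this example, not something prescribed above.

```python
from itertools import product

factors = ["A", "B", "C"]

# Full 2^3 factorial: every combination of low (-1) and high (+1) levels.
full = list(product([-1, 1], repeat=len(factors)))

# Half fraction with defining relation I = ABC: keep runs where A*B*C = +1.
half = [run for run in full if run[0] * run[1] * run[2] == 1]

print(len(full), len(half))
for run in half:
    print(dict(zip(factors, run)))
```

In the retained runs the column for C equals the product of the columns for A and B, so the main effect of C cannot be separated from the AB interaction. That aliasing is the price paid for halving the run size.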

Factorial ANOVA

Factorial ANOVA analyzes experiments with multiple factors. This extends one-way ANOVA to handle more complex designs.

Factorial Model

The model includes main effects for each factor and interactions. The two-way model is Y_{ijk} = μ + α_i + β_j + (αβ)_{ij} + ε_{ijk}, where i indexes the levels of factor A, j indexes the levels of factor B, and k indexes replications.

Higher-order interactions (three-way, four-way) are possible but become complex to interpret. Practical designs usually focus on lower-order interactions.

The analysis partitions the total variation into components for each effect plus error. Each effect is tested with an F ratio that compares its mean square to the error (within-cell) mean square.
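For reference, the partition can be written out for the balanced two-way case. The notation is an assumption of this sketch: factor A has a levels, factor B has b levels, each cell has n replications, both factors are treated as fixed, and a dot in a subscript denotes averaging over that index.

```latex
\begin{align*}
SS_T &= SS_A + SS_B + SS_{AB} + SS_E \\
SS_A &= bn \sum_{i=1}^{a} (\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2 \\
SS_B &= an \sum_{j=1}^{b} (\bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2 \\
SS_{AB} &= n \sum_{i=1}^{a} \sum_{j=1}^{b}
  (\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y}_{\cdot\cdot\cdot})^2 \\
SS_E &= \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} (Y_{ijk} - \bar{Y}_{ij\cdot})^2 \\
F_A &= \frac{SS_A / (a-1)}{SS_E / \bigl(ab(n-1)\bigr)}
\end{align*}
```

The F ratios for B and for the interaction follow the same pattern, with b-1 and (a-1)(b-1) degrees of freedom in the numerator.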

Interaction Plots

Interaction plots display means at factor combinations. Parallel lines indicate no interaction; non-parallel lines indicate interaction.

When interaction is significant, main effects might be misleading. Interpretation should focus on simple effects (effect of one factor at specific levels of the other).

Latin Square Designs

Latin square designs control for two sources of variation simultaneously. They use square grids where each treatment appears exactly once in each row and column.

Design Structure

A k×k Latin square has k rows, k columns, and k treatments. Each treatment appears exactly once in each row and once in each column.

This controls for variation in two perpendicular directions. Common applications include experiments with time periods and locations, or comparing products across different testers and conditions.

Each row-column cell receives exactly one treatment, so a k×k square uses k² experimental units and replicates each treatment k times.
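One simple way to build such a square is to start from a cyclic arrangement and then shuffle rows and columns. This is a sketch under that assumption; it does not sample uniformly from all possible Latin squares, and the function names are hypothetical.

```python
import random

def latin_square(treatments, seed=None):
    """Build a k x k Latin square by permuting a cyclic starting square."""
    k = len(treatments)
    rng = random.Random(seed)
    # Cyclic square: row r is the treatment list shifted by r positions.
    square = [[treatments[(r + c) % k] for c in range(k)] for r in range(k)]
    rng.shuffle(square)  # permute rows
    col_order = list(range(k))
    rng.shuffle(col_order)  # permute columns
    return [[row[c] for c in col_order] for row in square]

def is_latin(square):
    """Check that every treatment appears once in each row and each column."""
    k = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == k and set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(k)} == symbols for c in range(k))
    return len(symbols) == k and rows_ok and cols_ok

square = latin_square(["A", "B", "C", "D"], seed=7)
for row in square:
    print(" ".join(row))
print(is_latin(square))
```

Permuting whole rows and whole columns preserves the Latin property, which is why the check passes regardless of the seed.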

Analysis

Analysis is similar to randomized block designs but with two blocking factors. The ANOVA includes row, column, and treatment effects.

Efficiency comes from controlling two sources of variation. The design reduces error variance compared to designs without blocking.

Split-Plot Designs

Split-plot designs have a hierarchy of experimental units. Some factors are applied to larger units (whole plots), while others are applied to smaller units (subplots) within them.

Design Structure

Whole plots receive one set of treatments. Subplots within whole plots receive another set. This creates a nested structure.

The design is efficient when certain factors require large experimental units. It is common in agricultural experiments and some industrial settings.

Hard-to-change factors become whole-plot factors; easy-to-change factors become subplot factors.
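The two-stage randomization implied by this structure can be sketched as follows. The factor levels (an irrigation factor on whole plots, three varieties on subplots) and the function name are hypothetical.

```python
import random

def split_plot_layout(whole_plot_levels, subplot_levels, n_reps, seed=None):
    """Return rows of (whole_plot_id, whole_plot_level, subplot_id, subplot_level)."""
    rng = random.Random(seed)
    # Stage 1: randomize the hard-to-change factor over the whole plots.
    whole_plots = whole_plot_levels * n_reps
    rng.shuffle(whole_plots)
    layout = []
    for wp_id, wp_level in enumerate(whole_plots, start=1):
        # Stage 2: randomize the easy-to-change factor separately in each whole plot.
        order = subplot_levels[:]
        rng.shuffle(order)
        for sp_id, sp_level in enumerate(order, start=1):
            layout.append((wp_id, wp_level, sp_id, sp_level))
    return layout

layout = split_plot_layout(["irrigated", "dry"], ["v1", "v2", "v3"],
                           n_reps=2, seed=1)
for row in layout:
    print(row)
```

Because the whole-plot factor is randomized only four times here while the subplot factor is randomized within every whole plot, the two factors have different amounts of replication.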

Analysis

The ANOVA includes both whole-plot and subplot sources. Error terms differ for whole-plot and subplot factors.

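The analysis must respect the design structure. Using the wrong error term distorts Type I error rates; in particular, testing a whole-plot factor against the subplot error, which is typically smaller, tends to inflate the Type I error rate.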

Response Surface Methodology

Response surface methods model relationships between several quantitative factors and a response. They are used for optimization.

Central Composite Designs

Central composite designs include factorial points (at ±1 levels), axial points (at ±α levels), and center points. This allows fitting quadratic (curved) models.

The design enables fitting the full quadratic model with a reasonable number of runs. α is chosen to provide orthogonality or rotatability; for rotatability, α is the fourth root of the number of factorial points.
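As a sketch, the runs of a central composite design in coded units can be generated directly. The example assumes a full two-level factorial portion and the rotatable choice of α (the fourth root of the number of factorial points); the number of center points is an arbitrary illustrative value.

```python
from itertools import product

def central_composite(k, n_center=4):
    """Return (runs, alpha) for a rotatable CCD with k factors in coded units."""
    factorial = [tuple(float(v) for v in p) for p in product([-1, 1], repeat=k)]
    alpha = len(factorial) ** 0.25  # rotatable choice (assumption of this sketch)
    axial = []
    for i in range(k):
        for value in (-alpha, alpha):
            point = [0.0] * k
            point[i] = value  # move along one factor's axis only
            axial.append(tuple(point))
    center = [tuple([0.0] * k)] * n_center
    return factorial + axial + center, alpha

design, alpha = central_composite(2)
print(alpha)
for run in design:
    print(run)
```

For two factors this gives 4 factorial, 4 axial, and 4 center runs, enough to estimate the six coefficients of the full quadratic model with degrees of freedom left over for error.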

These designs are widely used in industrial experimentation for process optimization.

Optimization

Once a response surface model is fit, optimization finds factor levels that maximize or minimize the response. This can be done analytically for simple models or numerically for complex ones.
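For a two-factor quadratic model the analytic route is short enough to sketch. Setting the gradient of y = b0 + b1*x1 + b2*x2 + b11*x1^2 + b22*x2^2 + b12*x1*x2 to zero gives a 2×2 linear system, solved below by Cramer's rule. The coefficient values are invented for illustration.

```python
def stationary_point(b1, b2, b11, b22, b12):
    """Solve grad = 0 for a two-factor quadratic model and classify the point."""
    # Gradient equations:
    #   b1 + 2*b11*x1 + b12*x2 = 0
    #   b2 + b12*x1 + 2*b22*x2 = 0
    det = 4 * b11 * b22 - b12 ** 2  # determinant of the Hessian
    if det == 0:
        raise ValueError("no unique stationary point")
    x1 = (-2 * b1 * b22 + b12 * b2) / det
    x2 = (-2 * b11 * b2 + b12 * b1) / det
    if det < 0:
        kind = "saddle"
    elif b11 < 0:
        kind = "maximum"
    else:
        kind = "minimum"
    return (x1, x2), kind

# Hypothetical fitted coefficients in coded units.
point, kind = stationary_point(b1=2.0, b2=3.0, b11=-2.0, b22=-3.0, b12=2.0)
print(point, kind)
```

A stationary point is only useful as an optimum if it is of the right kind and lies inside the region covered by the experiment; otherwise a constrained or numerical search is more appropriate.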

Contour plots visualize the response surface, showing where the response is highest or lowest and how quickly it changes. This helps in understanding the shape of the surface near the optimum.

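Ridge analysis finds the best predicted response at a fixed distance from the design center and traces how that constrained optimum moves as the distance increases. This is useful when the stationary point is a saddle point or lies outside the experimental region, so that moving directly to it is impractical.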

Power and Sample Size

Power analysis determines the probability of detecting effects of practical importance. It guides sample size selection.

Power Concepts

Power equals the probability of correctly rejecting a false null hypothesis. It depends on effect size, sample size, significance level, and variability.

Type I error (false positive) is controlled by the significance level. Type II error (false negative) is the complement of power: power equals one minus the Type II error probability. The goal is adequate power (typically 0.80 or 0.90).

Small effects require larger samples to detect than large effects. Greater power requires larger samples.
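These relationships can be checked numerically with a rough normal approximation for a two-sided comparison of two group means. The sketch assumes equal group sizes and a standardized effect size d (mean difference divided by the common standard deviation); the function name is hypothetical, and exact t-based calculations give slightly lower power.

```python
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Normal approximation; ignores the tiny chance of rejecting in the wrong direction.
    return z.cdf(d * (n_per_group / 2) ** 0.5 - z_crit)

for d in (0.2, 0.5, 0.8):
    for n in (20, 64, 200):
        print(d, n, round(approx_power(d, n), 3))
```

Reading across the printed grid shows power rising with both effect size and sample size, as described above.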

Sample Size Planning

Sample size planning uses power analysis to determine required sample sizes. Input includes desired power, significance level, and expected effect size.

Effect size estimates come from pilot data, previous studies, or practical significance considerations. Conservative (smaller) effect sizes require larger samples.
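For the simplest case, a two-sided comparison of two group means, the inputs map to a sample size through a standard normal-approximation formula. The sketch below assumes equal group sizes and a standardized effect size d; the function name is hypothetical, and t-based software typically returns a slightly larger n.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, power=0.80, alpha=0.05):
    """Approximate per-group sample size for a two-sided two-sample comparison."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)

for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d), n_per_group(d, power=0.90))
```

Because the required n scales with 1/d², halving the assumed effect size roughly quadruples the sample size, which is why a conservative effect size estimate is costly.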

Different designs require different sample size calculations. Many software packages implement power analysis for common designs.

Causal Inference

Causal inference attempts to determine whether treatments cause outcomes. This is more challenging than simple group comparison.

Potential Outcomes Framework

The potential outcomes framework formalizes causal effects. Each unit has potential outcomes under each treatment. The causal effect is the difference between these potential outcomes.

We can only observe one potential outcome per unit (the treatment actually received). This is the "fundamental problem of causal inference."

Counterfactuals represent what would have happened under different treatment. These are not observed but are needed for causal effects.

Design-Based Inference

Randomization provides the gold standard for causal inference. Under random assignment, the difference in group means is an unbiased estimate of the average causal effect.
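A small simulation can illustrate the idea. In the sketch below both potential outcomes are generated for every unit, which is possible only in simulation; the outcome distribution and the constant treatment effect of 2 are assumptions made for the example.

```python
import random
from statistics import mean

rng = random.Random(0)
n = 10_000

# Hypothetical potential outcomes: y0 without treatment, y1 with treatment.
y0 = [rng.gauss(10, 3) for _ in range(n)]
y1 = [y + 2.0 for y in y0]
true_ate = mean([b - a for a, b in zip(y0, y1)])  # knowable only in simulation

# Randomly assign half of the units to treatment.
treated = set(rng.sample(range(n), n // 2))

# Only one potential outcome per unit is actually observed.
obs_treated = [y1[i] for i in range(n) if i in treated]
obs_control = [y0[i] for i in range(n) if i not in treated]
estimate = mean(obs_treated) - mean(obs_control)

print(true_ate, estimate)
```

The estimate differs from the true average effect only by sampling error, because random assignment makes the treated and control groups comparable on y0 on average.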

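Instrumental variables, regression discontinuity, and other quasi-experimental designs aim to identify causal effects in observational settings where randomization is not available. They rely on additional assumptions to support a causal interpretation, and they often identify effects only for a subset of units.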

Sensitivity analysis examines how robust conclusions are to assumption violations. This is important for observational studies.

Key Takeaways

Randomization, replication, and blocking are fundamental experimental design principles
Completely randomized designs are simplest but might be inefficient with known variation sources
Factorial designs examine multiple factors and their interactions simultaneously
Blocking accounts for known sources of variation to improve precision
Power analysis guides sample size selection to detect effects of practical importance
Strong causal claims require designs that ensure treatment groups are comparable
