AGRON INFO TECH

How to Conduct a Split Plot Analysis in R

Description of design

Split plot analysis is a statistical method used to analyze data that arise from experiments where treatments are applied to subunits of the main experimental units. In a split plot design, the experimental units are divided into two or more subunits or blocks, where each subunit or block receives a different treatment. The treatment applied to the main experimental units is called the whole plot treatment, while the treatment applied to the subunits or blocks is called the subplot treatment.

In agricultural, biology, and engineering research, where it is frequently impracticable or impossible to apply treatments to the full experimental unit simultaneously, split plot designs are frequently used. Experiments that examine the effects of various fertilisers on crops, the effects of various dosages of a drug on various people, or the impact of various production methods on the calibre of goods are a few examples of split plot designs.

Split plot analysis entails estimating the effects of the whole plot treatment, the subplot treatment, and their interactions using statistical approaches. The study can also reveal whether the subplot treatment has an impact on the overall plot treatment’s outcome and whether this impact is constant at all levels of the main plot treatment.

Exammple dataset

For example, suppose a researcher wants to study the effect of two different fertilizers methods on plant growth. The main plots would be different fields, and the subplots would be different sections of each field. One fertilizer method would be applied to one set of fields, and the other fertilizer method would be applied to the other set of fields. Within each field, different sections would receive different amounts of fertilizer.

Factor A (mainplot): Fertilizer methods

  • F1 = Broadcasting
  • F2 = Fertigation

Factor B (subplot): Fertilizer dose

  • T1 = Control (0 kg per acre)
  • T2 = 25 kg N per acre
  • T3 = 50 kg N per acre
  • T4 = 75 kg N per acre
# Creating data
library(dplyr)
set.seed(123)

Control <- rnorm(n = 6, mean = 15, sd = 3.4)
T25kg <- rnorm(n = 6, mean = 22, sd = 5.5)
T50kg <- rnorm(n = 6, mean = 45, sd = 9.5)
T75kg <- rnorm(n = 6, mean = 35, sd = 8.2)

yield <- as.data.frame(cbind(Control, T25kg, T50kg, T75kg)) 
yield$Method <- rep(c("Broadcasting", "Fertigation"), each = 3)
yield$Rep <- rep(c(1:3), times = 2)

df <- yield %>% tidyr::pivot_longer(!c(Rep, Method), 
                                    names_to = "Dose", 
                                    values_to = "yield")
df <- as.data.frame(df)
df
#          Method Rep    Dose    yield
# 1  Broadcasting   1 Control 13.09438
# 2  Broadcasting   1   T25kg 24.53504
# 3  Broadcasting   1   T50kg 48.80733
# 4  Broadcasting   1   T75kg 40.75112
# 5  Broadcasting   2 Control 14.21740
# 6  Broadcasting   2   T25kg 15.04216
# 7  Broadcasting   2   T50kg 46.05149
# 8  Broadcasting   2   T75kg 31.12311
# 9  Broadcasting   3 Control 20.29961
# 10 Broadcasting   3   T25kg 18.22231
# 11 Broadcasting   3   T50kg 39.71951
# 12 Broadcasting   3   T75kg 26.24385
# 13  Fertigation   1 Control 15.23973
# 14  Fertigation   1   T25kg 19.54886
# 15  Fertigation   1   T50kg 61.97567
# 16  Fertigation   1   T75kg 33.21261
# 17  Fertigation   2 Control 15.43958
# 18  Fertigation   2   T25kg 28.73245
# 19  Fertigation   2   T50kg 49.72958
# 20  Fertigation   2   T75kg 26.58676
# 21  Fertigation   3 Control 20.83122
# 22  Fertigation   3   T25kg 23.97898
# 23  Fertigation   3   T50kg 26.31714
# 24  Fertigation   3   T75kg 29.02309

Variable structure

The str() function in R is a built-in function that provides information about the structure of an R object. The name “str” stands for “structure”. It is commonly used to get a quick overview of the type, dimensions, and content of an R object, especially when working with unfamiliar data.

The object df is tibble of 24 observations and 3 variables. The first two variables are not in proper format and we shall change their structure to factors.

str(df)
# 'data.frame': 24 obs. of  4 variables:
#  $ Method: chr  "Broadcasting" "Broadcasting" "Broadcasting" "Broadcasting" ...
#  $ Rep   : int  1 1 1 1 2 2 2 2 3 3 ...
#  $ Dose  : chr  "Control" "T25kg" "T50kg" "T75kg" ...
#  $ yield : num  13.1 24.5 48.8 40.8 14.2 ...

We shall change the structure of the first two variables Methods and Dose to factor by using as.factor function for that specific variable.

# converting variables to factors
df$Method <- as.factor(df$Method)
df$Dose <- as.factor(df$Dose)
str(df)
# 'data.frame': 24 obs. of  4 variables:
#  $ Method: Factor w/ 2 levels "Broadcasting",..: 1 1 1 1 1 1 1 1 1 1 ...
#  $ Rep   : int  1 1 1 1 2 2 2 2 3 3 ...
#  $ Dose  : Factor w/ 4 levels "Control","T25kg",..: 1 2 3 4 1 2 3 4 1 2 ...
#  $ yield : num  13.1 24.5 48.8 40.8 14.2 ...

The attach() function in R is used to attach a data frame to the search path, so that the variables in the data frame can be accessed by their names alone. Once a data frame is attached, its variables can be referred to directly by their names, without having to use the $ operator to specify the data frame name. This can save some typing and make code easier to read.

# Masking components of the data
attach(df)
# The following object is masked _by_ .GlobalEnv:
# 
#     yield

Fitting ANOVA model for split plot design

Analysis of variance (ANOVA), mixed-effects models, and generalised linear models are a few frequent statistical methods used in split plot analysis. The effectiveness of the various treatments can be estimated using these methods, and the significance of the detected differences can be evaluated.

In this example we shall utilize the ANOVA model to see the effectiveness of fertilizer application methods and their different dose levels on the yield of rice crop. We shall use sp.plot() function from agricolae package.

library(agricolae)
# Fitting ANOVA model for split plot design
model <- with(df, 
              sp.plot(block = Rep,  
                      pplot = Method, 
                      splot = Dose, 
                      Y = yield)
) 
# 
# ANALYSIS SPLIT PLOT:  yield 
# Class level information
# 
# Method    :  Broadcasting Fertigation 
# Dose  :  Control T25kg T50kg T75kg 
# Rep   :  1 2 3 
# 
# Number of observations:  24 
# 
# Analysis of Variance Table
# 
# Response: yield
#             Df  Sum Sq Mean Sq F value    Pr(>F)    
# Rep          2  173.77   86.89  8.0855 0.1100656    
# Method       1    6.52    6.52  0.6067 0.5175765    
# Ea           2   21.49   10.75     NaN       NaN    
# Dose         3 2902.30  967.43 14.9754 0.0002329 ***
# Method:Dose  3   47.25   15.75  0.2438 0.8641451    
# Eb          12  775.22   64.60     NaN       NaN    
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 
# cv(a) = 11.4 %, cv(b) = 28 %, Mean = 28.69679

Another way to get the same results is the use of aov() function from stats package. We can define formula as response variable is separated by blocks plus main plot factor plus subplot factor plus Error term for the block and main plot factor.

# Converting block variable to factor
df$Rep <- as.factor(df$Rep)
# Fitting ANOVA model for split plot design
model2 <- aov(yield ~ Rep + Method * Dose + Error(Rep/Method),
              data = df)
summary(model2)
# 
# Error: Rep
#     Df Sum Sq Mean Sq
# Rep  2  173.8   86.89
# 
# Error: Rep:Method
#           Df Sum Sq Mean Sq F value Pr(>F)
# Method     1  6.519   6.519   0.607  0.518
# Residuals  2 21.492  10.746               
# 
# Error: Within
#             Df Sum Sq Mean Sq F value   Pr(>F)    
# Dose         3 2902.3   967.4  14.975 0.000233 ***
# Method:Dose  3   47.2    15.7   0.244 0.864145    
# Residuals   12  775.2    64.6                     
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The output of the model showed that only fertilizers application dose variable showed highly significant effect on the rice yield with probability value lower than 0.01. So next we shall proceed with mean comparison test to see differences in the mean values of variable Dose.

Least Significant Difference test

Extracting error df and error MS from the model

Following codes shows how we can extract the values for error degree of freedom and error mean square from both models as shown below.

Extracting error df and error MS from first model.

# Getting Edf and EMS from sp.plot model
# Error df for main plot factor (Method)
Edfa <- model$gl.a
Edfa
# Error df for subplot factor (Dose)
Edfb <- model$gl.b
Edfb
# Error MS for main plot factor (Method)
EMSa <- model$Ea
EMSa
# Error MS for subplot factor (Dose)
EMSb <- model$Eb
EMSb
# [1] 2
# [1] 12
# [1] 10.74608
# [1] 64.60155

Extracting error df and error MS from second model.

# Getting Edf and EMS from aov() model
# Error df for main plot factor (Method)
Edfa <- model2$`Rep:Method`$df.residual
Edfa
# Error df for subplot factor (Dose)
Edfb <- model2$Within$df.residual
Edfb
# Error MS for main plot factor (Method)
EMSa <- summary(model2$`Rep:Method`)[[1]][[3]][2] 
EMSa
# Error MS for subplot factor (Dose)
EMSb <- summary(model2$Within)[[1]][[3]][3] 
EMSb
# [1] 2
# [1] 12
# [1] 10.74608
# [1] 64.60155

LSD test

LSD.test() is a function in the R package Agricolae that is used to perform the Least Significant Difference (LSD). The LSD test is a post-hoc test that is used to compare the means of all pairs of treatment groups in an ANOVA, to determine if they are significantly different from each other. If main plot factor show significant response in ANOVA table then Edfa and EMSa should be used in LSD.test function. When subplot factor or interaction term show significant response then Edfb and EMSb should be used in the LSD.test function. As in our case subplot factor has significant effect on yield so we shall use Edfb and EMSb in LSD test as shown in below command.

LSD <- with(df, 
            LSD.test(
                      y = yield, 
                      trt = Dose,
                      DFerror = Edfb, 
                      MSerror = EMSb,
                      alpha = 0.05,
                      group = TRUE, 
                      console = TRUE
            )
            
) 
# 
# Study: yield ~ Dose
# 
# LSD t Test for yield 
# 
# Mean Square Error:  64.60155 
# 
# Dose,  means and individual ( 95 %) CI
# 
#            yield       std r       LCL      UCL      Min      Max
# Control 16.52032  3.247008 6  9.370982 23.66966 13.09438 20.83122
# T25kg   21.67663  4.974571 6 14.527296 28.82597 15.04216 28.73245
# T50kg   45.43345 11.848624 6 38.284115 52.58279 26.31714 61.97567
# T75kg   31.15676  5.400990 6 24.007419 38.30609 26.24385 40.75112
# 
# Alpha: 0.05 ; DF Error: 12
# Critical Value of t: 2.178813 
# 
# least Significant Difference: 10.11069 
# 
# Treatments with the same letter are not significantly different.
# 
#            yield groups
# T50kg   45.43345      a
# T75kg   31.15676      b
# T25kg   21.67663     bc
# Control 16.52032      c

Interpretting LSD output

The output from LSD test showed the following information:

  • The LSD values for all pairwise comparisons between the group means.
  • The results of the LSD test for each pairwise comparison, including mean, SD, r, lower and upper confidence limits and minimum and maximum values.
  • The group means for each level of the factor variable along with lettering.

Maximum yield was recorded in plots where nitrogen was applied at the rate of 50 kg per acre followed by 75 kg while the least production was recorded in control where nitrogen was not applied.

If you want to know how to visualize results using barplot showing standard error and lettering on top of each bar then visit this blog (Easy way to create a barplot showing standard error and lettering in R).


Download R program — Click_here

Download R studio — Click_here

1 thought on “How to Conduct a Split Plot Analysis in R”

  1. Pingback: Split plot analysis in R – AGRON INFO TECH

Comments are closed.