Analysis of variance (ANOVA) is an essential statistical tool in research. It tests whether the differences among the means of two or more groups are statistically significant, that is, whether an observed difference between groups is plausibly due to chance alone or reflects real differences. A significant overall F-test, however, only tells you that at least one group mean differs from the others. Orthogonal contrasts go further. They split the treatment sum of squares into non-overlapping, single-degree-of-freedom comparisons, and each comparison answers a specific, planned question about the groups. In this blog post, we will discuss how to get more out of ANOVA results with orthogonal contrasts in R.
Introduction
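In statistics, a contrast is a weighted comparison of group means. An orthogonal contrast is one that is designed to be independent of the other contrasts in the same set. Orthogonal contrasts are useful because each one answers a separate question. For k groups, a full set of k - 1 orthogonal contrasts splits the treatment sum of squares into non-overlapping pieces that add up to the total. Orthogonality does not by itself control the familywise Type I error rate. However, planning a small set of orthogonal contrasts in advance keeps the number of tests limited and their results non-redundant.
Orthogonal contrasts are constructed so that two conditions hold. First, the coefficients of each contrast sum to zero. Second, for any two contrasts, the sum of the products of their corresponding coefficients is also zero (this simple rule assumes equal group sizes). In other words, the contrasts are orthogonal to each other and do not overlap in the information they provide about the data.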
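The same conditions can be written in symbols using the standard textbook definitions. Suppose there are k groups with means μ_1, ..., μ_k, sample means ȳ_1, ..., ȳ_k and group sizes n_1, ..., n_k. A contrast ψ has coefficients c_i, and a second contrast has coefficients d_i. The weighted orthogonality condition is the general case, and it reduces to the simple sum of products when all n_i are equal:

$$
\psi = \sum_{i=1}^{k} c_i \mu_i, \qquad \sum_{i=1}^{k} c_i = 0
$$

$$
\sum_{i=1}^{k} \frac{c_i d_i}{n_i} = 0 \quad \left(\text{equal } n:\ \sum_{i=1}^{k} c_i d_i = 0\right)
$$

$$
\hat{\psi} = \sum_{i=1}^{k} c_i \bar{y}_i, \qquad SS_{\psi} = \frac{\hat{\psi}^{2}}{\sum_{i=1}^{k} c_i^{2} / n_i}, \qquad \sum_{j=1}^{k-1} SS_{\psi_j} = SS_{\text{treatment}} \ \text{(for a full orthogonal set)}
$$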
Orthogonal polynomial contrasts are a special set of orthogonal contrasts used in analysis of variance (ANOVA) to test for trends across the ordered levels of a categorical variable. In R, the coefficients for these contrasts are generated with the contr.poly() function.
What is orthogonal contrast
Orthogonal contrast is a statistical method that creates meaningful comparisons between groups. A contrast is a linear combination of group means whose coefficients sum to zero. Two contrasts are orthogonal when, in addition, the sum of the products of their corresponding coefficients is zero (for equal group sizes). By using a set of orthogonal contrasts, we can test different, non-overlapping hypotheses about the group means.
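Both conditions are easy to check in R. The sketch below uses a hypothetical helper, is_orthogonal_set(), which is not part of base R. It treats each column of a coefficient matrix as one contrast and assumes equal group sizes. The two candidate sets for three groups (A, B, C) are illustrative.

```r
# Hypothetical helper (not part of base R): TRUE if every column of a
# coefficient matrix sums to zero (is a contrast) and every pair of columns
# has a zero sum of products (is orthogonal), assuming equal group sizes.
is_orthogonal_set <- function(cmat, tol = 1e-10) {
  cmat <- as.matrix(cmat)
  # each column must sum to zero to be a contrast
  are_contrasts <- all(abs(colSums(cmat)) < tol)
  # off-diagonal entries of t(C) %*% C are the pairwise sums of products
  cp <- crossprod(cmat)
  are_orthogonal <- all(abs(cp[upper.tri(cp)]) < tol)
  are_contrasts && are_orthogonal
}

# Two candidate sets of comparisons for three groups A, B, C
set1 <- cbind(AvsB = c(1, -1, 0), BvsC = c(0, 1, -1))    # adjacent pairs
set2 <- cbind(AvsB = c(1, -1, 0), ABvsC = c(1, 1, -2))   # pair, then their average vs C

is_orthogonal_set(set1)  # [1] FALSE: (1)(0) + (-1)(1) + (0)(-1) = -1
is_orthogonal_set(set2)  # [1] TRUE:  (1)(1) + (-1)(1) + (0)(-2) = 0
```

Both sets contain valid contrasts. Only the second set is orthogonal, because two comparisons that share a group generally overlap.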
Orthogonal polynomial contrasts
These contrasts are particularly useful when the levels of a factor have a natural order, such as Likert-scale categories, educational levels, or increasing doses. For k ordered levels, contr.poly() produces k - 1 orthogonal contrasts that test for linear, quadratic, cubic, and higher-order trends in the group means. By default it assumes that the levels are equally spaced. Its scores argument lets you supply the actual spacing, for example for unequally spaced doses. You then pass the resulting matrix to a model through the contrasts argument of aov() or lm(), and test which trend components explain the differences among the groups.
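To see what contr.poly() produces, the short sketch below inspects the matrix for four equally spaced levels. It also shows a three-level case with unequal spacing, passed through scores. The dose values 0.5, 1 and 2 are only an illustrative assumption.

```r
# Orthogonal polynomial coefficients for 4 equally spaced, ordered levels
cp4 <- contr.poly(4)
print(round(cp4, 3))   # columns .L (linear), .Q (quadratic), .C (cubic)

# Each column is a contrast: its coefficients sum to (numerically) zero
colSums(cp4)

# The columns are orthonormal: t(C) %*% C is the 3 x 3 identity matrix
round(crossprod(cp4), 10)

# Unequally spaced levels, e.g. doses 0.5, 1 and 2, via the scores argument
cp_dose <- contr.poly(3, scores = c(0.5, 1, 2))
round(crossprod(cp_dose), 10)
```

Because the columns are already orthonormal, no further scaling is needed before they are used in a model.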
An example of using orthogonal contrast in R
Loading iris data
The iris dataset is a commonly used dataset in machine learning and data analysis. It contains measurements of the sepal length, sepal width, petal length, and petal width for three different species of iris flowers (setosa, versicolor, and virginica), with 50 samples of each species.
In R, the iris dataset is included in the base installation, so we can load it directly without installing any additional packages. Here's how to load and explore the iris dataset in R. We used the head() function to print the first six rows of the dataset.
data("iris")
head(iris)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa
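The simple orthogonality rule (a zero sum of products) assumes equal group sizes, so it is worth confirming that the design is balanced. The order of the factor levels also matters, because it decides which row of a contrast matrix belongs to which species. A quick check:

```r
data("iris")

# Number of observations per species: equal counts mean a balanced design
table(iris$Species)

# Level order: row i of any contrast matrix applies to levels(iris$Species)[i]
levels(iris$Species)
```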
Fitting analysis of variance model
The aov() function in R fits an analysis of variance model. Internally it calls lm() to fit a linear model to the data, and the F-tests are then based on the sums of squares of the model terms. In this example, we used the aov() function to fit a one-way ANOVA model to the iris data, with "Sepal.Length" as the dependent variable and "Species" as the independent variable. Next, we used the anova() function to print the analysis of variance table from the model object. The results are shown below:
model <- aov(Sepal.Length ~ Species, data = iris)
anova(model)
# Analysis of Variance Table
#
# Response: Sepal.Length
#            Df Sum Sq Mean Sq F value    Pr(>F)
# Species     2 63.212  31.606  119.26 < 2.2e-16 ***
# Residuals 147 38.956   0.265
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
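Because aov() is built on lm(), the same table can be obtained from lm() directly. The sketch below confirms this and recomputes the F statistic as the ratio of the two mean squares. The object names are only illustrative.

```r
data("iris")

# aov() calls lm() internally, so both fits give the same ANOVA F statistic
fit_aov <- aov(Sepal.Length ~ Species, data = iris)
fit_lm  <- lm(Sepal.Length ~ Species, data = iris)
tab <- anova(fit_lm)
all.equal(anova(fit_aov)[["F value"]], tab[["F value"]])

# The F statistic is the species mean square divided by the residual mean square
f_by_hand <- tab["Species", "Mean Sq"] / tab["Residuals", "Mean Sq"]
f_by_hand   # about 119.26, matching the table above
```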
The results show a significant difference in mean sepal length among the iris species (F = 119.26, p < 2.2e-16). The overall F-test does not tell us which species differ. We therefore look at the differences among species more closely, starting with orthogonal polynomial contrasts.
Performing orthogonal polynomial contrasts
The contr.poly() function generates the orthogonal polynomial contrast matrix for a factor with a given number of levels. You pass that matrix to aov() (or lm()) through the contrasts argument. Each column of the matrix then becomes a separate single-degree-of-freedom component of the factor. This lets you test each trend component, and the sign and size of the corresponding model coefficient show the direction and magnitude of that trend.
model <- aov(Sepal.Length ~ Species,
data = iris,
# use orthogonal polynomial contrasts for Species
contrasts = list(Species = contr.poly(3)))
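The summary.aov() function prints the ANOVA table for a fitted aov model. Its split argument takes a named list with one entry per factor. Each entry is itself a list that maps a label to one or more columns of that factor's contrast matrix. The factor's sum of squares is then broken into one row per label. Here, column 1 of contr.poly(3) becomes the "Linear" row and column 2 becomes the "Quadratic" row.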
# Getting output with polynomial contrast in ANOVA table
summary.aov(model,
split = list(Species = list("Linear"=1,
"Quadratic" = 2)))
#                      Df Sum Sq Mean Sq F value Pr(>F)
# Species               2  63.21   31.61  119.27 <2e-16 ***
#   Species: Linear     1  62.57   62.57  236.10 <2e-16 ***
#   Species: Quadratic  1   0.64    0.64    2.43  0.121
# Residuals           147  38.96    0.27
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The results above show that only the linear component is significant (F = 236.10, p < 2e-16), while the quadratic component is not (p = 0.121). Together, the two components add up to the species sum of squares (62.57 + 0.64 = 63.21).

However, a trend makes no real sense here. The levels of the categorical variable are just three different species, and their order (setosa, versicolor, virginica) carries no meaning. We cannot say that sepal length is linearly associated with species. This example only shows how to perform an orthogonal polynomial contrast. A linear trend would be more informative if the factor levels were, for example, increasing doses of a fertilizer. The contrast would then be meaningful, and a significant linear component would indicate that the response increases (or decreases) linearly with the fertilizer dose.
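For a factor whose levels really are ordered, the polynomial contrast has a direct interpretation. The sketch below is our own illustration and not part of the iris analysis. It uses the built-in ToothGrowth dataset, which records tooth length (len) in guinea pigs at three vitamin C doses (0.5, 1 and 2 mg/day). For simplicity, it ignores the supplement type (supp) and fits a one-way model. Because the doses are not equally spaced, the dose values are passed to contr.poly() through its scores argument. The output is not reproduced here.

```r
data("ToothGrowth")

# Treat dose as a factor with three ordered levels: 0.5, 1, 2
tg <- ToothGrowth
tg$dose <- factor(tg$dose)

# Orthogonal polynomial contrasts that respect the actual (unequal) dose spacing
dose_poly <- contr.poly(3, scores = c(0.5, 1, 2))

model_tg <- aov(len ~ dose, data = tg,
                contrasts = list(dose = dose_poly))

# Split the dose sum of squares into linear and quadratic components
tab_tg <- summary.aov(model_tg,
                      split = list(dose = list("Linear" = 1, "Quadratic" = 2)))
tab_tg
```

With ordered doses, a significant "Linear" row can be read as a trend in tooth length across doses. A significant "Quadratic" row would indicate curvature in that trend.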
Performing meaningful orthogonal contrasts
The contrasts() function is used to get or set the coding scheme for a categorical variable in a statistical model. Categorical variables are typically encoded as factors in R, and different types of contrasts can be used to represent the levels of the factor variable in the model.
Called as contrasts(x), the function returns the contrast matrix currently used for the factor x. The replacement form contrasts(x) <- value sets it, where value is a contrast matrix or the name of a contrast function. There are several types of contrasts that can be specified, including:
- Treatment contrasts (contr.treatment()): This is the default coding scheme for unordered factors in R. It compares the mean of each level of the factor with the mean of a reference level (the first level by default).
- Sum contrasts (contr.sum()): This coding scheme compares the mean of each level (except the last) with the grand mean, the unweighted average of the level means.
- Helmert contrasts (contr.helmert()): This coding scheme compares each level of the factor, from the second onward, with the mean of the previous levels.
- Polynomial contrasts (contr.poly()): This coding scheme, the default for ordered factors, represents the levels of the factor as orthogonal polynomials, which can be used to test for linear, quadratic, and higher-order trends in the data.
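The four schemes differ in whether their columns are true contrasts and whether they are mutually orthogonal. The sketch below checks both properties for a three-level factor, assuming equal group sizes. The list and object names are only illustrative.

```r
# The built-in coding schemes for a 3-level factor
schemes <- list(
  treatment = contr.treatment(3),
  sum       = contr.sum(3),
  helmert   = contr.helmert(3),
  poly      = contr.poly(3)
)

# For each scheme: do all columns sum to zero (true contrasts), and do all
# pairs of columns have a zero sum of products (orthogonal, equal n)?
check <- t(sapply(schemes, function(cm) {
  cp <- crossprod(cm)
  c(contrasts  = all(abs(colSums(cm)) < 1e-10),
    orthogonal = all(abs(cp[upper.tri(cp)]) < 1e-10))
}))
check
```

Only the Helmert and polynomial schemes pass both checks. Treatment columns do not sum to zero. Sum-coding columns are contrasts, but they overlap, since (1, 0, -1) and (0, 1, -1) have a sum of products of 1.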
In this example, we can make meaningful contrasts by comparing the species directly. First, we can compare the sepal length of setosa and versicolor. Second, we can compare virginica with the average of setosa and versicolor. Let's first print the default contrasts by using the contrasts() function in R.
contrasts(iris$Species)
#            versicolor virginica
# setosa              0         0
# versicolor          1         0
# virginica           0         1
The default treatment contrasts compare versicolor with setosa and virginica with setosa, and the coefficients in each column do not sum to zero. These are not the comparisons we want, so we create the matrix of contrasts manually, as shown below. The coefficients of each contrast sum to zero. The sum of the products of the two sets of coefficients is (1)(1) + (-1)(1) + (0)(-2) = 0, so the contrasts are orthogonal. A direct versicolor vs virginica contrast, coded as (0, 1, -1), would not be orthogonal to setosa vs versicolor, because (1)(0) + (-1)(1) + (0)(-1) = -1.
# setosa vs versicolor
c1 <- c(1, -1, 0)
# average of setosa and versicolor vs virginica
c2 <- c(1, 1, -2)
# check orthogonality: the sum of the products of the coefficients is 0
sum(c1 * c2)
# [1] 0
# combine the contrasts into a matrix
mat.contrast <- cbind(c1, c2)
colnames(mat.contrast) <- c("setosa vs versicolor", "setosa+versicolor vs virginica")
# tell R that the matrix gives the contrasts you want
contrasts(iris$Species) <- mat.contrast
# print the contrast matrix
attr(iris$Species, "contrasts")
#            setosa vs versicolor setosa+versicolor vs virginica
# setosa                        1                              1
# versicolor                   -1                              1
# virginica                     0                             -2
Now let's use this contrast matrix in the aov() model. Then we use the summary.aov() function to generate a summary of the results. It takes the fitted ANOVA model object as its first argument. In the split argument, we split the species sum of squares into the two contrast comparisons used in the aov() model.
model <- aov(Sepal.Length ~ Species,
contrasts = list(Species = mat.contrast),
data = iris)
summary.aov(model, split = list(Species = list("setosa vs versicolor" = 1,
"setosa+versicolor vs virginica" = 2)))
#                                            Df Sum Sq Mean Sq F value   Pr(>F)
# Species                                     2  63.21   31.61  119.27  < 2e-16 ***
#   Species: setosa vs versicolor             1  21.62   21.62   81.59 8.77e-16 ***
#   Species: setosa+versicolor vs virginica   1  41.59   41.59  156.94  < 2e-16 ***
# Residuals                                 147  38.96    0.27
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Both contrasts are highly significant. There is a significant difference in sepal length between setosa and versicolor. Virginica also differs significantly from the average of setosa and versicolor. Because the two contrasts are orthogonal, their sums of squares add up to the species sum of squares (21.62 + 41.59 = 63.21).
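The sum of squares of any single contrast can also be computed by hand from the group means, using SS = ψ̂² / Σ(c_i² / n_i). The sketch below applies this formula to three coefficient vectors for Sepal.Length. The helper contrast_ss() is hypothetical and not part of base R.

```r
data("iris")

# Group means and group sizes of Sepal.Length (order: setosa, versicolor, virginica)
means <- tapply(iris$Sepal.Length, iris$Species, mean)
n     <- tapply(iris$Sepal.Length, iris$Species, length)

# Hypothetical helper: single-df sum of squares, psi_hat^2 / sum(c_i^2 / n_i)
contrast_ss <- function(coefs) {
  psi_hat <- sum(coefs * means)
  psi_hat^2 / sum(coefs^2 / n)
}

ss_set_vs_ver <- contrast_ss(c(1, -1, 0))   # setosa vs versicolor
ss_ver_vs_vir <- contrast_ss(c(0, 1, -1))   # versicolor vs virginica
ss_sv_vs_vir  <- contrast_ss(c(1, 1, -2))   # setosa+versicolor vs virginica

# Species sum of squares from the ANOVA table, for comparison
ss_species <- anova(lm(Sepal.Length ~ Species, data = iris))["Species", "Sum Sq"]

round(c(ss_set_vs_ver, ss_ver_vs_vir, ss_sv_vs_vir, ss_species), 2)
# [1] 21.62 10.63 41.59 63.21
```

The orthogonal pair (1, -1, 0) and (1, 1, -2) adds up to the species sum of squares. The non-orthogonal pair (1, -1, 0) and (0, 1, -1) does not (21.62 + 10.63 = 32.25). This is why, when an ANOVA table is split with non-orthogonal contrasts, the later rows of summary.aov() are sequential sums of squares rather than the sums of squares of the comparisons their labels name.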
Conclusion
Orthogonal contrasts are a useful statistical tool for getting more out of an ANOVA. Instead of stopping at the overall F-test, they partition the treatment sum of squares into non-overlapping, single-degree-of-freedom comparisons, and each comparison answers a specific, planned question about the groups. In this blog post, we have discussed how to set up polynomial and custom orthogonal contrasts in R with contr.poly(), contrasts(), aov(), and summary.aov(). Researchers can use these techniques to draw more specific and meaningful conclusions from their data.