AGRON INFO TECH

A simple way to create a barplot in R using the graphics package

The barplot (also known as a bar chart) is one of the most popular types of graphics. It shows the relationship between a numerical and a categorical variable in a dataset. In this post we create a barplot with the barplot function from the graphics package in R and learn how to use several of its arguments.

Creating dataset

We create dummy data with the rnorm function, generating a vector of 10 values for each level of the factor variable, viz. HP, NP and OP. The vectors are combined with the cbind function and converted to a data frame. The pivot_longer function from the tidyr package then converts this data frame to long form, and the result is stored as the object df.

library(dplyr)
set.seed(123)
HP <- rnorm(n = 10, mean = 13, sd = 2.5)
NP <- rnorm(n = 10, mean = 23, sd = 7.5)
OP <- rnorm(n = 10, mean = 43, sd = 12.5)

yield <- as.data.frame(cbind(HP, NP, OP)) 

df <- yield %>% tidyr::pivot_longer(everything(), 
                                    names_to = "priming", 
                                    values_to = "yield")
df
# # A tibble: 30 x 2
#    priming yield
#    <chr>   <dbl>
#  1 HP       11.6
#  2 NP       32.2
#  3 OP       29.7
#  4 HP       12.4
#  5 NP       25.7
#  6 OP       40.3
#  7 HP       16.9
#  8 NP       26.0
#  9 OP       30.2
# 10 HP       13.2
# # … with 20 more rows

The structure of the data is fine, except that priming is a factor variable but is stored as character. This can matter for statistical analysis, but for plotting we can proceed with the character column.

str(df)
# tibble [30 × 2] (S3: tbl_df/tbl/data.frame)
#  $ priming: chr [1:30] "HP" "NP" "OP" "HP" ...
#  $ yield  : num [1:30] 11.6 32.2 29.7 12.4 25.7 ...
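If a factor is needed later, the character column can be converted explicitly. The sketch below uses a small stand-in data frame (df_demo, a hypothetical name) built from the first six rows printed above; the level order NP, HP, OP is only an illustrative assumption.

```r
# Small stand-in for df: same column names, first six rows of the output above
df_demo <- data.frame(
  priming = c("HP", "NP", "OP", "HP", "NP", "OP"),
  yield   = c(11.6, 32.2, 29.7, 12.4, 25.7, 40.3)
)

# Convert the character column to a factor with an explicit level order
# (the order chosen here is an assumption, used only for illustration)
df_demo$priming <- factor(df_demo$priming, levels = c("NP", "HP", "OP"))

str(df_demo$priming)
levels(df_demo$priming)
```

The level order controls the order in which groups appear in summaries and plots, so it is worth setting deliberately rather than relying on alphabetical sorting.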

Calculating mean values for each factor level

The mean value for each level of the factor variable is computed with the aggregate function from the stats package in R and stored in an object df2.

df2 <- aggregate(yield ~ priming, data = df, FUN = mean)
df2
#   priming    yield
# 1      HP 13.18656
# 2      NP 24.56466
# 3      OP 37.69301
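As a cross-check, the same group means can be obtained with tapply from base R. This is a minimal sketch that rebuilds the data in long form without tidyr; the object name df_long is hypothetical, and the row order differs from df, which does not affect the means.

```r
# Recreate the simulated data with the same seed as in the post
set.seed(123)
HP <- rnorm(n = 10, mean = 13, sd = 2.5)
NP <- rnorm(n = 10, mean = 23, sd = 7.5)
OP <- rnorm(n = 10, mean = 43, sd = 12.5)

# Long form built with base R only (rows grouped by level, not interleaved)
df_long <- data.frame(
  priming = rep(c("HP", "NP", "OP"), each = 10),
  yield   = c(HP, NP, OP)
)

# Mean yield per priming level, returned as a named vector
group_means <- tapply(df_long$yield, df_long$priming, mean)
group_means
```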

Visualizing variables in a barplot

The graphics package, also known as the base graphics system, offers a comprehensive set of methods for constructing a wide range of plots as well as capabilities for fine-tuning such plots.

In the barplot function, specify the formula argument; in this case yield is grouped by priming. The tilde (~) separates the response variable (left side) from the factor variable (right side). Many other arguments control the barplot function, and you can read about them in the help page by typing ?barplot or help(barplot) in the R console.

barplot(
          # Specifying formula
          formula = yield ~ priming,
          # Dataset
          data = df2
)
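The same plot can also be drawn without a formula by passing the bar heights and labels directly. The sketch below assumes a stand-in data frame (df2_demo, a hypothetical name) holding the means printed earlier.

```r
# Stand-in for df2, using the group means shown in the post
df2_demo <- data.frame(
  priming = c("HP", "NP", "OP"),
  yield   = c(13.18656, 24.56466, 37.69301)
)

# height gives the bar heights, names.arg the label under each bar
bar_mids <- barplot(
  height    = df2_demo$yield,
  names.arg = df2_demo$priming
)
```

This vector interface is handy when the heights come from a plain numeric vector rather than a data frame.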

Adding title, X and Y axis labels

To specify title, X axis and Y axis labels in barplot function we can use arguments main, xlab and ylab, respectively.

barplot(
          formula = yield ~ priming, 
          data = df2, 
          # Adding main title
          main = "Barplot showing yield for each level of priming", 
          # Specifying X and Y axis title
          xlab = "Seed priming", 
          ylab = "Yield"
)

Changing color

You can use custom colors to highlight each factor level by using col argument.

barplot(
          formula = yield ~ priming, 
          data = df2, 
          main = "Barplot showing yield for each level of priming", 
          xlab = "Seed priming", 
          ylab = "Yield", 
          # Specifying colors for the bars 
          col = c("#cc1489", "#0f993d", "#1262b3")
)
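Colors given as a plain vector are matched to bars by position. One way to make the mapping explicit is a named color vector indexed by the level names, as in this sketch (priming_cols and df2_demo are hypothetical names; the means are those printed earlier).

```r
# Named vector: each priming level is tied to its color by name
priming_cols <- c(HP = "#cc1489", NP = "#0f993d", OP = "#1262b3")

# Stand-in for df2, using the group means shown in the post
df2_demo <- data.frame(
  priming = c("HP", "NP", "OP"),
  yield   = c(13.18656, 24.56466, 37.69301)
)

# Look up the color for each bar by level name; as.character() guards
# against priming being a factor, which would index by integer code
bar_cols <- priming_cols[as.character(df2_demo$priming)]

barplot(
  height    = df2_demo$yield,
  names.arg = df2_demo$priming,
  col       = bar_cols
)
```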

Adding legends

Legends can be added to a plot with the legend function, where you specify the position of the legend, the labels for each factor level and the fill colors. The position can be given as a keyword: "top", "bottom", "left", "right", "center", or a combination such as "topleft" or "bottomright".

barplot(formula = yield ~ priming, 
        data = df2, 
        main = "Barplot showing yield for each level of priming", 
        xlab = "Seed priming", 
        ylab = "Yield", 
        col = c("#cc1489", "#0f993d", "#1262b3"))

legend(
          # Setting position of the legend
          "topleft",
          # Factor levels for the legend
          c("HP","NP", "OP"),
          # Fill color of the legend
          fill = c("#cc1489", "#0f993d", "#1262b3")
)
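A keyword position can be fine-tuned with a few more arguments of legend. The sketch below is one possible variation, not the post's own settings: inset moves the legend away from the plot border, bty = "n" removes the legend box and title adds a heading. The means are those printed earlier.

```r
group_means <- c(HP = 13.18656, NP = 24.56466, OP = 37.69301)
cols <- c("#cc1489", "#0f993d", "#1262b3")

barplot(group_means, col = cols)

# legend() invisibly returns the coordinates of the legend box and text
lg <- legend(
  "topleft",
  legend = names(group_means),
  fill   = cols,
  inset  = 0.02,      # distance from the border, as a fraction of the plot region
  bty    = "n",       # no box around the legend
  title  = "Priming"
)
```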

You can also specify the position of the legend by using the X and Y axis coordinates. Further you can set the size of the legend text by using cex argument in legend function.

barplot(formula = yield ~ priming, 
        data = df2, 
        main = "Barplot showing yield for each level of priming", 
        xlab = "Seed priming", 
        ylab = "Yield", 
        col = c("#cc1489", "#0f993d", "#1262b3"))

legend(
          # X and Y coordinates
          x = 0.2, y = 34,
          c("HP","NP", "OP"),
          fill = c("#cc1489", "#0f993d", "#1262b3"), 
          # Control size of the legend text
          cex = 1.3
)
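When placing a legend by coordinates, it can help to look up the limits of the plot region first. In this sketch, par("usr") returns those limits after the plot is drawn, and the legend's top-left corner is positioned relative to them; the offsets 0.1 and 1 are arbitrary choices for illustration.

```r
group_means <- c(HP = 13.18656, NP = 24.56466, OP = 37.69301)
cols <- c("#cc1489", "#0f993d", "#1262b3")

barplot(group_means, col = cols, ylim = c(0, 50))

# Limits of the plot region in user coordinates: c(x_min, x_max, y_min, y_max)
usr <- par("usr")

# x and y give the top-left corner of the legend box
legend(
  x = usr[1] + 0.1, y = usr[4] - 1,
  legend = names(group_means),
  fill   = cols,
  cex    = 1.3
)
```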

Scaling axis

The base graphics system’s automatically produced axes will typically be sufficient for a plot. Even if the data being plotted on an axis are not numerical, this is still true. In our example we shall not customize limits for X axis as the variable is categorical and default limits will be sufficient. However, we can change the y axis limits by using ylim argument in barplot function.

# Creating barplot
barplot(formula = yield ~ priming, 
        data = df2, 
        main = "Barplot showing yield for each level of priming", 
        xlab = "Seed priming", 
        ylab = "Yield", 
        col = c("#cc1489", "#0f993d", "#1262b3"), 
        # Customizing Y axis limits
        ylim = c(0,50)
)
# Adding legend
legend(
          x = 0.2, y = 46,
          c("HP","NP", "OP"),
          fill = c("#cc1489", "#0f993d", "#1262b3")
)

Next, tick marks are drawn at specific positions on the y axis with the axis function. The first argument, 2, indicates that the axis should be drawn in margin 2 (the left margin), and the at argument specifies the positions of the tick marks.

# Creating barplot
barplot(formula = yield ~ priming, 
        data = df2, 
        main = "Barplot showing yield for each level of priming", 
        xlab = "Seed priming", 
        ylab = "Yield", 
        col = c("#cc1489", "#0f993d", "#1262b3"), 
        # Customizing Y axis limits
        ylim = c(0,50)
)
# Adding legend
legend(
          x = 0.2, y = 46,
          c("HP","NP", "OP"),
          fill = c("#cc1489", "#0f993d", "#1262b3")
)

# Specifying tick marks for the Y axis
axis(2, at = seq(0, 50, 5))
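Because barplot already draws a y axis, the call to axis above adds ticks on top of the existing ones. An alternative, sketched here with the means printed earlier, is to suppress the default axis with axes = FALSE and draw only the custom one; las = 1 is an optional extra that turns the tick labels horizontal.

```r
group_means <- c(HP = 13.18656, NP = 24.56466, OP = 37.69301)
cols <- c("#cc1489", "#0f993d", "#1262b3")

# Draw the bars without the default y axis
barplot(group_means, col = cols, ylim = c(0, 50), axes = FALSE)

# Draw a custom y axis in the left margin with a tick every 5 units
ticks <- seq(0, 50, 5)
axis(side = 2, at = ticks, las = 1)
```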

Adding box around the plot

The function box can be used to draw a box around the current plot and we can specify color and linetype for the box using col and lty arguments.

# Creating barplot
barplot(formula = yield ~ priming, 
        data = df2, 
        main = "Barplot showing yield for each level of priming", 
        xlab = "Seed priming", 
        ylab = "Yield", 
        col = c("#cc1489", "#0f993d", "#1262b3"), 
        # Customizing Y axis limits
        ylim = c(0,50)
)
# Adding legend
legend(
          x = 0.2, y = 46,
          c("HP","NP", "OP"),
          fill = c("#cc1489", "#0f993d", "#1262b3")
)

# Specifying tick marks for the Y axis
axis(2, at = seq(0, 50, 5))
# Adding box
box(col="gray", lty = "solid")

Adding horizontal line

We can also add a horizontal line with the abline function. Here the h argument places the line at the overall mean of the response variable, which serves as a y-value cut-off.

# Creating barplot
barplot(formula = yield ~ priming, 
        data = df2, 
        main = "Barplot showing yield for each level of priming", 
        xlab = "Seed priming", 
        ylab = "Yield", 
        col = c("#cc1489", "#0f993d", "#1262b3"), 
        # Customizing Y axis limits
        ylim = c(0,50)
)
# Adding legend
legend(
          x = 0.2, y = 46,
          c("HP","NP", "OP"),
          fill = c("#cc1489", "#0f993d", "#1262b3")
)

# Specifying tick marks for the Y axis
axis(2, at = seq(0, 50, 5))
# Adding box
box(col="gray", lty = "solid")

# Adding horizontal line at the overall mean
abline(h = mean(df$yield), col = "red", lty = "dashed")
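Since every group has the same number of observations (10), the overall mean equals the mean of the three group means. The sketch below uses the means printed earlier to draw the line and label it with text; the label wording and its placement above the last bar are illustrative choices.

```r
group_means <- c(HP = 13.18656, NP = 24.56466, OP = 37.69301)

# With equal group sizes, the mean of the group means is the overall mean
grand_mean <- mean(group_means)

bar_mids <- barplot(group_means, ylim = c(0, 50))
abline(h = grand_mean, col = "red", lty = "dashed")

# Label the line just above it, centered on the last bar
text(
  x = bar_mids[3], y = grand_mean,
  labels = paste("Mean =", round(grand_mean, 1)),
  pos = 3, col = "red"
)
```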

Showing SE on each bar

We shall first compute the mean and standard error for the response variable (yield) grouped by priming using pipe operator from dplyr package. The results are stored in an object called df3.

df3 <- df %>% 
          group_by(priming) %>% 
          summarise(yield2 = mean(yield),
                    se = sd(yield) / sqrt(n()))
df3
# # A tibble: 3 x 3
#   priming yield2    se
#   <chr>    <dbl> <dbl>
# 1 HP        13.2 0.754
# 2 NP        24.6 2.46 
# 3 OP        37.7 3.68
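The standard error is the standard deviation divided by the square root of the number of observations in the group. The same values can be computed in base R with a small helper; std_error and df_long are hypothetical names, and the data are rebuilt with the same seed as in the post.

```r
# Standard error of the mean: sd / sqrt(n)
std_error <- function(v) sd(v) / sqrt(length(v))

# Recreate the simulated data with the same seed as in the post
set.seed(123)
HP <- rnorm(n = 10, mean = 13, sd = 2.5)
NP <- rnorm(n = 10, mean = 23, sd = 7.5)
OP <- rnorm(n = 10, mean = 43, sd = 12.5)

df_long <- data.frame(
  priming = rep(c("HP", "NP", "OP"), each = 10),
  yield   = c(HP, NP, OP)
)

# One row per priming level; the yield column now holds the standard error
se_by_group <- aggregate(yield ~ priming, data = df_long, FUN = std_error)
se_by_group
```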

Using df3 we can produce the same barplot. The barplot function returns the x positions of the bar midpoints, so we store its result and pass it to the arrows function to draw the SE bars.

# Creating barplot and storing the x positions of the bar midpoints
bar_mids <- barplot(formula = yield2 ~ priming, 
        data = df3, 
        main = "Barplot showing yield for each level of priming", 
        xlab = "Seed priming", 
        ylab = "Yield", 
        col = c("#cc1489", "#0f993d", "#1262b3"), 
        # Customizing Y axis limits
        ylim = c(0,50)
)
# Adding legend
legend(
          x = 0.2, y = 46,
          c("HP","NP", "OP"),
          fill = c("#cc1489", "#0f993d", "#1262b3")
)

# Specifying tick marks for the Y axis
axis(2, at = seq(0, 50, 5))
# Adding box
box(col="gray", lty = "solid")

# Adding horizontal line at the overall mean
abline(h = mean(df$yield), col="red", lty = "dashed")
# Adding SE bars (code = 3 draws a flat cap at both ends)
arrows(x0 = bar_mids,
       y0 = df3$yield2 + df3$se,
       y1 = df3$yield2 - df3$se,
       angle = 90,
       code = 3,
       length = 0.1)
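To see why the stored result of barplot can be used as the x position of the SE bars, it helps to print it. In this minimal sketch the means and standard errors are the rounded values from the df3 output above; with the default bar width of 1 and spacing of 0.2, the midpoints are 0.7, 1.9 and 3.1.

```r
# Means and standard errors copied from the df3 output (rounded)
means <- c(HP = 13.2, NP = 24.6, OP = 37.7)
se    <- c(HP = 0.754, NP = 2.46, OP = 3.68)

# barplot() invisibly returns the x coordinates of the bar midpoints
bar_mids <- barplot(means, ylim = c(0, 50))
print(as.numeric(bar_mids))

# Vertical segments from mean - SE to mean + SE;
# angle = 90 with code = 3 turns both arrow heads into flat caps
arrows(x0 = bar_mids, y0 = means - se,
       x1 = bar_mids, y1 = means + se,
       angle = 90, code = 3, length = 0.1)
```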

If you have any questions feel free to ask. You can send your queries as an email at agron.infotech@gmail.com

