AGRON INFO TECH

Creating rapid summary table showing mean and standard error using R program

Introduction

In this blog post, we’ll show you how to quickly create a summary table that can unlock valuable insights from your data. Not only will we walk you through the code step by step, but we’ll also highlight the significance of this table.

Package Loading

Let’s load the libraries needed for our analysis. These packages will enable us to create our summary table in publication ready style.

# Loading packages
library(dplyr)
library(purrr)
library(flextable)

Importing dataset

Let’s introduce our data first. We have a dataset called “df” that includes different study-related factors. Among them are “rep,” “priming,” “aba,” “apx,” “pod,” and “ph.”

We load this data into R, a powerful data analysis tool. The “df” dataset contains multiple observations, each with a “priming” category. Our goal is to create a summary table that showcases the mean and standard error (SE) for specific variables within each “priming” category.

# Creating data
df <- data.frame(
  rep = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  priming = c("NP", "NP", "NP", "HP", "HP", "HP", "OP", "OP", "OP"),
  aba = c(4.60, 5.70, 4.10, 7.10, 5.60, 6.30, 8.00, 7.40, 7.80),
  apx = c(0.90, 0.77, 0.81, 0.97, 0.91, 0.87, 1.73, 1.63, 1.45),
  pod = c(3.78, 4.40, 5.33, 6.73, 6.31, 5.40, 8.38, 7.40, 7.63),
  ph = c(91, 82, 87, 94, 87, 82, 110, 96, 92)
)

print(df)
#   rep priming aba  apx  pod  ph
# 1   1      NP 4.6 0.90 3.78  91
# 2   2      NP 5.7 0.77 4.40  82
# 3   3      NP 4.1 0.81 5.33  87
# 4   1      HP 7.1 0.97 6.73  94
# 5   2      HP 5.6 0.91 6.31  87
# 6   3      HP 6.3 0.87 5.40  82
# 7   1      OP 8.0 1.73 8.38 110
# 8   2      OP 7.4 1.63 7.40  96
# 9   3      OP 7.8 1.45 7.63  92

To get started, we need to choose the variables we want to summarize. In this case, we’ve selected “aba,” “apx,” “pod,” and “ph.”

# Create a vector of variable names to summarize 
variables_to_summarize <- names(df[,-c(1,2)])
variables_to_summarize
# [1] "aba" "apx" "pod" "ph"

Initializing summary table

Now, let’s initialize an empty list called “summary_list” to store our summary results. This list will be crucial as we progress through the analysis.

summary_list <- list()

Here comes the heart of our analysis. We’re going to calculate the mean and SE for each selected variable, grouped by the “priming” category. This is where the magic happens!

We loop through each selected variable and create a summary table for it. This summary table contains the mean value and the SE, both calculated within their respective “priming” groups. To ensure clarity, we rename the columns to match the variable name. This will make it easier to identify which variable each table represents.After creating each summary table, we add it to our “summary_list.” This list will ultimately contain all our summary results.

# Loop through each variable and summarize by priming group
for (variable in variables_to_summarize) {
  summary_result <- df %>%
    group_by(priming) %>%
    summarize(
      mean_val = mean(.data[[variable]], na.rm = TRUE),
      SE = sd(.data[[variable]], na.rm = TRUE) / sqrt(n())
    )
  summary_result <- rename(summary_result, !!variable := mean_val)
  summary_list[[variable]] <- summary_result
}


# Print the summary data frame
print(summary_list)
# $aba
# # A tibble: 3 × 3
#   priming   aba    SE
#   <chr>   <dbl> <dbl>
# 1 HP       6.33 0.433
# 2 NP       4.8  0.473
# 3 OP       7.73 0.176
# 
# $apx
# # A tibble: 3 × 3
#   priming   apx     SE
#   <chr>   <dbl>  <dbl>
# 1 HP      0.917 0.0291
# 2 NP      0.827 0.0384
# 3 OP      1.60  0.0819
# 
# $pod
# # A tibble: 3 × 3
#   priming   pod    SE
#   <chr>   <dbl> <dbl>
# 1 HP       6.15 0.393
# 2 NP       4.50 0.450
# 3 OP       7.80 0.296
# 
# $ph
# # A tibble: 3 × 3
#   priming    ph    SE
#   <chr>   <dbl> <dbl>
# 1 HP       87.7  3.48
# 2 NP       86.7  2.60
# 3 OP       99.3  5.46

Formatting summary table

Now that we have our summary results, it’s time to make them visually appealing and easy to interpret. This step involves merging the mean and SE values into a single, informative column. We create an empty list called “formatted_summary” to store our formatted results. This is where we’ll enhance the clarity of our tables.

# Create an empty list to store the formatted tibbles
formatted_summary <- list()

We loop through each element in the “summary_list,” and for each table, we merge the mean and SE values into a single column, separated by a plus-minus sign. To keep our tables clean, we remove the original variables containing mean, SE, SE merged columns, leaving us with the essential information. Finally, we ensure that our variable names are free from any “_merged_column” suffix.

# Loop through each element in the summary_list
for (var_name in names(summary_list)) {
  # Merge the aba and SE columns with a plus-minus sign and store in a new column
  formatted_tibble <- summary_list[[var_name]] %>%
    mutate_at(vars(-priming), 
              list(merged_column = ~paste0(round(.x, 2), 
                                           " ± ", 
                                           round(SE, 2))))
  
  # Remove the original SE columns
  formatted_tibble <- formatted_tibble %>% 
    select(-SE, -var_name, -SE_merged_column)
  
  formatted_tibble <- formatted_tibble %>%
    rename_all(~gsub("_merged_column", "", .))
  
  # Add the formatted tibble to the formatted_summary list
  formatted_summary[[var_name]] <- formatted_tibble
}

print(formatted_summary)
# $aba
# # A tibble: 3 × 2
#   priming aba        
#   <chr>   <chr>      
# 1 HP      6.33 ± 0.43
# 2 NP      4.8 ± 0.47 
# 3 OP      7.73 ± 0.18
# 
# $apx
# # A tibble: 3 × 2
#   priming apx        
#   <chr>   <chr>      
# 1 HP      0.92 ± 0.03
# 2 NP      0.83 ± 0.04
# 3 OP      1.6 ± 0.08 
# 
# $pod
# # A tibble: 3 × 2
#   priming pod        
#   <chr>   <chr>      
# 1 HP      6.15 ± 0.39
# 2 NP      4.5 ± 0.45 
# 3 OP      7.8 ± 0.3  
# 
# $ph
# # A tibble: 3 × 2
#   priming ph          
#   <chr>   <chr>       
# 1 HP      87.67 ± 3.48
# 2 NP      86.67 ± 2.6 
# 3 OP      99.33 ± 5.46

Combining summary tables

With our individual tables looking sleek and informative, it’s time to combine them into a single, comprehensive summary table. We used imap function to iterate through the elements of formatted summary list to exclude the first column. Then we use the “bind_cols” function to merge the tables from combined summary list. This ensures that our final table remains organized and easy to read.

combined_summary <- formatted_summary %>%
  imap(~ {
    if (.y == 1) {
      .x  
    } else {
      select(.x, -1)  # Exclude the first column for the rest of the lists
    }
  }) %>%
  reduce(bind_cols)

combined_summary
# # A tibble: 3 × 4
#   aba         apx         pod         ph          
#   <chr>       <chr>       <chr>       <chr>       
# 1 6.33 ± 0.43 0.92 ± 0.03 6.15 ± 0.39 87.67 ± 3.48
# 2 4.8 ± 0.47  0.83 ± 0.04 4.5 ± 0.45  86.67 ± 2.6 
# 3 7.73 ± 0.18 1.6 ± 0.08  7.8 ± 0.3   99.33 ± 5.46

We add the row names to our final table, making it more informative.

# Adding the row names
combined_summary <- bind_cols(summary_list[[1]][1], 
                              combined_summary)

Creating summary table using flextable package

We’re going to use the powerful “flextable” package to create the final table that can be compiled to MS word file. We’re adjusting column widths, highlighting the header, and even adding a caption to make it clear and visually appealing. This table provides a comprehensive overview of the summary of the data.

# Creating the summary table using flextable
table <- flextable(data = combined_summary) %>%
  # Increasing column width
  width(width = 1) %>%
  # Formatting the table header
  bold(bold = TRUE, part = "header") %>%
  # Adding table caption
  set_caption(caption = "Table 1: Effect of seed priming on enzyme activity")

table
primingabaapxpodph
HP6.33 ± 0.430.92 ± 0.036.15 ± 0.3987.67 ± 3.48
NP4.8 ± 0.470.83 ± 0.044.5 ± 0.4586.67 ± 2.6
OP7.73 ± 0.181.6 ± 0.087.8 ± 0.399.33 ± 5.46
Table 1: Effect of seed priming on enzyme activity

Next, navigate to the “Compile Report” icon and select Microsoft Word from the dropdown menu. Simply click the “Compile” button, and it will generate the table within your MS Word document.

knitr::include_graphics("Images/Screenshot_4.png")

And there you have it, our beautifully crafted summary table!

knitr::include_graphics("Images/Screenshot_5.png")

But why is it so important, you might ask?

In many research publications, it’s common to present mean and standard error values in tabular format. To simplify this process within R, we’ve developed an effortless and efficient method, and it’s not limited to just this dataset. You can use the same code structure for other datasets by simply changing the variable name you want to summarize.

Conclusion

We started by introducing our dataset and selecting the variables we wanted to analyze. We then demonstrated how to calculate the mean and standard error (SE) for each variable, grouped by a specific category. This step is crucial for understanding the data and extracting meaningful insights. We walked you through the entire process, from formatting the tables to merging values and combining multiple tables into a single, comprehensive summary.

Stay tuned for more data-driven content and happy analyzing! 🤓


Download R program — Click_here

Download R studio — Click_here