AGRON INFO TECH

A simple way to create a custom function in R


In R, a function is a block of code that performs a specific task. It is defined by a name, a set of input parameters (arguments), and a body of instructions that are executed when the function is called. Functions can be built into R or written by users for their own tasks. In this blog post we shall learn how to create a custom function in R.

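Functions in R are called using their name followed by parentheses. The input values (arguments) are passed inside the parentheses, separated by commas, either by position or by name (for example, mean(x = 1:10)). A function returns the value of the last expression it evaluates; the return() function can be used to return a value explicitly, for example to exit early.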
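As a small illustration of arguments and return values, the sketch below defines two hypothetical helpers (rescale01() and safe_ratio() are names made up for this example, not base R functions). The first has a default argument and relies on the value of its last expression being returned; the second uses return() to exit early.

```r
# Hypothetical helper: rescale a numeric vector to the 0-1 range.
# na.rm has a default value, so callers may omit it.
rescale01 <- function(x, na.rm = TRUE) {
  rng <- range(x, na.rm = na.rm)
  (x - rng[1]) / (rng[2] - rng[1])  # last expression is the return value
}

# Hypothetical helper: divide a by b, returning NA instead of Inf when b is 0.
safe_ratio <- function(a, b) {
  if (b == 0) return(NA_real_)      # explicit early return
  a / b
}

rescale01(c(2, 4, 6))               # arguments passed by position
# [1] 0.0 0.5 1.0
safe_ratio(b = 3, a = 6)            # arguments passed by name, in any order
# [1] 2
safe_ratio(1, 0)
# [1] NA
```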

Here we shall build a custom function that computes summary statistics for every variable in a dataset, following a simple step-by-step method.

Loading the iris dataset

The iris dataset is a famous dataset that ships with R and contains measurements of flowers from three iris species. It is often used as a sample dataset in data analysis and machine learning. Because it is built in, iris is available without installing anything; calling data("iris") places a copy of it in your workspace. The first six rows of the dataset are printed with the head() function.

# Load the built-in iris dataset and print its first six rows
data("iris")
head(iris)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa

The columns in the dataset are Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, and Species. The first four columns represent the measurements of the length and width of the sepals and petals of the flowers, and the last column indicates the species of the flower (setosa, versicolor, or virginica). We shall use this dataset to get the summary statistics for the variables using our own function.
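Before writing the function it helps to confirm which columns are numeric, since the mean, median and standard deviation only make sense for those. A quick check using base R functions:

```r
data("iris")
dim(iris)                                       # rows and columns
# [1] 150   5
levels(iris$Species)                            # the three species names
table(iris$Species)                             # 50 flowers of each species
is_num <- vapply(iris, is.numeric, logical(1))  # TRUE for each numeric column
names(iris)[is_num]                             # the four measurement columns
```

Species is a factor, so it is the only column flagged FALSE; selecting iris[is_num] by type keeps the same four columns as dropping column 5 by position.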

Layout of the function

To create a function in R, you assign the result of the function keyword to a name: the parentheses after function hold the input parameters, and the curly braces hold the body. In RStudio, typing fun and accepting the suggested function snippet inserts the default layout below. We shall modify it to get the required summary statistics.

name <- function(variables) {
          
}
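To see the template in use, here is a minimal filled-in version: a hypothetical function cv() that computes the coefficient of variation (standard deviation divided by mean). The name and the statistic are chosen only for illustration.

```r
# name:      cv
# variables: a single argument, x (a numeric vector)
# body:      the code between the curly braces
cv <- function(x) {
  sd(x) / mean(x)
}

cv(c(1, 2, 3))
# [1] 0.5
```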

Creating the custom function

The summary statistics will include the mean, median and standard deviation. We shall write our code for the custom function within curly brackets.

The calculations are carried out with for loops. A for loop is a control structure (not a function) used in R to iterate over a sequence of values or indices. The basic syntax of a for loop in R is as follows:

for (variable in vector) {
          
}

The vector can be any vector, including a range of integers created with the : operator (for example, 1:4). The loop variable takes each value of the vector in turn, one per iteration. Inside the loop you can write any code that should run repeatedly; here, the calculations of the mean, median and standard deviation.
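The sketch below shows a loop over indices and a loop over values, plus one caveat. The function in this post uses 1:length(data), which works for iris; seq_along() is a safer alternative when a vector might be empty, because 1:length(x) then gives c(1, 0) and the loop body would still run twice.

```r
vals <- c(2, 5, 7)
squares <- numeric(length(vals))   # preallocate a result vector
for (i in seq_along(vals)) {       # i takes the values 1, 2, 3
  squares[i] <- vals[i]^2
}
squares
# [1]  4 25 49

total <- 0
for (v in vals) {                  # loop directly over the values
  total <- total + v
}
total
# [1] 14

empty <- numeric(0)
1:length(empty)                    # counts down, not an empty sequence
# [1] 1 0
seq_along(empty)                   # empty, so a loop over it does nothing
# integer(0)
```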

Inside the function body, the first line stores the variable (column) names of the data. We use them later to label each mean, median and standard deviation with the variable it belongs to.
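A minimal sketch of the naming step on its own: names() returns the column names, and assigning to names() of a single computed value labels it with its variable, which is what each loop iteration does.

```r
vars <- names(iris[-5])   # "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
m <- mean(iris[[1]])      # mean of the first column, still unnamed
names(m) <- vars[[1]]     # label it with the first column name
m                         # prints the value under the label Sepal.Length
```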

In our case, each for loop iterates over the columns of the iris data. For a data frame, length() returns the number of columns, so 1:length(data) yields one index per variable. Inside the loop, the loop variable i selects a column with data[[i]], the statistic for that column is computed and stored as element i of a list, and that element is labelled with the matching column name.
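The difference between the two ways of indexing a data frame matters inside the loop. A short check, using iris without Species:

```r
num <- iris[-5]      # the four numeric columns
length(num)          # number of columns, not rows
# [1] 4
class(num[[1]])      # [[ ]] extracts the column itself, a numeric vector
# [1] "numeric"
class(num[1])        # [ ] keeps a one-column data frame
# [1] "data.frame"
mean(num[[1]])       # works; mean(num[1]) would warn and return NA
```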

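The list() function creates an empty list, and items can be added with the [[ ]] or $ operator. We build one list each for the means, medians and standard deviations, then unlist() each list into a named numeric vector and convert it to a one-column data frame. The three data frames average, median and std are combined column-wise with cbind() and transposed with t(), so that each statistic becomes a row and each variable a column. Note that t() returns a matrix, so out is a matrix rather than a data frame.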
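A minimal sketch of the list, unlist(), cbind() and t() steps on toy data; the values and the row names x and y are made up for illustration.

```r
# Build a list element by element with [[ ]]; each element is already named
means <- list()
means[[1]] <- c(x = 1)
means[[2]] <- c(y = 4)

# Items can also be added by name with $
info <- list()
info$mean <- 2.5
names(info)
# [1] "mean"

v <- unlist(means)                  # named numeric vector: x = 1, y = 4
mean_df <- as.data.frame(v)         # one column, row names x and y
colnames(mean_df) <- "Mean"
median_df <- data.frame(Median = c(2, 5), row.names = c("x", "y"))

combined <- cbind(mean_df, median_df)  # data frame: rows x, y; columns Mean, Median
out <- t(combined)                     # rows Mean, Median; columns x, y
is.matrix(out)
# [1] TRUE
```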

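# Note: naming this function summary masks base::summary() for the rest of
# the session; run rm(summary) afterwards to restore the built-in version.
summary <- function(data) {
  # Assumes every column of data is numeric (e.g. iris without Species)
  names <- names(data)  # column names, used to label each result

  # Getting mean values for each variable
  average <- list()
  for (i in 1:length(data)) {
    average[[i]] <- mean(data[[i]])
    names(average[[i]]) <- names[[i]]
  }

  # Getting median values for each variable
  # (median() and names() still find the built-in functions, because R
  # skips non-function objects when it looks up a function name)
  median <- list()
  for (i in 1:length(data)) {
    median[[i]] <- median(data[[i]])
    names(median[[i]]) <- names[[i]]
  }

  # Getting SD values for each variable
  std <- list()
  for (i in 1:length(data)) {
    std[[i]] <- sd(data[[i]])
    names(std[[i]]) <- names[[i]]
  }

  # Flatten each list into a named vector, then a one-column data frame
  average <- as.data.frame(unlist(average))
  colnames(average) <- "Mean"

  median <- as.data.frame(unlist(median))
  colnames(median) <- "Median"

  std <- as.data.frame(unlist(std))
  colnames(std) <- "Standard deviation"

  # Combine side by side, then transpose: statistics as rows, variables as columns
  out <- t(cbind(average, median, std))

  return(out)
}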

Getting output from the customized function

This function takes a single input parameter, data, and returns the mean, median and standard deviation of every variable in it. Because mean(), median() and sd() need numeric input, every column must be numeric, which is why the Species column (column 5) is dropped with iris[-5]. The result is returned with return() as a matrix with one row per statistic and one column per variable. To use the function, we simply call it with the object containing the data.

summary(data = iris[-5])
#                    Sepal.Length Sepal.Width Petal.Length Petal.Width
# Mean                  5.8433333   3.0573333     3.758000   1.1993333
# Median                5.8000000   3.0000000     4.350000   1.3000000
# Standard deviation    0.8280661   0.4358663     1.765298   0.7622377
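For comparison, the same table can be built without explicit loops. The sketch below is an alternative, not the method used above: it assumes a hypothetical function name, summary_stats(), chosen so that base::summary() is not masked, applies each statistic to every column with sapply(), and stops with a message if any column is not numeric.

```r
# Hypothetical loop-free alternative; the name summary_stats is illustrative.
summary_stats <- function(data) {
  # Stop early with a clear message if any column is not numeric
  if (!all(vapply(data, is.numeric, logical(1)))) {
    stop("all columns of `data` must be numeric")
  }
  # sapply() returns one named value per column; rbind() stacks them as rows
  rbind(
    Mean                 = sapply(data, mean),
    Median               = sapply(data, median),
    `Standard deviation` = sapply(data, sd)
  )
}

summary_stats(iris[-5])   # same statistics as the loop-based version
```

The trade-off is readability: the loop version spells out each step, which suits learning, while sapply() hides the iteration and keeps the function short.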
