Subroutines


Writing code that does many things can be a bit of a nightmare. If you start to edit one part of the code, you can mess up another part of the code because you now have your data in a different form, variable names are different or you clobbered (overwrote) a variable with the same name somewhere else in the code.

You might find yourself wishing you could take a good clean part of your code and set it aside in its own little space where it can't be messed up by anything else.

You might also want to perform the same task multiple times in your script, in situations where a loop just doesn't work. Wouldn't it be great if you didn't have to cut and paste that code?

And if you are cutting and pasting code, wouldn't it be great if you only had to change it in one place, instead of six?

All of these desires and more can be satisfied using subroutines!

What is a subroutine?


A subroutine is a named block of code that has a formalized method for passing information to it and getting information out of it.

You can roughly think about all the functions we have used as pre-defined subroutines. Subroutines, by this analogy, allow you to write your own functions.

Basic syntax


To declare and define a named subroutine the basic syntax is:

sub NAME BLOCK
 
or
 
sub my_sub_routine {
    # code to be executed when this is called
}

Defining a subroutine does not run its code. You have to call it, just like you call a function. The syntax is:

my_sub_routine();
 
or
 
my_sub_routine(LIST_OF_ARGUMENTS);

Let's look at an example.

#!/usr/bin/perl
 
use warnings;
use strict;
 
# declare and define subroutine
sub print_hello {
    print "hello, i'm a subroutine!\n";
}
 
__END__

Here, all we've done is declare and define a subroutine called "print_hello". When it is called it will print the statement "hello, i'm a subroutine!", just as if that print statement had been written in the main body of the Perl script. No call has been made to it, so nothing happens.

Let's see how to call the subroutine.

#!/usr/bin/perl
 
use warnings;
use strict;
 
# call subroutine
print_hello();
 
# declare and define subroutine
sub print_hello {
    print "hello, i'm a subroutine!\n";
}
 
__END__

Calling the subroutine causes the print statement to be executed. Notice that even though this subroutine does not take any arguments, you still use the empty parentheses: because the subroutine is defined after the call, the parentheses are what tell Perl that "print_hello" is a subroutine call rather than a bareword, which "use strict" would reject. Also notice that the subroutine comes after the call to it in the script. You can put subroutines anywhere you want, because Perl compiles the whole script before running it and so knows about every subroutine regardless of where it is defined. The convention, however, is to put subroutines at the bottom of the script, with the main body of the script, where the calls occur, at the top.

Subroutines can be called as many times as you like, and thus are completely reusable code that never needs to be retyped.

#!/usr/bin/perl
 
use warnings;
use strict;
 
# standard for loop
for (my $i = 0; $i < 5; $i++) {
    # call subroutine
    print_hello();
    # make a print statement outside of subroutine
    print "and i'm not a subroutine!\n";
}
 
# declare and define subroutine
sub print_hello {
    print "hello, i'm a subroutine!\n";
}
 
__END__

Here the "print_hello" subroutine is being called from within a "for" loop, which also has a print statement. Again, the print statement inside the subroutine behaves exactly like the one outside it.

Input & Output


This example subroutine is not the most exciting piece of code. Functions can act on input arguments to do something dynamic and useful. So can subroutines.

#!/usr/bin/perl
 
use warnings;
use strict;
 
# standard for loop
for (my $i = 0; $i < 5; $i++) {
    # print value of i
    print "$i squared is ";
 
    # call subroutine
    print_square($i);
}
 
# declare and define subroutine
sub print_square {
    # copy subroutine arguments
    # from @_ to easy to use variables
    my ($input_number) = @_;
 
    # square input number
    my $square = $input_number ** 2;
 
    # print square
    print "$square\n";
}
 
__END__

Let's first look at the subroutine here, called "print_square". This subroutine takes a number and prints the square of the number.

How does the subroutine get the input argument?

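When the subroutine is called, Perl puts the arguments passed to it into a special array called "@_". To use these arguments, you copy them out of "@_" into your own variables. (You could work with "@_" directly, but its elements are aliases to the caller's variables, so changing them changes the caller's data; copying avoids that.)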

How do you copy values out of an array to individual variables? One way is:

my $arg1 = $_[0];

This is fine, but as soon as you have multiple arguments to copy, it gets tedious. There is a more elegant and brief way:

my ($arg1, $arg2, $arg3) = @_;

This may look strange at first but if I remind you that the way you declare an array with values looks very similar, perhaps it doesn't seem so strange. For example:

my @array = (4, "go team", $counter);

On the right is a comma separated list. On the left is an array. You've seen copying arrays using:

my @copy = @array;

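So it makes sense that you can also assign an array to a list of variables: the first variable gets the first element, the second variable gets the second element, and so on.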
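To see exactly what list assignment does when the number of variables and elements don't match, here is a small sketch; the array contents and variable names are made up for illustration.

```perl
#!/usr/bin/perl

use warnings;
use strict;

# an array with three elements
my @values = (4, "go team", 7);

# list assignment: each variable on the left gets one element, in order
my ($first, $second, $third) = @values;
print "first: $first, second: $second, third: $third\n";

# fewer variables than elements: the extra elements are ignored
my ($only_first) = @values;
print "only_first: $only_first\n";

# more variables than elements: the leftover variables are undef
my ($v1, $v2, $v3, $extra) = @values;
print defined($extra) ? "extra is defined\n" : "extra is undef\n";
```

This is why "my ($arg1, $arg2) = @_;" works no matter how many arguments were actually passed: missing ones simply come out undefined.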

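Now, back to the subroutine. We only have one argument, but you can still assign "@_" to a list, in this case a list of one variable. The parentheses matter: without them, "@_" is evaluated in scalar context and the variable gets the number of arguments instead of the argument itself. The line is: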

my ($input_number) = @_;

Now that you have the value of the argument in a variable in the subroutine code block, you can act on it. Here the code takes the number to the power of 2 and stores that in another variable which then gets printed.

In the main body of the script, "print_square" gets called in a "for" loop.

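It's important to note that variables declared with "my" inside the subroutine's code block are lexically scoped: they exist only inside that block and cannot clobber variables with the same name elsewhere in the script. All the normal scoping rules apply here.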
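Here is a minimal sketch of that scoping rule in action, using a hypothetical subroutine "square_of" that declares a variable with the same name as one in the main body:

```perl
#!/usr/bin/perl

use warnings;
use strict;

# a variable in the main body of the script
my $square = "not touched";

# call the subroutine, which declares its own $square
my $result = square_of(4);

print "result: $result\n";          # the value computed inside the subroutine
print "main's square: $square\n";   # still the main body's value

# declare and define subroutine
sub square_of {
    my ($input_number) = @_;

    # this 'my' creates a new variable that exists only inside this block;
    # it does not overwrite the $square declared in the main body
    my $square = $input_number ** 2;

    return($square);
}
```

The main body's "$square" still holds "not touched" after the call, which is exactly the protection against clobbering described at the start of this section.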

A straightforward extension of the above subroutine is to take two arguments: the number being operated on, as before, and the power to which that number is raised.

#!/usr/bin/perl
 
use warnings;
use strict;
 
# call subroutine
print_i_to_the_j_power(2, 3); # should be 8
print_i_to_the_j_power(3, 2); # should be 9
 
# declare and define subroutine
sub print_i_to_the_j_power {
    # copy subroutine arguments
    # from @_ to easy to use variables
    my ($i, $j) = @_;
 
    # make calculation
    my $result = $i ** $j;
 
    # print result
    print "$result\n";
}
 
__END__

Just like the one argument example, we copy the "@_" array that holds the argument values to new variables, "$i" and "$j". These variable names are not at all descriptive but the name of the subroutine makes clear what they are.
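A subroutine is not limited to one or two arguments. The sketch below uses two hypothetical subroutines, "count_arguments" and "join_arguments", to show that whatever you pass ends up in "@_", and that arrays passed as arguments are flattened into one long list before the subroutine sees them.

```perl
#!/usr/bin/perl

use warnings;
use strict;

my @first_list  = (1, 2, 3);
my @second_list = (10, 20);

# both arrays are flattened into one list before the call,
# so the subroutine receives five separate arguments in @_
my $count = count_arguments(@first_list, @second_list);
print "count_arguments received $count arguments\n";

my $joined = join_arguments(@first_list, @second_list);
print "arguments were: $joined\n";

# declare and define subroutines
sub count_arguments {
    # @_ in scalar context is the number of arguments
    return(scalar(@_));
}

sub join_arguments {
    # copy all arguments into one array, however many there are
    my @arguments = @_;
    return(join(", ", @arguments));
}
```

Because of the flattening, the subroutine cannot tell where "@first_list" ended and "@second_list" began; it just sees 1, 2, 3, 10, 20.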

We've described how arguments can be passed to subroutines that run some code and print a result. Many of the functions we are used to, however, return a result; for example, "length()" returns the length of a string. To get a subroutine to do the same thing, use the "return" function.

#!/usr/bin/perl
 
use warnings;
use strict;
 
# call subroutine
my $two_to_the_3rd_power = i_to_the_j_power(2, 3); # should be 8
my $three_to_the_2nd_power = i_to_the_j_power(3, 2); # should be 9
 
# print results
print "two_to_the_3rd_power = $two_to_the_3rd_power\n",
    "three_to_the_2nd_power = $three_to_the_2nd_power\n";
 
# declare and define subroutine
sub i_to_the_j_power {
    # copy subroutine arguments
    # from @_ to easy to use variables
    my ($i, $j) = @_;
 
    # make calculation
    my $result = $i ** $j;
 
    # return result
    return($result);
}
 
__END__

Calling "return()" on "$result" causes the subroutine to return the value of "$result", just like other functions, so that it can be captured in a variable when the subroutine is called in the main part of the script.

"return()" can take as many arguments as you want to give it. If it is given just one value, you can capture the output of the subroutine in a scalar variable. If it is given more than one value, capture the output in an array (or in a list of scalars, such as "my ($x, $y) = ...").

#!/usr/bin/perl
 
use warnings;
use strict;
 
# call subroutine
my @expanded_range = expand_range(200, 209);
 
# print expanded range
print join("...", @expanded_range), "\n";
 
# declare and define subroutine
sub expand_range {
    # copy subroutine arguments
    # from @_ to easy to use variables
    my ($min, $max) = @_;
 
    # declare array
    my @series;
 
    # loop from min to max
    for (my $i = $min; $i <= $max; $i++) {
        push(@series, $i);
    }
 
    # return result
    return(@series);
}
 
__END__

Here a subroutine called "expand_range" takes two arguments, "$min" and "$max". Within the subroutine, an array is created and populated with the integers from "$min" to "$max", inclusive. This array then gets returned. In the main body of the script, "expand_range" returns a list, which gets captured in an array called "@expanded_range".
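What you get back from "return" depends on how you capture it. This sketch reuses "expand_range" (comments trimmed) and calls it three ways; note in particular that assigning it to a plain scalar gives the number of elements, because "return(@series)" evaluates the array in the caller's scalar context.

```perl
#!/usr/bin/perl

use warnings;
use strict;

# list context: the whole returned list is stored in the array
my @range = expand_range(200, 209);

# list assignment to one scalar: the first value is kept, the rest dropped
my ($first) = expand_range(200, 209);

# scalar context: the caller gets the NUMBER of elements, not the elements
my $how_many = expand_range(200, 209);

print "range: @range\n";
print "first: $first\n";
print "how_many: $how_many\n";

# declare and define subroutine
sub expand_range {
    my ($min, $max) = @_;
    my @series;
    for (my $i = $min; $i <= $max; $i++) {
        push(@series, $i);
    }
    return(@series);
}
```

If you ever see a surprising small number where you expected a list, check whether you assigned the result to a scalar by mistake.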

Errors inside subroutines raise two problems. First, when something goes wrong in a subroutine, the line that usually matters most is the place in the main body of the script where the subroutine was called with bad arguments, but "warn" and "die" only report the line inside the subroutine. Second, you may want a subroutine to refuse problematic arguments in a way that the calling code can notice. The standard "Carp" module provides two functions for these situations: "carp" and "croak".

#!/usr/bin/perl
 
use warnings;
use strict;
use Carp;
 
# check argv and return usage if not two args
unless (scalar(@ARGV) == 2) {
    die "usage: min max\n";
}
 
# get min and max from argv
my ($min, $max) = @ARGV;
 
# call subroutine
my @expanded_range = expand_range($min, $max);
 
# print expanded range
print join("...", @expanded_range), "\n";
 
# declare and define subroutine
sub expand_range {
    # copy subroutine arguments
    # from @_ to easy to use variables
    my ($min, $max) = @_;
 
    # if min equals max send a warning to standard error
    if ($min == $max) {
        # carp is warn for subroutines
        carp "silly use of expand_range for min $min = max $max";
    }
    # if min is greater than max, stop with an error message to standard
    # error (croak ends the script unless the caller catches it with eval)
    elsif ($min > $max) {
        # croak is die for subroutines
        croak "expand_range failed because min $min > max $max";
    }
 
    # declare array
    my @series;
 
    # loop from min to max
    for (my $i = $min; $i <= $max; $i++) {
        push(@series, $i);
    }
 
    # return result
    return(@series);
}
 
__END__

This is the same script as the previous example except that it now checks whether the arguments make sense. If "$min == $max", the series is not very interesting, so a warning is printed with "carp", the subroutine version of "warn", and the subroutine carries on. If "$min > $max", there can be no series at all, so the subroutine stops with "croak", the subroutine version of "die". It's important to be clear that "croak", like "die", ends the whole Perl script unless the call is wrapped in an "eval" block that catches the error. What "carp" and "croak" add is that their messages are reported from the point of view of the caller; because this script keeps everything in one package, Carp typically prints a short backtrace that includes both the line inside "expand_range" and the line in the main body where it was called. Neither of these functions can be used unless you have "use Carp;" at the top of your program.
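To confirm that "croak" stops more than just the subroutine, here is a minimal sketch that wraps the call in "eval { ... }", Perl's built-in way of catching a "die". The script keeps running only because of the "eval"; the error message ends up in the special variable "$@".

```perl
#!/usr/bin/perl

use warnings;
use strict;
use Carp;

# eval { ... } catches the die triggered by croak, so the script keeps running
my @range = eval { expand_range(5, 3) };
if ($@) {
    print "caught error: $@";
}
print "the script is still running\n";

# without the eval, the same croak would have ended the whole script

# declare and define subroutine
sub expand_range {
    my ($min, $max) = @_;
    if ($min > $max) {
        croak "expand_range failed because min $min > max $max";
    }
    my @series;
    for (my $i = $min; $i <= $max; $i++) {
        push(@series, $i);
    }
    return(@series);
}
```

Removing the "eval" and running "expand_range(5, 3)" directly would print the croak message and exit before reaching "the script is still running".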

Truth


We have figured out how to pass arguments to a subroutine and how to return a single value or list of values from a subroutine. One common usage for subroutines is as a truth test.

#!/usr/bin/perl
 
use warnings;
use strict;
 
# check argv and return usage if not two args
unless (scalar(@ARGV) == 2) {
    die "this script will test if two numbers are equal to each other\n",
        "usage: num1 num2\n";
}
 
# get args from argv
my ($num1, $num2) = @ARGV;
 
# call subroutine in an if statement
if (is_equal($num1, $num2)) {
    print "you bet $num1 = $num2\n";
}
else {
    print "maybe $num1 = $num2 on your planet\n";
}
 
# declare and define subroutine
sub is_equal {
    # copy subroutine arguments
    # from @_ to easy to use variables
    my ($n1, $n2) = @_;
 
    # check if n1 is equals to n2 return truth
    if ($n1 == $n2) {
        return(1);
    }
    # if n1 is not equal to n2 return false
    else {
        return(0);
    }
}
 
__END__

Here the subroutine "is_equal" takes two numbers and these numbers are numerically tested for equality. If they are equal the subroutine returns "1" and if not it returns "0". When these values are evaluated in an "if" statement, they will either be "true" or "false".

If you have a large program and you are doing the same truth test over and over again, it's reassuring to write it out just once and keep it in a subroutine like this, rather than relying on it being typed correctly throughout your program.
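Two things help when writing truth-test subroutines: knowing which values Perl treats as false, and knowing that comparison operators such as "==" already return a true or false value. The sketch below shows both; the shortened "is_equal" is an alternative way of writing the subroutine above, not a replacement you are required to use.

```perl
#!/usr/bin/perl

use warnings;
use strict;

# values Perl treats as false: 0, '0', the empty string '' and undef;
# nearly everything else is true (note that the string '0.0' is true)
for my $value (0, '0', '', 1, -1, 'abc', '0.0') {
    my $truth = $value ? 'true' : 'false';
    print "'$value' is $truth\n";
}

print is_equal(3, 3) ? "3 equals 3\n" : "3 does not equal 3\n";
print is_equal(3, 4) ? "3 equals 4\n" : "3 does not equal 4\n";

# a shorter is_equal: == already returns a true or false value,
# so the subroutine can return the comparison directly
sub is_equal {
    my ($n1, $n2) = @_;
    return($n1 == $n2);
}
```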

Nesting


This truth test leads nicely to the next point: subroutines can be called from within other subroutines. A subroutine can even call itself, which is called recursion. As a simple example of calling one subroutine from another, here is the range expander from earlier, now using the "is_equal" truth test:

#!/usr/bin/perl
 
use warnings;
use strict;
use Carp;
 
# check argv and return usage if not two args
unless (scalar(@ARGV) == 2) {
    die "usage: min max\n";
}
 
# get min and max from argv
my ($min, $max) = @ARGV;
 
# call subroutine
my @expanded_range = expand_range($min, $max);
 
# print expanded range
print join("...", @expanded_range), "\n";
 
# declare and define subroutine
sub expand_range {
    # copy subroutine arguments
    # from @_ to easy to use variables
    my ($min, $max) = @_;
 
    # if min equals max send a warning to standard error
    if (is_equal($min, $max)) {
        # carp is warn for subroutines
        carp "silly use of expand_range for min $min = max $max";
    }
    # if min is greater than max, stop with an error message to standard
    # error (croak ends the script unless the caller catches it with eval)
    elsif ($min > $max) {
        # croak is die for subroutines
        croak "expand_range failed because min $min > max $max";
    }
 
    # declare array
    my @series;
 
    # loop from min to max
    for (my $i = $min; $i <= $max; $i++) {
        push(@series, $i);
    }
 
    # return result
    return(@series);
}
 
sub is_equal {
    # copy subroutine arguments
    # from @_ to easy to use variables
    my ($n1, $n2) = @_;
 
    # check if n1 is equals to n2 return truth
    if ($n1 == $n2) {
        return(1);
    }
    # if n1 is not equal to n2 return false
    else {
        return(0);
    }
}
 
__END__

I just cut and pasted the "is_equal" subroutine into this script and it was ready to go. This is a great example of how code can very easily be reused if it is in a subroutine.
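Recursion, a subroutine calling itself, was mentioned above but not shown. Here is a minimal sketch using a hypothetical "factorial" subroutine. Every recursive subroutine needs a base case that stops the calls; otherwise it would call itself forever.

```perl
#!/usr/bin/perl

use warnings;
use strict;

# print the factorials of 0 through 5
for my $n (0 .. 5) {
    print "$n! = ", factorial($n), "\n";
}

# factorial(n) = n * factorial(n - 1), with factorial(0) = factorial(1) = 1
sub factorial {
    my ($n) = @_;

    # base case: stops the recursion
    if ($n <= 1) {
        return(1);
    }

    # recursive case: the subroutine calls itself with a smaller number
    return($n * factorial($n - 1));
}
```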

Two examples of subroutines: transliteration & sorting


Transliteration


The following script has a subroutine that uses the transliteration operator, "tr///". Although this operator looks like the regular expression operators, it does not use regular expressions. Like "m//" and "s///", it has the syntax:

$string =~ tr/SEARCHLIST/REPLACEMENTLIST/;

SEARCHLIST and REPLACEMENTLIST are lists of characters written without separators. Like character classes, they can include ranges such as "A-Z". Normally the two lists are the same length (if REPLACEMENTLIST is shorter, Perl repeats its last character to make up the difference). The operator replaces every instance of the first character in SEARCHLIST with the first character in REPLACEMENTLIST, every instance of the second character in SEARCHLIST with the second character in REPLACEMENTLIST, and so on, throughout the whole string.

Here's an example using the transliteration operator in a subroutine.

#!/usr/bin/perl
 
use warnings;
use strict;
 
# declare and print message variable
my $message = 'The password to my email account is IMABADPSSWD.';
print "message: $message\n";
 
# encrypt message and print
my $encrypted = rot13_encrypt($message);
print "encrypted: $encrypted\n";
 
# unencrypt message and print
my $unencrypted = rot13_encrypt($encrypted);
print "unencrypted: $unencrypted\n";
 
# declare and define subroutine
sub rot13_encrypt {
    # copy subroutine arguments
    # from @_ to easy to use variables
    my ($text) = @_;
 
    # transliterate text using rot13 encryption
    $text =~ tr/A-Za-z/N-ZA-Mn-za-m/;
 
    # return encrypted text
    return($text);
}
 
__END__

The "rot13_encrypt" subroutine takes a string as an argument and performs the rot13 transliteration, where A-M becomes N-Z and vice versa (for both upper- and lowercase letters). The subroutine then returns the encrypted text. Because rot13 shifts each letter by 13 places in a 26-letter alphabet, calling the subroutine once on a string and then again on the encrypted string gives back the original string. This isn't a very good form of encryption, but it does make text encrypted this way hard to read without this subroutine handy.
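Two more properties of "tr///" are worth knowing: ranges work on both sides, and the operator returns the number of characters it matched. With an empty REPLACEMENTLIST the string is left unchanged, which makes "tr" a quick way to count characters. The sentence below is just sample text.

```perl
#!/usr/bin/perl

use warnings;
use strict;

my $sentence = 'Subroutines make code reusable';

# copy first, then transliterate the copy: a-z become A-Z
my $shouted = $sentence;
$shouted =~ tr/a-z/A-Z/;
print "$shouted\n";

# tr returns how many characters matched; with an empty
# REPLACEMENTLIST nothing is changed, so this only counts
my $vowel_count = ($sentence =~ tr/aeiouAEIOU//);
print "vowels: $vowel_count\n";
```

Copying into "$shouted" before transliterating keeps "$sentence" intact, the same reason "rot13_encrypt" works on its own copy "$text" rather than on the caller's variable.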

Sorting


In Thursday's class and exercises we learned about the "sort" function, which can be called on an array to put it in order. It did a great job of sorting strings alphabetically, but it doesn't sort numbers the way you want it to, because by default "sort" compares values as strings, character by character:

#!/usr/bin/perl
 
use warnings;
use strict;
 
# declare arrays
my @genome_sizes = (4.6, 12, 43, 125, 97, 180, 2600);
 
# sort genome sizes
my @sorted = sort @genome_sizes;
 
# print
print 'prior to sorting: ', join(", ", @genome_sizes), "\n",
    'after sorting: ', join(", ", @sorted), "\n";
 
__END__

The sorted line of output from this script is "after sorting: 12, 125, 180, 2600, 4.6, 43, 97". "2600" is not less than "4.6", but as strings "2" comes before "4", so "2600" is placed first. How can we make "sort" do a numerical sort?

The answer is a special kind of subroutine, a comparison subroutine, that tells "sort" how to compare two elements. This subroutine uses unconventional syntax.

#!/usr/bin/perl
 
use warnings;
use strict;
 
# declare arrays
my @genome_sizes = (4.6, 12, 43, 125, 97, 180, 2600);
 
# sort genome sizes
my @sorted = sort @genome_sizes;
my @numerically_sorted = sort numerically @genome_sizes;
 
# print
print 'prior to sorting: ', join(", ", @genome_sizes), "\n",
    'after regular sorting: ', join(", ", @sorted), "\n",
    'after numerical sorting: ', join(", ", @numerically_sorted), "\n";
 
# declare and define subroutines
sub numerically {
    # this subroutine modifies the sort function
    # and therefore has very different syntax than
    # normal subroutines.
    # there is no need to copy values in @_
    # pairs of elements from the array being sorted
    # are passed to this subroutine as the variables
    # $a and $b
    # do not change the values of these variables!
 
    # return 1 if $a is greater than $b
    if ($a > $b) {
        return(1);
    }
    # return 0 if $a is equal to $b
    elsif ($a == $b) {
        return(0);
    }
    # return -1 if $a is less than $b
    else {
        return(-1);
    }
}
 
__END__

So there is a subroutine called "numerically" which gets called by placing it between the "sort" function and the array it is sorting:

sort numerically @genome_sizes;

Also, "numerically" isn't called using parentheses. If you look at the code block for "numerically", it doesn't use the "@_" array. In this unusual use of a subroutine, "sort" calls the subroutine repeatedly, each time handing it a pair of values from the array, until the array is completely sorted. The two values are not passed in "@_"; instead they are placed automatically in the special variables "$a" and "$b". Do not try to alter these variables (they are aliases to the actual elements of the array) and do not declare them with "my"; they are special and need to be left alone. The subroutine returns one of three values to tell "sort" how to order "$a" and "$b": return "1" if "$a" should come after "$b", "0" if the two are equal for sorting purposes, and "-1" if "$a" should come before "$b".

Yes, this is bizarre. It is the same -1/0/1 convention used by comparison functions in many other languages (C's "qsort", for example). Fortunately, you don't need to remember these special return values, because there is an operator that takes care of it for you. Let me introduce you to the "spaceship" operator "<=>".

#!/usr/bin/perl
 
use warnings;
use strict;
 
# declare arrays
my @genome_sizes = (4.6, 12, 43, 125, 97, 180, 2600);
 
# sort genome sizes
my @sorted = sort @genome_sizes;
my @numerically_sorted = sort numerically @genome_sizes;
 
# print
print 'prior to sorting: ', join(", ", @genome_sizes), "\n",
    'after regular sorting: ', join(", ", @sorted), "\n",
    'after numerical sorting: ', join(", ", @numerically_sorted), "\n";
 
# declare and define subroutines
sub numerically {
    # use spaceship operator to do the comparison and set the value to return
    return($a <=> $b);
}
 
__END__

The "spaceship" operator does exactly the same thing as the code in the subroutine in the previous example: it returns -1, 0 or 1 depending on whether its left operand is numerically less than, equal to, or greater than its right operand. The only thing you need to remember is that "$a <=> $b" sorts a list in ascending order and "$b <=> $a" sorts it in descending order.
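Perl also lets you write the comparison as a block placed right after "sort", instead of as a separately named subroutine; the block sees "$a" and "$b" in the same way. The sketch below shows that form, plus "cmp", the string counterpart of "<=>", which reproduces what a plain "sort" does.

```perl
#!/usr/bin/perl

use warnings;
use strict;

my @genome_sizes = (4.6, 12, 43, 125, 97, 180, 2600);

# the comparison written as a block right after sort,
# instead of as a separately named subroutine
my @ascending  = sort { $a <=> $b } @genome_sizes;
my @descending = sort { $b <=> $a } @genome_sizes;

# cmp compares as strings; this gives the same order
# as a plain sort with no comparison at all
my @as_strings = sort { $a cmp $b } @genome_sizes;

print 'ascending: ', join(", ", @ascending), "\n";
print 'descending: ', join(", ", @descending), "\n";
print 'as strings: ', join(", ", @as_strings), "\n";
```

Which form to use is largely a matter of taste; a named subroutine is handy when the same comparison is needed in several places.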

#!/usr/bin/perl
 
use warnings;
use strict;
 
# declare arrays
my @genome_sizes = (4.6, 12, 43, 125, 97, 180, 2600);
 
# sort genome sizes
my @sorted = sort @genome_sizes;
my @ascending_numerically_sorted = sort ascending_numerically @genome_sizes;
my @descending_numerically_sorted = sort descending_numerically @genome_sizes;
 
# print
print 'prior to sorting: ', join(", ", @genome_sizes), "\n",
    'after regular sorting: ', join(", ", @sorted), "\n",
    'after ascending numerical sorting: ', join(", ", @ascending_numerically_sorted), "\n",
    'after descending numerical sorting: ', join(", ", @descending_numerically_sorted), "\n";
 
# declare and define subroutines
sub ascending_numerically {
    # use spaceship operator to do the comparison and set the value to return
    return($a <=> $b);
}
 
sub descending_numerically {
    # use spaceship operator to do the comparison and set the value to return
    return($b <=> $a);
}
 
__END__

One last point about sorting: the comparison subroutine decides the order, and the "spaceship" operator inside it can compare any numerical values, not just "$a" and "$b" themselves. For example, you can compare a value computed from each element, such as its length.

#!/usr/bin/perl
 
use warnings;
use strict;
 
# declare arrays
my @genome_sizes = (4.6, 12, 43, 125, 97, 180, 2600);
 
# sort genome sizes
my @sorted = sort @genome_sizes;
my @ascending_numerically_sorted = sort ascending_numerically @genome_sizes;
my @descending_numerically_sorted = sort descending_numerically @genome_sizes;
my @by_length_sorted = sort by_length @genome_sizes;
 
# print
print 'prior to sorting: ', join(", ", @genome_sizes), "\n",
    'after regular sorting: ', join(", ", @sorted), "\n",
    'after ascending numerical sorting: ', join(", ", @ascending_numerically_sorted), "\n",
    'after descending numerical sorting: ', join(", ", @descending_numerically_sorted), "\n",
    'after by length sorting: ', join(", ", @by_length_sorted), "\n";
 
# declare and define subroutines
sub ascending_numerically {
    # use spaceship operator to do the comparison and set the value to return
    return($a <=> $b);
}
 
sub descending_numerically {
    # use spaceship operator to do the comparison and set the value to return
    return($b <=> $a);
}
 
sub by_length {
    # use spaceship operator to do the comparison and set the value to return
    return(length($a) <=> length($b));
}
 
__END__

Here the "by_length" subroutine uses the "spaceship" operator to sort the array on the length of the elements, not the values. So the output is: 12, 43, 97, 4.6, 125, 180, 2600.
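When two elements tie on the first comparison (12, 43 and 97 all have length 2), you can add a second comparison to decide their order. This sketch uses a hypothetical "by_length_then_value" subroutine that sorts by length and then, among equal lengths, by value from largest to smallest; it relies on "<=>" returning 0, which is false, for a tie.

```perl
#!/usr/bin/perl

use warnings;
use strict;

my @genome_sizes = (4.6, 12, 43, 125, 97, 180, 2600);

# sort by length, breaking ties by value in descending order
my @by_length_then_value = sort by_length_then_value @genome_sizes;
print 'by length, then descending value: ', join(", ", @by_length_then_value), "\n";

# declare and define subroutine
sub by_length_then_value {
    # compare lengths first; if they differ, <=> returns -1 or 1 (true)
    # and || stops there. If the lengths are equal, <=> returns 0 (false),
    # so || moves on to compare the values themselves, largest first
    return(length($a) <=> length($b) || $b <=> $a);
}
```

This prints 97, 43, 12, 180, 125, 4.6, 2600: the groups are still ordered by length, but within each group the larger number now comes first.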

Exercises


Problem1: The Divider


  • Write a script with a subroutine that takes two arguments and divides the first argument by the second argument and then prints the result. Make the script such that the user can specify the two arguments at run-time using @ARGV.
  • Are there values for the arguments that will break your script?
  • Write a second version of your script that sends a warning to standard error if there are improper arguments and then exits the subroutine.
  • Write a third version of your script that does not print the result of the division from within the subroutine but rather has the subroutine return the value which then gets printed in the main body of the script.

Problem2: Sum & Mean


  • Write a subroutine that takes an array of numbers of arbitrary length and calculates the sum of the numbers, returning the sum.
  • Write another subroutine that takes an array of numbers of arbitrary length and calculates the mean (hint: you already have code that calculates a sum and code that does division so this should be easy).

Problem3: The Final Reverse Complement


  • Write a subroutine that reverse complements a nucleotide sequence passed as the only argument to the subroutine. Put this subroutine in a script that tests its function (hint: this ought to be similar to the encryption example).

Problem4: AT:GC Basecomp


  • Write a script that has a subroutine that takes a fasta file as its only argument. This subroutine should figure out how many A or T basepairs and how many G or C basepairs there are in the sequence in the fasta file. The subroutine should return these two numbers. The script should print the two numbers.
  • Make a fasta file with known basecomp to help you test your script.
  • Write a second version of the script that uses the division subroutine from Problem1 to calculate the ratio of AT basepairs to GC basepairs.

Problem5: Gene Length Sorted Properly and Mean


  • Session 4.2 Problem 6 asked you to download the cerevisiae genome annotation file from SGD and to then parse out the gene lengths and sort them. You couldn't do a proper sort until today. Do this properly now.
  • Use the mean subroutine from Problem2 to get the mean gene length in cerevisiae.

Problem6: Get Subsequence


  • Write a subroutine that takes a sequence string as well as the start, end and strand coordinates for a subsequence within the sequence string and returns the subsequence.
  • Use this subroutine in a script with an example sequence to demonstrate it works.