References


Merriam-Webster:

ref·er·ence
Pronunciation: 're-fərn(t)s, 're-f(ə-)rən(t)s
Function: noun
1 : the act of referring or consulting
2 : a bearing on a matter : RELATION <in reference to your recent letter>
3 : something that refers : as a : ALLUSION, MENTION b : something (as a sign or indication) that refers a reader or consulter to another source of information (as a book or passage) c : consultation of sources of information
4 : one referred to or consulted: as a : a person to whom inquiries as to character or ability can be made b : a statement of the qualifications of a person seeking employment or appointment given by someone familiar with the person c (1) : a source of information (as a book or passage) to which a reader or consulter is referred (2) : a work (as a dictionary or encyclopedia) containing useful facts or information d : DENOTATION, MEANING

Why am I telling you this? We introduced subroutines last week. We learned how to pass a list of arguments and return a list of values. What if we wanted to manipulate an existing variable? For instance, what if you wanted to increment every element of an existing array?

#!/usr/bin/perl
 
use warnings;
use strict;
 
# create an array of numbers
my @some_array = (5, 6, 7);
 
# print values in array
print "print before call to subroutine: ", join(", ", @some_array), "\n";
 
# call increment_array
increment_array(@some_array);
 
# print values in array
print "print after call to subroutine: ", join(", ", @some_array), "\n";
 
sub increment_array{
    # copy arguments from @_
    my @array_to_increment = @_;
 
    # loop through the elements of the array
    for(my $index=0; $index < scalar(@array_to_increment); $index++){
        # add 1 to each element
        $array_to_increment[$index]++;
    }
 
    # print values in array
    print "print in subroutine: ", join(", ", @array_to_increment), "\n";
 
}
 
__END__

This didn't work. I created an array, printed out its values and got "5, 6, 7". Then I called increment_array on the array. It should have copied the array and, through a "for" loop, incremented each value in the array. When I printed out the array in the subroutine, it had been incremented, but when I printed it out in the main body of the script after the call to increment_array, it was unchanged. What went wrong?

The problem was that I copied the array. Manipulating the copy has no effect on the original array, so my increment is useless. How do I get the original array into the subroutine? I could treat it as a global variable, but then my code isn't reusable and there is no point in writing a subroutine. What I need to do is pass a reference to the original array.

Before explaining what a reference to an array is, let's look at another example. I want a subroutine to compare the values in two arrays. I'm not manipulating the arrays so maybe this will work by copying arrays.

#!/usr/bin/perl
 
use warnings;
use strict;
 
# create two arrays of numbers
my @array1 = (4, 5, 6);
my @array2 = (4, 5, 6);
 
# print contents of arrays
print 'array1 is ', join(", ", @array1),
         ' and array2 is ', join(", ", @array2), "\n";
 
# print return from call of compare_arrays
print '0 means arrays are different and 1 means they are the same: ',
    compare_arrays(@array1, @array2), "\n";
 
#this subroutine returns 1 if arrays are identical and 0 if they are not
sub compare_arrays{
    # copy arguments from @_
    my (@local_array1, @local_array2) = @_;
 
     # return 0 if arrays are not same length
    if (scalar(@local_array1) != scalar(@local_array2)){
        return 0;
    }
    # if same length
    else{
        # for loop of array length
        for (my $index = 0; $index < scalar(@local_array1); $index++) {
            # return 0 if elements of same index do not have same value
            if ($local_array1[$index] ne $local_array2[$index]){
                return 0;
            }
        }
    }
    # otherwise return 1
    return 1;
}
 
__END__

This didn't work either. I created two arrays with exactly the same values and called the compare_arrays subroutine on them. It should have copied the two arrays from "@_". I then wrote perfectly good code that returns "0" if the arrays are not identical and "1" if they are. But compare_arrays returns "0", despite the arrays being equal. What happened?

If I add a print statement to the subroutine, this might help clarify what went wrong.

# copy arguments from @_
my (@local_array1, @local_array2) = @_;
 
# print contents of arrays
print 'local_array1 is ', join(", ", @local_array1),
    ' and local_array2 is ', join(", ", @local_array2), "\n";

This print statement printed:

local_array1 is 4, 5, 6, 4, 5, 6 and local_array2 is

So I passed two arrays with three elements each, but when I copied them out of "@_" into two arrays in the subroutine, the first local array got all six elements and the second one got none. This is not what I wanted, but it is how passing arguments to subroutines works. All the arguments get treated as a single list. Arrays get expanded to make up that list. So when you get the arguments from "@_", all the previous organization is gone. The way to get around this is to pass a reference to each of the two arrays.
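
To see the flattening for yourself, you can count what arrives in "@_". This is only a small sketch, and the subroutine name "count_arguments" is made up for this illustration:

```perl
#!/usr/bin/perl

use warnings;
use strict;

# two arrays with three elements each
my @array1 = (4, 5, 6);
my @array2 = (4, 5, 6);

# one array arrives as three separate arguments
my $count_one_array = count_arguments(@array1);

# two arrays arrive as one list of six arguments
my $count_two_arrays = count_arguments(@array1, @array2);

# scalars and arrays are all mixed into the same list
my $count_mixed = count_arguments(1, @array1, 2);

print "one array: $count_one_array\n";
print "two arrays: $count_two_arrays\n";
print "scalar, array, scalar: $count_mixed\n";

# hypothetical helper: returns how many arguments it received
sub count_arguments{
    return scalar(@_);
}
```

This prints 3, 6 and 5. The subroutine has no way to tell where one array stopped and the next one started.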

Backslash


A reference to a variable is where in the computer memory the variable is stored. Think of the Dewey Decimal System. The Encyclopedia Britannica might have a Dewey Decimal number of "973.05". "973.05" isn't the book. It just tells you where to find it. Similarly, a reference tells you and the computer where a variable is stored. It is not the variable itself.

Let's look at a simple example of a scalar variable and its reference.

#!/usr/bin/perl
 
use warnings;
use strict;
 
# declare and define a scalar variable
my $some_number = 5;
 
# use backslash operator to print reference to scalar variable
print \$some_number, "\n";
 
__END__

So the scalar variable is "$some_number". I give it the value "5". Perfectly normal stuff. Then I print the variable with a backslash in front of it. The backslash operator turns a variable into the reference to the variable. When it prints this, you don't get "5". You get the string value of the reference to "$some_number". This looks something like "SCALAR(0x180b384)". Notice that it's not complete gobblygook. It is "SCALAR(GOBBLYGOOK)". The variable "$some_number" is a scalar, so the reference string indicates this. You absolutely do not need to understand the GOBBLYGOOK and you should never try to manipulate the GOBBLYGOOK. It will accomplish nothing but confusion.

Each variable gets a unique reference. This is true even if the two variables have the same value. For example:

#!/usr/bin/perl
 
use warnings;
use strict;
 
# declare and define two scalar variables
my $some_number1 = 5;
my $some_number2 = 5;
 
# use backslash operator to print reference to scalar variables
print \$some_number1, "\n";
print \$some_number2, "\n";
 
__END__

Here you can see that these two variables have the same value of "5" but they have different references.
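
If you would rather not compare the GOBBLYGOOK by eye, you can compare two references with "==". The sketch below assumes nothing beyond the backslash operator. The variable names "$same_variable" and "$different_variable" are just for this illustration:

```perl
#!/usr/bin/perl

use warnings;
use strict;

# two variables with the same value
my $some_number1 = 5;
my $some_number2 = 5;

# == on references compares where they point, not the values stored there
my $same_variable = (\$some_number1 == \$some_number1) ? "yes" : "no";
my $different_variable = (\$some_number1 == \$some_number2) ? "yes" : "no";

print "number1 and number1 share a reference: $same_variable\n";
print "number1 and number2 share a reference: $different_variable\n";
```

The first line prints "yes" and the second prints "no", even though both variables hold "5".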

Every variable has a reference. You use the backslash to get the reference to any variable. For example:

#!/usr/bin/perl
 
use warnings;
use strict;
 
# declare and define
my $some_number = 5;
my @some_array = (4, 5, 6);
my %some_hash = ("nobel06" => "fire and mello");
 
# use backslash operator to print references
print \$some_number, "\n";
print \@some_array, "\n";
print \%some_hash, "\n";
 
__END__

As expected, they all have unique GOBBLYGOOK but each has the appropriate label for the type of variable it is.
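
If you only want the label and not the GOBBLYGOOK, Perl's built-in "ref" function returns it. A minimal sketch, where the "$..._label" variable names are mine:

```perl
#!/usr/bin/perl

use warnings;
use strict;

# declare and define
my $some_number = 5;
my @some_array = (4, 5, 6);
my %some_hash = ("nobel06" => "fire and mello");

# ref returns the type label of a reference
my $scalar_label = ref(\$some_number);
my $array_label = ref(\@some_array);
my $hash_label = ref(\%some_hash);

# ref returns an empty string for something that is not a reference
my $plain_label = ref($some_number);

print "$scalar_label\n";
print "$array_label\n";
print "$hash_label\n";
print "label for a plain number: '$plain_label'\n";
```

This prints "SCALAR", "ARRAY" and "HASH", followed by an empty label for the plain number.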

You can do more than just print references. You can store them as a scalar variable.

#!/usr/bin/perl
 
use warnings;
use strict;
 
# declare and define
my $some_number = 5;
my @some_array = (4, 5, 6);
my %some_hash = ("nobel06" => "fire and mello");
 
# use backslash operator to store references
my $number_reference = \$some_number;
my $array_reference = \@some_array;
my $hash_reference = \%some_hash;
 
# print references
print $number_reference, "\n";
print $array_reference, "\n";
print $hash_reference, "\n";
 
__END__

They will behave just like other scalar variables, but they have extra functionality and meaning. As I said before, do not manipulate the value of the reference!
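
Here is a small sketch of why. It uses Perl's built-in "ref" function, which returns the type label ("ARRAY", "HASH" and so on) for a reference and an empty string for anything that is not a reference. The variable names are made up for this illustration:

```perl
#!/usr/bin/perl

use warnings;
use strict;

# an array and a reference to it
my @some_array = (4, 5, 6);
my $array_ref = \@some_array;

# copying a reference is fine: the copy is still a reference to the array
my $copied_ref = $array_ref;
my $copied_label = ref($copied_ref);

# arithmetic on a reference gives back a plain number, not a reference
my $broken_ref = $array_ref + 1;
my $broken_label = ref($broken_ref);

print "copied reference is a reference to: $copied_label\n";
print "after adding 1, ref returns: '$broken_label'\n";
```

The copy still reports "ARRAY". After the addition, "ref" returns an empty string, because what is left is just a number that no longer leads back to the array.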

So are we now able to solve our problem of wanting to increment an array in a subroutine? Let's see. We know how to get the reference to the array. We could pass that as the only argument to the subroutine and see if it works.

#!/usr/bin/perl
 
use warnings;
use strict;
 
# create an array of numbers
my @some_array = (5, 6, 7);
 
# print values in array
print "print before call to subroutine: ", join(", ", @some_array), "\n";
 
# call increment_array using a reference to some array as the argument
increment_array(\@some_array);
 
# print values in array
print "print after call to subroutine: ", join(", ", @some_array), "\n";
 
sub increment_array{
    # copy arguments from @_
    my @array_to_increment = @_;
 
    # loop through the elements of the array
    for(my $index=0; $index < scalar(@array_to_increment); $index++){
        # add 1 to each element
        $array_to_increment[$index]++;
    }
 
    # print values in array
    print "print in subroutine: ", join(", ", @array_to_increment), "\n";
 
}
 
__END__

Still didn't work. The array outside the subroutine was not affected by what happened in the subroutine. Also, when I printed out the values in the array after the increment, I got a single element with a seemingly random value.

Let's print out the values in "@array_to_increment" to see what we are copying out of "@_".

# copy arguments from @_
my @array_to_increment = @_;
 
# check on values in array before incrementing
print "print in subroutine: ", join(", ", @array_to_increment), "\n";

Ah, yes. OK. "@array_to_increment" contains a single element which looks like "ARRAY(0x180b330)". This is the reference to the original array. We want to get at the values in it. How do you do this? The answer is dereferencing. For now, let's change the code for this increment subroutine so that it is at least capturing the reference properly:

# copy arguments from @_
my ($array_ref) = @_;

Dereferencing


OK, so how do you dereference and what is that?

The way you dereference is to put the "$", "@" or "%" symbol in front of the reference variable, keeping the "$" that is already there. For example:

#!/usr/bin/perl
 
use warnings;
use strict;
 
# declare and define
my $some_number = 5;
my @some_array = (4, 5, 6);
my %some_hash = ("nobel06" => "fire and mello");
 
# use backslash operator to store references
my $number_reference = \$some_number;
my $array_reference = \@some_array;
my $hash_reference = \%some_hash;
 
# print references
print "references:\n";
print $number_reference, "\n";
print $array_reference, "\n";
print $hash_reference, "\n";
 
# dereference references and do something with them
print "\nuse of dereferenced references:\n";
print $$number_reference + 1, "\n";
print scalar(@$array_reference), "\n";
print scalar(keys %$hash_reference), "\n";
 
__END__

From the previous examples, we declared and defined some variables, stored references to them in variables and printed the references. Now, using the "$$", "@$" and "%$" double-symbols, we are dereferencing the references and using them just as if they were normal variables. We are adding to the scalar, getting the length of the array and getting the number of keys in the hash. These dereferenced references are completely interchangeable with the original variables. There is no difference.

What about getting at individual elements in an array or hash? "@$" and "%$" aren't the right thing to use. You can use "$$" but this doesn't work well when dealing with multi-dimensional data structures, which we'll get to later this week. So the much better syntax is to use the "->" arrow. For example, adding to the above code:
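
Because there is no difference, changes made through a dereferenced reference show up in the original variable. A minimal sketch, with shortened variable names of my own choosing:

```perl
#!/usr/bin/perl

use warnings;
use strict;

# declare and define
my $some_number = 5;
my @some_array = (4, 5, 6);

# use backslash operator to store references
my $number_ref = \$some_number;
my $array_ref = \@some_array;

# change the scalar through its reference
$$number_ref = $$number_ref + 10;

# add an element to the array through its reference
push(@$array_ref, 7);

# print the original variables, not the references
print "some_number is now $some_number\n";
print "some_array is now ", join(", ", @some_array), "\n";
```

This prints "15" for the number and "4, 5, 6, 7" for the array, even though the original variables were never used directly after they were declared.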

# dereference array and hash references
# and access individual elements
print "\ndereferenced elements:\n";
print $array_reference->[0], "\n";
print $hash_reference->{"nobel06"}, "\n";

To get at the first element in the array that "$array_reference" points to, you use the variable name of the reference "$array_reference" and you add an arrow "->" and then the usual array element syntax "[0]". "$array_reference->" is the dereferenced reference and then you just use the normal array element syntax. This will always work, regardless of how complicated your data structure is. Similarly, for hash elements, you use the arrow and then the usual hash element syntax.
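
For a simple array or hash, the "$$" form and the arrow form get you the same element. The sketch below puts them side by side. The variable names are only for this illustration:

```perl
#!/usr/bin/perl

use warnings;
use strict;

# declare and define
my @some_array = (4, 5, 6);
my %some_hash = ("nobel06" => "fire and mello");

# use backslash operator to store references
my $array_ref = \@some_array;
my $hash_ref = \%some_hash;

# three ways to get the first element of the array
my $first_with_arrow = $array_ref->[0];
my $first_with_dollar = $$array_ref[0];
my $first_with_braces = ${$array_ref}[0];

# two ways to get a value from the hash
my $value_with_arrow = $hash_ref->{"nobel06"};
my $value_with_dollar = $$hash_ref{"nobel06"};

print "array: $first_with_arrow, $first_with_dollar, $first_with_braces\n";
print "hash: $value_with_arrow, $value_with_dollar\n";
```

All three array expressions give "4" and both hash expressions give "fire and mello". I still recommend the arrow, since it is the one that stays readable.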

Finally, just to emphasize that a dereferenced reference really is the same as the variable that the reference points to, adding this bit of code changes the values of the variables using the dereferenced references:

# can change arrays and hashes through
# dereferenced references
$array_reference->[0] = 3;
$hash_reference->{nobel06} = "andy and craig";
 
# print array and hash
print "\naltered array and hash:\n";
print join(", ", @some_array), "\n";
print $some_hash{"nobel06"}, "\n";

Now the array has the values "3, 5, 6" and the hash value for "nobel06" is the first names of the winners.

Using references and dereferencing with subroutines


Now we have the tools to properly do the increment and compare arrays subroutines.

#!/usr/bin/perl
 
use warnings;
use strict;
 
# create an array of numbers
my @some_array = (5, 6, 7);
 
# print values in array
print "print before call to subroutine: ", join(", ", @some_array), "\n";
 
# call increment_array using a reference to some array as the argument
increment_array(\@some_array);
 
# print values in array
print "print after call to subroutine: ", join(", ", @some_array), "\n";
 
sub increment_array{
    # copy arguments from @_
    my ($array_ref) = @_;
 
    # loop through the elements of the array
    for(my $index=0; $index < scalar(@$array_ref); $index++){
        # add 1 to each element
        $array_ref->[$index]++;
    }
 
    # print values in array
    print "print in subroutine: ", join(", ", @$array_ref), "\n";
 
}
 
__END__

When we call the "increment_array" subroutine, we pass just the reference to the array as the only argument. We then copy the reference to the array from "@_" in the subroutine. To increment, we use the same code as before but we dereference the array when we need to use it. The code works just how we want it to.
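
If you prefer not to manage an index, the same subroutine could be written with a "foreach" loop over the dereferenced array. This is only an alternative sketch, not a replacement for the version above. It relies on the fact that the "foreach" loop variable stands for the actual element of the array rather than a copy of it:

```perl
#!/usr/bin/perl

use warnings;
use strict;

# create an array of numbers
my @some_array = (5, 6, 7);

# call increment_array using a reference to the array as the argument
increment_array(\@some_array);

# print values in array
print "print after call to subroutine: ", join(", ", @some_array), "\n";

sub increment_array{
    # copy the reference from @_
    my ($array_ref) = @_;

    # $element is the real array element, so ++ changes the original array
    foreach my $element (@$array_ref){
        $element++;
    }
}
```

This prints "6, 7, 8", the same result as the indexed version.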

#!/usr/bin/perl
 
use warnings;
use strict;
 
# create two arrays of numbers
my @array1 = (4, 5, 6);
my @array2 = (4, 5, 6);
 
# print contents of arrays
print 'array1 is ', join(", ", @array1),
         ' and array2 is ', join(", ", @array2), "\n";
 
# print return from call of compare_arrays
print '0 means arrays are different and 1 means they are the same: ',
    compare_arrays(\@array1, \@array2), "\n";
 
#this subroutine returns 1 if arrays are identical and 0 if they are not
sub compare_arrays{
    # copy arguments from @_
    my ($array_ref1, $array_ref2) = @_;
 
    # return 0 if arrays are not same length
    if (scalar(@$array_ref1) != scalar(@$array_ref2)){
        return 0;
    }
    # if same length
    else{
        # for loop of array length
        for (my $index = 0; $index < scalar(@$array_ref1); $index++) {
            # return 0 if elements of same index do not have same value
            if ($array_ref1->[$index] ne $array_ref2->[$index]){
                return 0;
            }
        }
    }
    # otherwise return 1
    return 1;
}
 
__END__

When we call the "compare_arrays" subroutine, we pass two arguments, references to the two arrays. We then copy these two references from "@_" in the subroutine. Like the increment example, we dereference the two arrays whenever we use them. Otherwise the code is exactly how you would expect. This works and the two arrays get evaluated as identical. Success!

Naming


One last thing. Please, please, please name references with "ref" somewhere in the name of the variable. It'll make your life so much better.

Exercises


Problem1: hash_byreference.pl

  • Declare the hash below.
my %address_hash = (
            "name" => "Dan Pollard",
            "street" => "Building 84, 1 Cyclotron Road",
            "city" => "Berkeley",
            "state" => "CA",
            "zip" => "94720"
            );

  • Create a reference to this hash
  • Print out the value for "name" key using the reference
  • Write a loop to iterate through the hash, using the reference

Problem2: hash_subroutines.pl

1:sub print_hash{}
  • The subroutine should take a reference to a hash as an argument.
  • It will print out all key-value pairs of the referenced hash.

2:sub clear_hash{}
  • Pass a hash to this subroutine by reference.
  • Iterate through the hash and clear all values.

3:sub fill_a_hash{}
  • The subroutine should take a reference to a hash and a reference to an array as the arguments.
  • The subroutine should fill the hash with the values of the array as the keys of the hash and the lengths of the keys as the values of the hash.

If the array, whose reference is passed to the subroutine, is:
my @some_array = ("Cal", "Stanford", "UCSF");

Then the hash should be:

(
    "Cal" => 3,
    "Stanford" => 8,
    "UCSF" => 4
);

Problem3: check_hash_for_keys.pl

sub check_hash_for_keys{}
  • Input arguments to the subroutine are an array reference and a hash reference
  • For each element of the input array, check whether it is a key in the input hash
  • Subroutine should return a list of all the elements missing from the hash.

  • Test on the following two input sets:

Input1
my %address_hash = (
            "name" => "Dan Pollard",
            "street" => "Building 84, 1 Cyclotron Road",
            "city" => "Berkeley",
            "state" => "CA",
            "zip" => "94720"
            );
 
my @keys_array = qw(name street zip);
Should return an empty list.

Input2
my %address_hash = (
            "name" => "Dan Pollard",
            "street" => "Building 84, 1 Cyclotron Road",
            "city" => "Berkeley",
            "state" => "CA",
            "zip" => "94720"
            );
 
my @keys_array = qw(name street planet zip country);
Should return the list ("planet", "country")





Project


Go back to working on the problems from 6.1.