Title - Multidimensional Data Structures 1 - Dereferencing and (Arrays of Arrays)/Matrices



FOREACH LOOP (review)

#!/usr/bin/perl
# Author        :
# Date          : Mon Aug 13 15:13:36 UTC 2007
# Description   :
 
use strict;
use warnings;
 
my @instructors = ("jaime", "lenny", "dan");
 
foreach my $person (@instructors){
    print "$person is an instructor in a foreach loop\n";
}
 
print "\n\n\n";
 
for (my $i=0; $i < scalar(@instructors); $i++){
    print "$instructors[$i] is an instructor in the $i-th run of a for loop\n";
}
print "\n\n\n";
 
my $j = 0;
while ( $j < scalar(@instructors)){
    print "$instructors[$j] is in a while loop\n";
    $j++;
}
 
__END__

By now we have descended into references in a big way! One of the hardest things for me, and I'm sure for the TAs and other instructors, has been that we haven't been able to help you use complex data structures in your work. Just as we can nest one loop inside another, we can nest one data structure inside another. In Perl an array can only hold scalars, so the nesting is done by storing references, and that means we need to be very careful about how we create and dereference them!

-complex1.pl

#!/usr/bin/perl
# Author        :
# Date          : Thu Jan 11 09:11:44 UTC 2007
# Description   :
 
use strict;
use warnings;
use Data::Dump qw (dump);
 
 
my @instructors = ("Jaime", "Lenny", "Dan");
my $ref_instructors = \@instructors;
 
# Assign an array reference to an array.
my @AoA = (
         $ref_instructors,
);
 
print "@AoA is actually contained in the array\n";
 
print "$AoA[0]->[0] is the 0th array's 0th element\n\n";
 
print "use dump to help you look at complex data structures\n";
 
dump (\@AoA);
 
__END__

In this first example, I have put one element into the array called @AoA (array of arrays)! That element is a reference to an array. Printing @AoA itself shows the reference (something like ARRAY(0x...)), not the names. At the end of the script, I've printed out the contents of the array using dump. But I've also done something different with the expression $AoA[0]->[0]. I'm accessing @AoA's zeroth element, then dereferencing that element with the -> arrow and accessing the zeroth element of the (dereferenced) array.

-complex1a.pl

#!/usr/bin/perl
# Author :
# Date : Thu Jan 11 09:11:44 UTC 2007
# Description :
 
use strict;
use warnings;
use Data::Dump qw (dump);
 
 
my @instructors = ("Jaime", "Lenny", "Dan");
my $ref_instructors = \@instructors;
 
my @assistants = ("Rich", "Rose", "Jody");
my $ref_assistants = \@assistants;
 
# Assign two array references to an array.
my @AoA = (
 $ref_instructors, $ref_assistants
);
 
print "@AoA\n";
print "$AoA[0]->[0]\n"; # returns the 0th array's 0th element
print "$AoA[1][1]\n"; # returns the 1st array's 1st element
 
dump (\@AoA);
 
__END__

In this example I've changed two things. First, I've added a second array reference to the array, so now I'm truly building up my array of arrays. Second, I've gotten rid of the arrow between the two brackets when I access Rose with $AoA[1][1]. This works because Perl treats the arrow as optional between two adjacent subscripts.
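
To make the arrow rule concrete, here is a minimal sketch (not one of the class scripts) that reads the same element three ways. The variable names $with_arrow, $without_arrow, $with_braces, $ref_AoA and $from_ref are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @instructors = ("Jaime", "Lenny", "Dan");
my @assistants  = ("Rich", "Rose", "Jody");
my @AoA = (\@instructors, \@assistants);

# Three spellings of the same element: row 1, column 1.
my $with_arrow    = $AoA[1]->[1];
my $without_arrow = $AoA[1][1];
my $with_braces   = ${ $AoA[1] }[1];

print "$with_arrow $without_arrow $with_braces\n";    # prints "Rose Rose Rose"

# The arrow is only optional BETWEEN subscripts. When we start from a
# reference held in a scalar, the first arrow is still required.
my $ref_AoA  = \@AoA;
my $from_ref = $ref_AoA->[0][2];
print "$from_ref\n";                                   # prints "Dan"
```

All three spellings mean the same thing, so pick the one you find easiest to read and stick with it.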


-complex2.pl

#!/usr/bin/perl
# Author        :
# Date          : Thu Jan 11 09:11:44 UTC 2007
# Description   :
 
use strict;
use warnings;
use Data::Dump qw (dump);
 
 
my @instructors = ("Jaime", "Lenny", "Dan");
my $ref_instructors = \@instructors;
 
my @assistants = ("Rich", "Rose", "Jody");
 
# Assign two array references to an array.
my @AoA = (
         $ref_instructors, [@assistants]
);
 
print "@AoA\n";
print "$AoA[1]->[1]\n"; # returns the 1st array's 1st element
 
dump (\@AoA);
 
__END__

In this example, rather than creating a reference to @assistants before passing it to my array of arrays, I'm creating the reference with square brackets inside the complex data structure! The brackets copy the current contents of @assistants into a new, unnamed array and return a reference to that copy. You'll notice that the first print statement still prints out two array references. This trick is really useful when you start populating your own arrays of arrays. I'm now going to show you how you might want to populate your array of arrays.

-complex2a.pl

#!/usr/bin/perl
# Author :
# Date : Thu Jan 11 09:11:44 UTC 2007
# Description :
 
use strict;
use warnings;
use Data::Dump qw (dump);
 
 
my @instructors = ("Jaime", "Lenny", "Dan");
my $ref_instructors = \@instructors;
 
my @assistants = ("Rich", "Rose", "Jody");
my $ref_assistants = \@assistants;
 
 
my @AoA = ();
 
print "@AoA\n";
 
push @AoA, $ref_instructors;
 
print "@AoA\n";
 
push @AoA, $ref_assistants;
 
print "@AoA\n";
 
print "$AoA[0]->[1]\n";
print "$AoA[1]->[1]\n";
 
dump (\@AoA);
 
__END__

You've seen push before, with ordinary arrays. But it's worthwhile to spend a couple of seconds here to remind you that it adds whatever comes after the comma to the end of the array you specify. Using push is a powerful way to build up your array of arrays. And of course you can use push to add arrays where you create the reference on the fly!

-complex2b.pl

#!/usr/bin/perl
# Author :
# Date : Thu Jan 11 09:11:44 UTC 2007
# Description :
 
use strict;
use warnings;
use Data::Dump qw (dump);
 
 
my @instructors = ("Jaime", "Lenny", "Dan");
 
my @assistants = ("Rich", "Rose", "Jody");
 
 
my @AoA = ();
 
print "@AoA\n";
 
push @AoA, [@instructors];
 
print "@AoA\n";
 
push @AoA, [@assistants];
 
print "@AoA\n";
 
print "$AoA[0]->[1]\n";
print "$AoA[1]->[1]\n";
 
dump (\@AoA);
 
__END__

Notice that creating the reference on the fly copies the array, which allows you to remember the state of the array at the moment you passed it to the array of arrays.

-complex2c.pl
#!/usr/bin/perl
# Author :
# Date : Thu Jan 11 09:11:44 UTC 2007
# Description :
 
use strict;
use warnings;
use Data::Dump qw (dump);
 
 
my @instructors = ("Jaime", "Lenny", "Dan");
my @assistants = ("Rich", "Rose", "Jody");
 
my @AoA = ();
 
print "@AoA\n";
 
push @AoA, [@instructors];
 
push @instructors, (shift @assistants);
 
push @AoA, [@instructors];
 
 
print "@AoA\n";
 
 
dump (\@AoA);
 
__END__

These so-called "anonymous data structures" are going to become more and more important to you as you move along. So let's really understand what's going on here. I have an array of arrays, and each time I push [@instructors] onto it, it's creating a reference to a copy of the @instructors array in its current state. This stands in contrast to the next program, where the first element pushed is a reference to @instructors itself, so it also shows the name that is added later:

-complex2d.pl

#!/usr/bin/perl
# Author :
# Date : Thu Jan 11 09:11:44 UTC 2007
# Description :
 
use strict;
use warnings;
use Data::Dump qw (dump);
 
 
my @instructors = ("Jaime", "Lenny", "Dan");
my @assistants = ("Rich", "Rose", "Jody");
my $ref_instructors = \@instructors;
 
my @AoA = ();
 
print "@AoA\n";
push @AoA, $ref_instructors;
 
push @AoA, [@instructors];
 
push @instructors, (shift @assistants);
 
push @AoA, [@instructors];
 
 
print "@AoA\n";
 
 
dump (\@AoA);
 
__END__
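
The contrast between the two programs above can be boiled down to a few lines. This is only a sketch; the names $shared, $snapshot, $shared_count and $snapshot_count are made up for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @instructors = ("Jaime", "Lenny", "Dan");

my $shared   = \@instructors;     # reference to the named array itself
my $snapshot = [@instructors];    # reference to a new anonymous copy

push @instructors, "Rich";        # change the named array afterwards

my $shared_count   = scalar @{$shared};
my $snapshot_count = scalar @{$snapshot};

print "shared sees $shared_count names: @{$shared}\n";
# prints "shared sees 4 names: Jaime Lenny Dan Rich"
print "snapshot kept $snapshot_count names: @{$snapshot}\n";
# prints "snapshot kept 3 names: Jaime Lenny Dan"
```

Use \@array when you want later changes to show up inside the structure, and [@array] when you want a frozen copy.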

You can also assign directly to a single element of one of the nested arrays, or read any single element.

-complex3.pl

#!/usr/bin/perl
# Author :
# Date : Thu Jan 11 09:11:44 UTC 2007
# Description :
 
use strict;
use warnings;
use Data::Dump qw (dump);
 
 
my @instructors = ("Jaime", "Lenny", "Dan");
 
my @assistants = ("Rich", "Rose", "Jody");
 
 
my @AoA = ();
 
print "@AoA\n";
 
push @AoA, [@instructors];
 
print "@AoA\n";
 
push @AoA, [@assistants];
 
print "@AoA\n";
 
$AoA[1][3] = "Lil";
 
print "$AoA[0]->[1]\n";
print "$AoA[1]->[3]\n";
 
dump (\@AoA);
 
__END__

How about if you want to work with one of the nested arrays as a whole? Then the syntax gets a bit messier. Here we wrap the element in @{ } braces to dereference it to an array. We need to do that whenever we want to treat a part of the complex data structure as an array, for example to push onto it or to print all of its elements. Without the braces, $AoA[1] is just a scalar holding a reference.

-complex3a.pl

#!/usr/bin/perl
# Author :
# Date : Thu Jan 11 09:11:44 UTC 2007
# Description :
 
use strict;
use warnings;
use Data::Dump qw (dump);
 
 
my @instructors = ("Jaime", "Lenny", "Dan");
 
my @assistants = ("Rich", "Rose", "Jody");
 
 
my @AoA = ();
 
print "@AoA\n";
 
push @AoA, [@instructors];
 
print "@AoA\n";
 
push @AoA, [@assistants];
 
print "@AoA\n";
 
push @{$AoA[1]}, "Lil";
 
print "$AoA[0]->[1]\n";
print "$AoA[1]->[3]\n";
 
print "@{$AoA[1]}\n";
 
dump (\@AoA);
 
__END__
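
The same @{ } braces work anywhere Perl expects an array. Here is a small sketch, assuming the same two rows as above, that counts, pops and copies a nested array. The variable names are made up for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @AoA = (
    [ "Jaime", "Lenny", "Dan" ],
    [ "Rich",  "Rose",  "Jody" ],
);

push @{ $AoA[1] }, "Lil";                 # add to the end of row 1

my $row_count    = scalar @AoA;           # number of nested arrays
my $row1_length  = scalar @{ $AoA[1] };   # number of elements in row 1
my $row1_last    = $#{ $AoA[1] };         # index of row 1's last element
my $removed      = pop @{ $AoA[0] };      # take the last name off row 0
my @copy_of_row1 = @{ $AoA[1] };          # independent copy of row 1

print "rows: $row_count, row 1 length: $row1_length, last index: $row1_last\n";
# prints "rows: 2, row 1 length: 4, last index: 3"
print "removed $removed, row 0 is now: @{$AoA[0]}\n";
# prints "removed Dan, row 0 is now: Jaime Lenny"
```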

OK. So now you know how to create a nested data structure. You know how to add elements or entire arrays to it. You know how to make those arrays "anonymous" copies so that they retain the values they had when you passed them in, not the values at some later point in time!

Let's now go over some syntax for looping through an array of arrays. First we could use the foreach syntax to get all of the nested arrays:

-complex4.pl

#!/usr/bin/perl
# Author :
# Date : Thu Jan 11 09:11:44 UTC 2007
# Description :
 
use strict;
use warnings;
use Data::Dump qw (dump);
 
 
my @instructors = ("Jaime", "Lenny", "Dan");
my $ref_instructors = \@instructors;
 
my @assistants = ("Rich", "Rose", "Jody");
my $ref_assistants = \@assistants;
 
# Assign two array references to an array.
my @AoA = (
 $ref_instructors, $ref_assistants
);
 
print "@AoA\n";
 
foreach my $row (@AoA){
 print "@$row\n";
}
 
dump (\@AoA);
 
__END__

And we could nest a second loop in there to get each individual element:

#!/usr/bin/perl
# Author        :
# Date          : Thu Jan 11 09:11:44 UTC 2007
# Description   :
 
use strict;
use warnings;
use Data::Dump qw (dump);
 
my @instructors = ( "Jaime", "Lenny", "Dan" );
my $ref_instructors = \@instructors;
 
my @assistants = ( "Rich", "Rose", "Jody" );
my $ref_assistants = \@assistants;
 
# Assign two array references to an array.
my @AoA = ( $ref_instructors, $ref_assistants );
 
print "@AoA\n";
 
foreach my $row (@AoA) {
    print "a new row\n";
    foreach my $column_position (@$row) {
        print "$column_position\n";
    }
}
 
dump( \@AoA );
 
__END__

But if we also need the index of each element, not just its value, we want to use a for loop with a counter:

-complex4b.pl
#!/usr/bin/perl
# Author        :
# Date          : Thu Jan 11 09:11:44 UTC 2007
# Description   :
 
use strict;
use warnings;
use Data::Dump qw (dump);
 
my @instructors = ( "Jaime", "Lenny", "Dan" );
my $ref_instructors = \@instructors;
 
my @assistants = ( "Rich","Rose", "Jody" );
my $ref_assistants = \@assistants;
 
# Assign two array references to an array.
my @AoA = ( $ref_instructors, $ref_assistants );
 
print "@AoA\n";
 
for (my $row=0;$row<scalar(@AoA);$row++) {
    print "row $row is: @{$AoA[$row]}\n";
 
}
 
dump( \@AoA );
 
__END__

And to get all elements along with their row and column indices...
-complex4c.pl

#!/usr/bin/perl
# Author        :
# Date          : Thu Jan 11 09:11:44 UTC 2007
# Description   :
 
use strict;
use warnings;
use Data::Dump qw (dump);
 
my @instructors = ( "Jaime", "Lenny", "Dan" );
my $ref_instructors = \@instructors;
 
my @assistants = ( "Rich", "Rose", "Jody" );
my $ref_assistants = \@assistants;
 
# Assign two array references to an array.
my @AoA = ( $ref_instructors, $ref_assistants );
 
print "@AoA\n";
 
for ( my $row = 0; $row < scalar(@AoA); $row++ ) {
    for ( my $column = 0; $column < scalar( @{ $AoA[$row] } ); $column++ ) {
        print "row $row column $column is $AoA[$row][$column]\n";
    }
}
 
 
 
dump( \@AoA );
 
__END__
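
If you like foreach better than the three-part for loop, you can still get the indices by looping over a range of numbers. This is an alternative sketch, not the class script: $#AoA is the last index of @AoA, and $#{ $AoA[$row] } is the last index of a nested array. The @lines array is only there to collect the output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @AoA = (
    [ "Jaime", "Lenny", "Dan" ],
    [ "Rich",  "Rose",  "Jody" ],
);

my @lines;
foreach my $row ( 0 .. $#AoA ) {
    foreach my $column ( 0 .. $#{ $AoA[$row] } ) {
        push @lines, "row $row column $column is $AoA[$row][$column]";
    }
}

# Six lines in total, from "row 0 column 0 is Jaime"
# to "row 1 column 2 is Jody".
print "$_\n" foreach @lines;
```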

We can use the same nested for loops to populate a two-dimensional array of arrays.

-complex5.pl

#!/usr/bin/perl
# Author        :
# Date          : Thu Jan 11 11:29:58 UTC 2007
# Description   :
 
use strict;
use warnings;
use Data::Dump qw(dump);
 
my @AoA;
 
for (my $i=0; $i< 9; $i++){
    for (my $j=0; $j<9; $j++){
        $AoA[$i][$j] = $i*$j;
    }
}
 
dump (\@AoA);
 
__END__
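
dump shows the structure, but a matrix is easier to read as a grid. Here is a minimal sketch that builds a smaller table the same way and prints one tab-separated line per row. The $size variable and the @grid array are assumptions made for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $size = 4;    # assumption: a small table keeps the output short
my @AoA;

for my $i ( 0 .. $size - 1 ) {
    for my $j ( 0 .. $size - 1 ) {
        $AoA[$i][$j] = $i * $j;
    }
}

# Turn each nested array into one line of text, values separated by tabs.
my @grid = map { join( "\t", @{$_} ) } @AoA;

print "$_\n" foreach @grid;
# the last line printed is 0, 3, 6 and 9 separated by tabs
```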

Or... if you are ready to have your mind blown... even higher dimensions

-complex5a.pl

#!/usr/bin/perl
# Author        :
# Date          : Thu Jan 11 11:29:58 UTC 2007
# Description   :
 
use strict;
use warnings;
use Data::Dump qw(dump);
 
my @AoA;
 
for ( my $i = 0; $i < 9; $i++ ) {
    for ( my $j = 0; $j < 9; $j++ ) {
        for ( my $k = 0; $k < 9; $k++ ) {
            $AoA[$i][$j][$k] = $i * $j * $k;
        }
    }
}
 
dump( \@AoA );
 
__END__
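
Higher dimensions are less magical than they look: every level is just an array of references to the arrays one level down, and Perl creates the levels in between for you when you assign to a deep element. Here is a small sketch. The name @cube and the helper variables are made up for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @cube;
$cube[2][3][4] = 2 * 3 * 4;    # the arrays in between are created automatically

my $layers     = scalar @cube;       # indices 0, 1 and 2 now exist
my $kind       = ref $cube[2];       # what the top level holds
my $inner_kind = ref $cube[2][3];    # what the second level holds
my $value      = $cube[2][3][4];
my $never_set  = defined $cube[0] ? "defined" : "undef";

print "$layers layers, level one holds $kind, level two holds $inner_kind\n";
# prints "3 layers, level one holds ARRAY, level two holds ARRAY"
print "value is $value, cube[0] is $never_set\n";
# prints "value is 24, cube[0] is undef"
```

Notice that the slots we never assigned to are simply undef, so check with defined before you dereference them.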

So how can we use these types of data structures? How about parsing? Here is a parser that pushes each field onto its own array:

#!/usr/bin/perl
# Author :
# Date : Sat Jan 6 14:35:12 UTC 2007
# Description :
 
use strict;
use warnings;
 
#call this program in the directory with the blast files
#give this program a command line argument like "perl blast_parser.pl *.bla"
 
my @results_files = @ARGV;
 
my @queries;
my @subjects;
my @evalues;
 
foreach my $file_name (@results_files) {
    open my $FILEHANDLE, "<", $file_name or die "Cannot open $file_name: $!";
    while ( my $file_line = <$FILEHANDLE> ) {
        chomp $file_line;
        if ( $file_line =~ m/^>/ ) {
            #skip lines that start with >
        }
        else {
 
            #parse the line into an array
            my @query_subject_evalue = split /,/, $file_line;
 
            #push the query to the query array
            push( @queries, $query_subject_evalue[0] );
 
            #push the subject to a subject array
            push( @subjects, $query_subject_evalue[1] );
 
            #push the evalue to an evalue array
            push( @evalues, $query_subject_evalue[2] );
        }
    }
 
    #DON'T FORGET TO CLOSE THE FILE HANDLE!
    close $FILEHANDLE;
 
}

But now, rather than passing each field off to its own array, I'm going to collect the result lines in an array of arrays.

#!/usr/bin/perl
# Author        :
# Date          : Sat Jan  6 14:35:12 UTC 2007
# Description   :
 
use strict;
use warnings;
use Data::Dump qw(dump);
 
#call this program in the directory with the blast files
#give this program a command line argument like "perl blast_parser.pl *.bla"
 
my @results_files = @ARGV;
 
my @array_of_results;
 
foreach my $file_name (@results_files) {
    open my $FILEHANDLE, "<", $file_name or die "Cannot open $file_name: $!";
    while ( my $file_line = <$FILEHANDLE> ) {
        chomp $file_line;
        if ( $file_line =~ m/^>/ ) {
        }
        else {
 
            #parse the line into an array
            my @query_subject_evalue = split /,/, $file_line;
            push @array_of_results, [@query_subject_evalue];
 
        }
    }
 
    #DON'T FORGET TO CLOSE THE FILE HANDLE!
    close $FILEHANDLE;
 
}
 
dump (\@array_of_results);

And if I want to do something with them:

#!/usr/bin/perl
# Author        :
# Date          : Sat Jan  6 14:35:12 UTC 2007
# Description   :
 
use strict;
use warnings;
use Data::Dump qw(dump);
 
#call this program in the directory with the blast files
#give this program a command line argument like "perl blast_parser.pl *.bla"
 
my @results_files = @ARGV;
 
my @array_of_results;
 
foreach my $file_name (@results_files) {
    open my $FILEHANDLE, "<", $file_name or die "Cannot open $file_name: $!";
    while ( my $file_line = <$FILEHANDLE> ) {
        chomp $file_line;
        if ( $file_line =~ m/^>/ ) {
        }
        else {
 
            #parse the line into an array
            my @query_subject_evalue = split /,/, $file_line;
            push @array_of_results, [@query_subject_evalue];
 
        }
    }
 
    #DON'T FORGET TO CLOSE THE FILE HANDLE!
    close $FILEHANDLE;
 
}
 
#go through the array of arrays and print out sig values
for ( my $result = 0; $result < scalar(@array_of_results); $result++ ) {
    if ( $array_of_results[$result][2] < 0.1 ) {
        print "I found a significant result: @{$array_of_results[$result]}\n";
    }
}
 
dump( \@array_of_results );
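
If you want to try the idea without any blast files on hand, here is a self-contained sketch. The lines in @file_lines are hypothetical stand-ins for a file's contents (they are not real results), and using grep to collect the significant rows is an alternative to the for loop above, not the class's method.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical result lines standing in for the contents of a file.
my @file_lines = (
    ">header line to skip",
    "geneA,hit1,0.001",
    "geneB,hit2,0.5",
    "geneC,hit3,1e-20",
);

my @array_of_results;
foreach my $file_line (@file_lines) {
    next if $file_line =~ m/^>/;    # skip lines that start with >
    push @array_of_results, [ split /,/, $file_line ];
}

# Keep only the rows whose third field (index 2) is below the cutoff.
my @significant = grep { $_->[2] < 0.1 } @array_of_results;

foreach my $result (@significant) {
    print "I found a significant result: @{$result}\n";
}
# prints the geneA and geneC rows, but not geneB
```

Inside the grep block, $_ is one nested array reference at a time, so $_->[2] is that row's evalue.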

Exercises


(seating_chart.pl)
Make a seating chart for the class as an array of arrays. Initialize each row as a separate array with the names of the people in it. Then push all the rows into one array of arrays.

Print out a nicely formatted chart without using the dump command.
For example:

FRONT
row 1: Lenny Jaime Dan
row 2: Jody Rich Rose
row 3: Billy Ksenia
BACK

Make sure that you introduce yourself to people that you don't know so that you can make sure your chart is correct.

(seating_chart2.pl)
Change the code so that a seating shift is encoded in a subroutine. Your subroutine should take the last person from each row and move them to the front of the next row. The last person in the last row should be moved to the front of the first row. Then add a subroutine that takes in the chart and returns who is seated in the second seat from the end of each row. Test it on the above chart and on a chart of our actual classroom within the same program, by calling the subroutines twice rather than by rerunning the program.

FRONT
row 1: Lenny Jaime Dan
row 2: Jody Rich Rose
row 3: Billy Ksenia
BACK
 
1st run:
FRONT
row 1: Ksenia Lenny Jaime
row 2: Dan Jody Rich
row 3: Rose Billy
BACK
 
Lenny, Jody, Rose
 
2nd run:
FRONT
row 1: Billy Ksenia Lenny
row 2: Jaime Dan Jody
row 3: Rich Rose
BACK
 
Ksenia, Dan, Rich
 
 
 

(gff_to_matrix.pl)
Use your yeast GFF file and write a script that stores in memory all of the gene lines, parsed into an array of arrays.

Check your data structure by accessing:
$array_of_genes[0][0] should give the chromosome for the first gene (chrI)
$array_of_genes[3][3] should give the start position for the fourth gene (2480)
$array_of_genes[-1][4] should give the end position of the last gene (6198)

(gff_to_matrix2.pl)

Modify your program so that it prints the strand for all genes over 1000 bases in length by looping through the array of arrays.

If you are done with this... go back to the project work!

Extra Problems on arrays of arrays


gff_to_matrix3.pl

Modify your program so that each chromosome's results are stored in a separate array (all result lines, not just genes). Thus your data structure should be 3 levels deep:
$array_of_genes[$chromosome_number][$result_line][$fields_in_line]. Now write a subroutine that takes in your data structure, a chromosome, a feature type of interest (gene, cds, etc.), and a minimum length, and returns a data structure that is an array of arrays: $array_of_results[$result_line][$fields_in_line].

gff_to_matrix4.pl

Modify gff_to_matrix2.pl so that it opens a second GFF file and loads it into a separate array of arrays. Loop through both arrays of arrays to see if there are any changes in any of the fields.

Jaime