Making life easier


Finding Answers




All the perldoc text is available from within the terminal too.

For specific functions, use "perldoc -f functionname":
perldoc -f join

You can also search the FAQ with a keyword or a regular expression, which perldoc matches against the FAQ question headings:
perldoc -q "string"
perldoc -q "count.*string"

  • Combined Index of O'Reilly Books
Since Berkeley has a legal subscription to the books, I don't feel bad disclosing:
http://www.unix.org.ua/orelly/perl/index.htm
(O'Reilly periodically shuts down these sites, but new ones pop up: google for "perlshelf" or "perl cookbook")

In fact, that page looks like a goner already. See an alternative at:

https://ascent.student.utwente.nl/~ascent/perl/book/perlshelf/



More on good coding practices and debugging

  • names (variables), names (subroutines), names (programs)
  • comments/descriptions
  • subroutines

Compare how easy it is to understand what each of the following two code samples does. Both perform exactly the same task; apart from the text of the fasta header line, their output has the same form.

#!/usr/bin/perl
# Author        :
# Date          : Fri Aug 10 23:40:11 PDT 2007
 
 
use strict;
use warnings;
 
my $input1 = $ARGV[0];
my $input2  = $ARGV[1];
 
unless ($input1 and $input2){
    die "no input\n";
}
 
my $string = '';
 
open( my $fh, "<", $input1 ) or die "failed to open file $input1\n ";
 
while ( my $file_line = <$fh> ) {
    chomp $file_line;
    next if $file_line =~ m/^>/;
    $string .= $file_line;
}
close($fh);
 
my @array =();
 
for (my $i=0; $i<length($string);$i+=$input2){
    my $j = substr ($string,$i, $input2);
    push (@array, $j);
}
 
for (my $i =scalar(@array); $i--; ) {
    my $j = int rand ($i+1);
    next if $i == $j;
    my $temp = $array[$i];
    $array[$i] = $array[$j];
    $array[$j] = $temp;
}
 
$string = join('',@array);
 
print ">sequence\n",$string,"\n";
 
__END__

Versus:
#!/usr/bin/perl
# Author        :
# Date          : Fri Aug 10 23:40:11 PDT 2007
# Description   : This script randomizes fasta DNA sequence in user-specified blocks
 
#Details: Script accepts a DNA fasta file and the block size for randomizing.
#          Output is a single DNA sequence, the same size as the whole input fasta, with blocks of sequence
#          randomly shuffled.
#Example: if input DNA has "ACGGTC" and block-size is 2, the output might be "ACTCGG", "GGACTC", "GGTCAC",
#          "TCACGG", or "TCGGAC".
 
use strict;
use warnings;
 
my $fasta_file = $ARGV[0];
my $block_size  = $ARGV[1];
 
unless ($fasta_file and $block_size){
    die "Expect two inputs\n";
}
 
 
my $dna_string = extract_dna_from_fasta($fasta_file);
 
$dna_string = randomize_dna_by_blocks($dna_string, $block_size);
 
print ">randomized dna in blocks of $block_size from $fasta_file\n",$dna_string,"\n";
 
 
##############################################################
#This is the end of the main routine.  Only subroutines below
##############################################################
 
#this subroutine returns a single string containing all sequence from a fasta file
sub extract_dna_from_fasta {
    my $file = $_[0];
    my $string_from_fasta = '';
 
    open( my $fh, "<", $file ) or die "failed to open file $file\n ";
 
    while ( my $file_line = <$fh> ) {
        chomp $file_line;
        next if $file_line =~ m/^>/;
        $string_from_fasta .= $file_line;
    }
    close($fh);
 
    return ($string_from_fasta);
}
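 
#this subroutine cuts a DNA string into blocks of the given size and returns the blocks re-joined in random order
sub randomize_dna_by_blocks {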
    my $dna_string = $_[0];
    my $block_size = $_[1];
 
    my @split_dnastring_array =();
    #insert every consecutive (non-overlapping) block of $block_size bases as an element into an array
    for (my $i=0; $i<length($dna_string);$i+=$block_size){
        my $next_block = substr ($dna_string,$i, $block_size);
        push (@split_dnastring_array, $next_block);
    }
 
    #now shuffle randomly the blocks within the array
    @split_dnastring_array= array_shuffle( @split_dnastring_array );
 
    return join('',@split_dnastring_array);
}
 
 
#this is the "fisher_yates_shuffle" from the Perl Cookbook
sub array_shuffle {
    my @array = @_;
 
    for (my $i =scalar(@array); $i--; ) {
        my $j = int rand ($i+1);
        next if $i == $j;
        my $temp = $array[$i];
        $array[$i] = $array[$j];
        $array[$j] = $temp;
    }
    return @array;
}
 
 
__END__
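
Giving each task its own subroutine also makes it easy to swap in a different implementation later. As an illustration only, the sketch below replaces the hand-written shuffle with "shuffle" from the core List::Util module. The subroutine name shuffle_dna_blocks is hypothetical and not part of the script above, and the regular expression assumes the DNA string contains no newlines (true after chomp).

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical alternative to randomize_dna_by_blocks:
# cut the string into consecutive blocks, shuffle them, join them again.
sub shuffle_dna_blocks {
    my ( $dna_string, $block_size ) = @_;

    # each match is up to $block_size characters; the last block may be shorter
    my @blocks = $dna_string =~ m/(.{1,$block_size})/g;

    return join( '', shuffle(@blocks) );
}

my $shuffled = shuffle_dna_blocks( 'ACGGTC', 2 );
print ">shuffled in blocks of 2\n", $shuffled, "\n";
```

The main routine would not need to change beyond the name of the subroutine it calls, which is one practical benefit of the second layout.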

  • diff
  • test "system" calls
    Before you run a script that makes a system call, print the command instead of executing it, and first try running the printed command, with all its arguments, from the terminal.
my $base_dir = '/home/student/class';
my $primer_file = 'primers.sts';
my $fasta_file  = 'test.fa';
my $result_dir = "$base_dir/6.0";
 
my $system_call = "e-PCR $primer_file $fasta_file > $result_dir/epcr_results.txt";
 
print "$system_call\n";
#system ($system_call) ;
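
One possible way to make this print-first habit routine is a small wrapper with a print-only switch, sketched below. The subroutine run_command and the variable $print_only are hypothetical names, not part of the original example; the e-PCR command line is the one from above.

```perl
use strict;
use warnings;

# Hypothetical switch: 1 = only print the command, 0 = print and really run it
my $print_only = 1;

my $primer_file = 'primers.sts';
my $fasta_file  = 'test.fa';
my $system_call = "e-PCR $primer_file $fasta_file > epcr_results.txt";

run_command( $system_call, $print_only );

# Prints the command, then runs it unless $only_print is true.
# Returns 1 if the command was executed, 0 if it was only printed.
sub run_command {
    my ( $command, $only_print ) = @_;

    print "$command\n";
    return 0 if $only_print;

    # system returns 0 when the command succeeded; $? holds the status otherwise
    system($command) == 0
        or die "command failed (status $?): $command";

    return 1;
}
```

Once the printed command works from the terminal, setting $print_only to 0 runs it from the script.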

  • or die "message" on every call that can fail (the message should identify the script line/filename that caused the problem)
  • do not hard-code numbers into the body of the code; declare them as variables at the start
my $blast_cutoff = 0.001;
 
my @good_hits = return_good_hits ("blast_file", $blast_cutoff);
...
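
For the "or die" item above, it helps to know that Perl appends "at SCRIPT line N." to a die message only when the message does not end in a newline, and that $! holds the system's reason for the failure. The sketch below shows both forms; the file name is hypothetical and assumed not to exist.

```perl
use strict;
use warnings;

my $missing_file = 'no_such_file_for_demo.fa';    # hypothetical file name

# eval {} traps the die so that the script can show both messages
eval {
    open( my $fh, '<', $missing_file )
        or die "failed to open $missing_file: $!\n";
};
my $message_with_newline = $@;

eval {
    open( my $fh, '<', $missing_file )
        or die "failed to open $missing_file: $!";
};
my $message_without_newline = $@;

# the second message ends with the script name and line number
print "with newline:    $message_with_newline";
print "without newline: $message_without_newline";
```

Leaving off the trailing newline is usually the more helpful choice while debugging, since the message then points at the exact line that failed.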

Debugger

The Perl debugger is very useful for catching errors and for understanding what code does, because it lets you execute a program step by step.
To run your program inside the debugger, use the "perl -d" option:
perl -d some_perl_script.pl argument1 argument2...
Once inside the debugger, type "h" for a quick reminder of the commands and "q" to quit.
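
A small script to practise on might look like the sketch below. The script and its file name (debug_demo.pl) are made up for illustration; the debugger commands listed in the comments are a few of the standard ones described in perldebug.

```perl
#!/usr/bin/perl
# debug_demo.pl (hypothetical file name); start it with: perl -d debug_demo.pl
#
# Commands to try at the debugger prompt:
#   l               list the next lines of source
#   n               run the next line, stepping over subroutine calls
#   s               run the next line, stepping into subroutine calls
#   b total_length  set a breakpoint at the start of the subroutine
#   c               continue until the next breakpoint
#   p $sum          print the value of a variable
#   x \@strings     dump a data structure
#   q               quit

use strict;
use warnings;

my @blocks = ( 'AC', 'GG', 'TC' );
my $total  = total_length(@blocks);
print "total length: $total\n";

# returns the summed length of all strings passed in
sub total_length {
    my @strings = @_;
    my $sum     = 0;
    for my $string (@strings) {
        $sum += length $string;
    }
    return $sum;
}
```

Stepping through the loop with "n" and printing $sum after each pass is a quick way to see how the value builds up.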

http://perldoc.perl.org/perldebug.html
http://perldoc.perl.org/perldebtut.html