A brief Perl primer

Overview

Roles of Perl
Styles of Perl Programming
Perl data types and operations on them
Subroutines, scopes, and other flow control
OO in Perl
Regexes
Cool Perl tricks
My recommended style
Resources
Program overview
Weaknesses of Perl

Roles of Perl

Perl is useful for a number of things. Thanks to a strong developer community, there is a central repository called CPAN through which software modules for tasks from fuzzy logic to talking to Oracle are distributed. A rich (if cryptic) sublanguage for regular expressions (regexes) provides powerful pattern matching, transformation, and extraction of text. Perl datatypes are free-form and automatically sized to meet needs. Perl's syntax is very loose, allowing easy migration from many different languages. Integration with other languages is fairly easy, using low-level tools such as libperl or XS, or using modules such as Inline::Java. Perl is portable -- pure Perl code generally runs unchanged across Unices, Windows, MacOS, and other platforms. Memory management in Perl is very easy -- the Perl garbage collector handles most tasks for you, only needing help to break circular references. Perl is also semi-interpreted (a script is compiled to an internal form before it runs), giving it better performance than purely interpreted languages.

Styles of Perl Programming

#!/usr/bin/perl
print "Hello, world!\n";

Perl started life inspired by C, shell, various Unix utilities such as sed and awk, and some other languages, and has continued to borrow useful/beautiful constructs from others throughout its evolution. It has been said that Perl is a language in which people can write their own language, and this is equally valid as criticism and praise.

It is very possible to use Perl primarily as a better shell script language than shell -- shell programming is subject to the whims of various Unix vendors, not just in the details of the shell, but also in how sed, awk, and the like work. Portable shell programming is very difficult. Many programmers have reported being happy moving from languages such as LISP to Perl. I don't know LISP well enough to comment on this :) Due to the popularity of C, C++, and Java, many people's Perl resembles those languages in structure and flow. Perl is, in my eyes, a better C than C, and similarly for those other languages, and you can see a lot of C in my Perl.

There are, however, advantages to picking up Perl idioms over time. Unlike a lot of these other languages, you don't need to know much Perl to do something useful with it. Perl is also capable of Object-Oriented Programming, with most CPAN modules offering only object interfaces.

Perl Data Types

Perl has 5 essential data types.

Data types

Scalar ($) -- Scalars hold single values. They hold numbers, strings, objects, references, and any other single elements you might want. Perl isn't picky about what you put in a scalar, and will determine by context and content what you want when you use a scalar. Scalars are marked by the '$' prefix.
```
	$a = "Cat";
	$a = $a . "'s Whiskers";
	print $a . "\n";
```
prints Cat's Whiskers. If we then do:
```
	$a += 2;
	print "$a\n";
```
we get 2. We then might do
```
	$a .= " is it\n";
	print $a;
```
We'd get 2 is it. When Perl is asked to interpret a scalar as a number that was last used as a string, it looks for something that looks like a number at the beginning, numifies it, and uses it; if the string doesn't start with a number (as with "Cat's Whiskers" above), it is treated as 0, which is why the addition gave 2. When asked to use a number as a string, it simply stringifies it. Perl strings are sized automatically, can be assigned into, and can be manipulated in powerful ways using substr and regular expressions.
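The conversions just described, plus reading from and assigning into a string with substr, might look like the following. This is only a sketch; the variable names and sample strings are made up for illustration.

```perl
use strict;
use warnings;

# A string that starts with digits numifies to those digits.
my $mixed = "42 cats";
my $sum;
{
    no warnings 'numeric';    # silence the "isn't numeric" warning for this demo
    $sum = $mixed + 1;        # 43
}

# A number stringifies when used with a string operator.
my $label = $sum . " is it";  # "43 is it"

# substr can read part of a string...
my $word = "Whiskers";
my $head = substr($word, 0, 4);   # "Whis"

# ...and can be assigned into; the string resizes as needed.
substr($word, 0, 4) = "Fl";       # $word is now "Flkers"

print "$sum\n$label\n$head\n$word\n";
```

With warnings enabled and without the `no warnings 'numeric'` line, Perl would still compute 43 but would complain that the argument isn't numeric.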
Arrays (@) -- Arrays hold ordered, indexed lists of scalars. To refer to a list as a whole, the '@' prefix is used. To refer to a scalar in a list, '$' is used with the index. It is also possible to refer to a slice of a list, assigning into or reading out of those parts, with a continuous or explicit range. A list is an immediate form of an array, specified as values in parentheses.
```
	@cats = ("Wally", "Nemesis", "Sammy", "Oliver");
	foreach $cat (@cats)
		{
		print "We have a cat called $cat\n";
		}
```
is the same thing as
```
	@cats = ("Wally", "Nemesis", "Sammy", "Oliver");
	for($i=0; $i<= $#cats; $i++)
		{
		print("We have a cat called " . $cats[$i] . "\n");
		}
```
Assigning lists is handled intelligently:
```
	($foo,undef,$bar) = @ARGV[0..$#ARGV];
	print $foo . "\n";
```
This prints out the first argument passed to the script. Assigning to undef in a list is useful to discard parts of the source list (here, the second argument). Note that it is not a problem if @ARGV is bigger than the target list -- extra elements are discarded. Notice also that if @ARGV is shorter, the assignment will still happen, but the unmatched elements will get the value undef (more on that later). Finally, note that @ARGV (familiar to C programmers as argv) holds only the arguments, not the program name: @ARGV's element 0 is C's argv[1]. The last index of an array @foo is available as $#foo, whereas the size of the same array is available as scalar(@foo) or by otherwise evaluating @foo in scalar context. Four functions add and remove values at the ends of an array: push appends values to the end, pop removes and returns the last value, unshift prepends values to the start, and shift removes and returns the first value. A destructive way to do the above examples (it walks the array from the end, and would stop early at a false element such as "0" or the empty string) would be
```
	@cats = ("Wally", "Nemesis", "Sammy", "Oliver");
	while(my $cat = pop(@cats) )
		{
		printf("We have a cat called %s\n", $cat);
		}
```
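As a rough sketch of the array operations mentioned above (sizes, the four end-of-array functions, and slices), consider the following. The extra cat names are invented for the example.

```perl
use strict;
use warnings;

my @cats = ("Wally", "Nemesis", "Sammy", "Oliver");

my $count = scalar(@cats);      # 4: number of elements
my $last  = $#cats;             # 3: index of the last element

push(@cats, "Tiger");           # add to the end
my $from_end = pop(@cats);      # remove from the end: "Tiger"

unshift(@cats, "Felix");        # add to the start
my $from_start = shift(@cats);  # remove from the start: "Felix"

# A slice reads (or assigns) several elements at once.
my @middle = @cats[1 .. 2];     # ("Nemesis", "Sammy")
@cats[0, 3] = @cats[3, 0];      # swap the first and last elements

print "$count cats, last index $last\n";
print "Middle: @middle\n";
print "Now: @cats\n";
```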
Hash/Associative Array (%) -- Hashes are another kind of list that, instead of using numeric indices, uses scalar indices (called keys). They can be used for a number of purposes, including code-organizational purposes. Like arrays, they use their own symbol when referring to the entire hash, but individual members are accessed using the scalar symbol $ with the key.
```
	$a = "Haha\n";
	my %prefs;
	$prefs{directory} = $ENV{HOME} . "/.myprogram"; # e.g. /home/pat/.myprogram
	$prefs{username} = $ENV{USER};
	$prefs{foo} = $a;
```
Notice that the environment variables are available as the hash %ENV in Perl. It is also possible to retrieve the keys of a hash using the keys function, which returns them in list context. It is possible to delete keys and their associated values using the delete function.
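A small sketch of keys and delete, together with exists (which tests whether a key is present), could look like this. The hash contents are placeholder values rather than real environment data.

```perl
use strict;
use warnings;

my %prefs = (
    directory => "/tmp/myprogram",   # placeholder values for the sketch
    username  => "pat",
    foo       => "Haha",
);

# keys returns the keys in no particular order, so sort them for stable output.
foreach my $key (sort keys %prefs) {
    print "$key => $prefs{$key}\n";
}

# delete removes the key and returns the value it held.
my $removed = delete $prefs{foo};

# exists tests for a key without looking at its value.
my $still_there = exists $prefs{foo} ? 1 : 0;
my $remaining   = scalar(keys %prefs);

print "Removed '$removed', $remaining keys remain\n";
```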
Filehandles (no symbol) -- Traditional filehandles in Perl have no symbol, and are global (unfortunately). There are modules that get around this, not described here, and since Perl 5.6 open can also store a filehandle in an ordinary scalar. Filehandles can be opened with open, closed with close, and read with readline. There is a special syntax that can make readline implicit within a certain type of while loop.
```
	open(RESOLV, "/etc/resolv.conf");
	while($line = <RESOLV>)
		{
		print $line;
		}
	close(RESOLV);
```
This code prints out the contents of /etc/resolv.conf (not very useful unless you're on a Unix system). Notice that this is equivalent to the following example, which also checks whether the open succeeded:
```
	$myfile = "/etc/resolv.conf";
	open(RESOLV, $myfile) || die "Could not open $myfile: $!\n";
	while($line = readline(RESOLV))
		{
		chomp($line); # Remove newline, if present
		print $line . "\n";
		}
```
and this too (idiomatic Perl, I'll explain in a second what this is doing)
```
	open(RESOLV, "/etc/resolv.conf") || die "Could not open /etc/resolv.conf: $!\n";
	while(<RESOLV>)
		{
		print;
		}
```
In the last 2 examples, the global variable $! is accessed if the open failed. This variable contains a string indicating why the last system call failed, and might contain a string such as "Permission denied" or "No such file or directory". The last example uses a special Perl scalar called the accumulator, which will be discussed momentarily. Notice that the angle brackets around the filehandle act to read a line from it, and when there is nothing left to read, the while loop terminates. To write to a file, open it with a > marker at the start of its name (this truncates an existing file; use >> to append instead). You may then direct print statements to it by listing the filehandle before the print content (no comma):
```
open(MYFILE, ">" . $ENV{HOME} . "/myfile.txt") || die "Can't do example: $!\n";
print MYFILE "Hello, world!\n";
close(MYFILE);
```
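For comparison, here is a sketch of the same write-then-read cycle using filehandles stored in scalars and the three-argument form of open, both available since Perl 5.6. It uses the core File::Temp module to get a scratch file, so the path is not meaningful; that choice is an assumption made only to keep the example self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# tempfile() creates a scratch file and returns an open handle and its path.
my ($tmp_fh, $path) = tempfile();
close($tmp_fh);

# Mode and filename are separate arguments; the handle lives in $out.
open(my $out, '>', $path) or die "Could not write $path: $!\n";
print $out "Hello, world!\n";
print $out "Second line\n";
close($out);

# Read the file back through another scalar filehandle.
open(my $in, '<', $path) or die "Could not read $path: $!\n";
my @lines;
while (my $line = <$in>) {
    chomp($line);
    push(@lines, $line);
}
close($in);
unlink($path);   # clean up the scratch file

print "Read ", scalar(@lines), " lines\n";
```

Because $out and $in are ordinary my variables, they are not global and can be passed to subroutines like any other scalar.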
Code (&) -- The equivalent of C's function pointers is possible, as are anonymous functions, local functions, piecemeal construction of functions through regexes and eval, and various other black magic. If you decide you need to use any of this, I suggest you borrow or purchase O'Reilly's Programming Perl or Perl Cookbook. Alternatively, you can come find me, or dig out one of the tutorials on the web, or perhaps you'd care to read the Perl manpages/docs.
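To give a taste of the tamer end of this, here is a minimal sketch of anonymous subs, a sub that returns another sub, and a hash of code references. All of the names (make_adder, %ops, and so on) are invented for the example.

```perl
use strict;
use warnings;

# An anonymous sub stored in a scalar.
my $double = sub { return $_[0] * 2; };

# A sub that builds and returns another sub, which remembers $n.
sub make_adder {
    my ($n) = @_;
    return sub { return $_[0] + $n; };
}
my $add5 = make_adder(5);

# A dispatch table: a hash of code references keyed by name.
my %ops = (
    double => $double,
    add5   => $add5,
);

my $doubled = $ops{double}->(21);   # 42
my $added   = $ops{add5}->(10);     # 15
print "$doubled $added\n";
```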

The accumulator

Perl has a special scalar that's accessed in many functions if no argument is provided to them. It has the name $_ (it is also commonly called the default variable). The Perl idiom above, while(<FILEHANDLE>) { ... }, assigns each line read into $_ for each iteration of the loop. print, without arguments, prints out the accumulator. Many Perl functions have optional arguments in this way, which might bother C folk. Still, the use of this idiom can result in less code on a line, reducing visual clutter. Other useful functions and constructs that can use the accumulator include: split, chomp, regular expressions, map, grep, and foreach.
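A short sketch of several of these working on $_ implicitly follows. The sample lines are made up for illustration.

```perl
use strict;
use warnings;

my @lines = ("alpha beta\n", "gamma\n", "delta epsilon\n");

# foreach with no loop variable aliases each element to $_;
# chomp and split with no arguments both work on $_.
my $word_count = 0;
foreach (@lines) {
    chomp;
    my @words = split;
    $word_count += @words;
}

# grep keeps the elements for which the block is true; a bare regex tests $_.
my @two_words = grep { /\s/ } @lines;

# map transforms each element; uc with no argument uses $_.
my @shouted = map { uc } @lines;

print "$word_count words\n";
print "Two-word lines: @two_words\n";
print "Shouted: @shouted\n";
```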

references

Perl doesn't have pointers (they don't work well with garbage collection). Instead, Perl has references, which are almost as powerful, and considerably less dangerous. Scalars hold references. To take a reference to something, prefix it with a backslash (\), and to dereference, prefix it with the symbol for the type you expect to get back from it (it thus should have a double prefix, such as $$ref or @$ref). In some cases (such as passing through multiple references at once), it's necessary to be more verbose and wrap the reference in braces after the symbol, using the syntax (symbol){reference}, as in ${$ref} or @{$ref}, nesting as needed. This last tip shouldn't be needed too often.

```
$a = "Foo";
$b = \$a;
print $$b . "\n";
$$b = "Bar";
print "Hey! Weird way to do it gives " . ${$b} . "\n";
print "Now A is $a\n";
```

This gives the output

Foo
Hey! Weird way to do it gives Bar
Now A is Bar
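References to arrays and hashes work the same way. The sketch below also uses two pieces of syntax not covered above: the arrow (->) for reaching a single element through a reference, and square or curly brackets for building anonymous arrays and hashes. The data is invented for the example.

```perl
use strict;
use warnings;

my @cats  = ("Wally", "Nemesis");
my %prefs = (username => "pat");

my $cats_ref  = \@cats;     # reference to a named array
my $prefs_ref = \%prefs;    # reference to a named hash

# Dereference the whole thing by prefixing the symbol you expect back.
my @copy  = @$cats_ref;
my @names = keys %{$prefs_ref};     # braces make the grouping explicit

# The arrow reaches a single element through a reference.
my $first = $cats_ref->[0];          # same as $$cats_ref[0]
my $user  = $prefs_ref->{username};  # same as $$prefs_ref{username}

# Square and curly brackets build anonymous arrays and hashes, which nest.
my $zoo = { cats => ["Wally", "Nemesis"], dogs => ["Rex"] };
my $second_cat = $zoo->{cats}[1];           # "Nemesis"
my $cat_count  = scalar @{ $zoo->{cats} };  # 2

# Writing through a reference changes the original array.
push(@$cats_ref, "Sammy");

print "$first, $user, $second_cat, $cat_count, ", scalar(@cats), "\n";
```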

definedness

Variables can have the special value undef. This can be used to catch things that don't have a value yet or have had their value removed. The operators defined and undef are used to test for and set this value. Notice that undefining a value in a hash does not remove its key. Use delete to remove both. With the right options (such as -w), Perl will complain when undef is used in some ways, such as in arithmetic or string concatenation.
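The difference between undefining and deleting a hash entry can be sketched as follows, using exists to check whether a key is still present. The hash is a throwaway example.

```perl
use strict;
use warnings;

my $value;                                    # declared, not yet assigned
my $before = defined($value) ? "defined" : "undef";
$value = 0;                                   # 0 is false but still defined
my $after  = defined($value) ? "defined" : "undef";

my %h = (a => 1, b => 2);
undef $h{a};        # the value becomes undef, but key "a" stays
delete $h{b};       # key "b" is removed along with its value

my $a_exists  = exists $h{a}  ? 1 : 0;   # 1
my $a_defined = defined $h{a} ? 1 : 0;   # 0
my $b_exists  = exists $h{b}  ? 1 : 0;   # 0

print "before: $before, after: $after\n";
print "a exists: $a_exists, a defined: $a_defined, b exists: $b_exists\n";
```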

truth

Perl's notion of truth is simple. The number 0, the string "0", the empty string, and anything not defined are false. Everything else is true, including all other numbers and all other strings (even "0.0" and "00").
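A quick way to check a few edge cases is to run them through the ?: operator, as in this sketch. Note in particular that the string "0" is false while "0.0" and "00" are true. The labels are only there to make the output readable.

```perl
use strict;
use warnings;

# Each pair is a label and a value to test.
my @tests = (
    [ 'number 0',     0     ],
    [ 'string "0"',   "0"   ],
    [ 'empty string', ""    ],
    [ 'undef',        undef ],
    [ 'string "0.0"', "0.0" ],
    [ 'string "00"',  "00"  ],
    [ 'string " "',   " "   ],
    [ 'number 0.5',   0.5   ],
);

my %truth;
foreach my $t (@tests) {
    my ($label, $value) = @$t;
    $truth{$label} = $value ? "true" : "false";
    print "$label is $truth{$label}\n";
}
```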

Subroutines, scopes, and other flow control

Perl has subroutines, and a number of other flow control mechanisms. The more useful/commonly used of them are described below.

foreach $scalar (list or array) {...} - This iterates over an array or list, aliasing each value in it to $scalar for that iteration. This can be used on hashes by using the keys or values function in the list slot. Notice that it's done with an alias, so assigning to $scalar inside the loop alters the corresponding element of the array.
for(INIT; CONDITION; POSTLOOP) {...} - This is a traditional C loop. I'll assume you know how it works.
while(CONDITION) {...} - Also from C, works similarly.
if(CONDITION) {...} - Same as in C, except that the braces are always required. Note that C's else if is replaced with elsif.

The C keywords continue and break are replaced by next and last, respectively. They also can take an argument, which, if the loop is labeled, lets them specify which loop they're talking about. This lets you break out of nested loops without a lot of ugly logic.
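Here is a small sketch of a labeled loop, where next and last name the outer loop from inside the inner one. The label ROW and the sample data are arbitrary.

```perl
use strict;
use warnings;

my @rows = ([1, 2, 3], [4, -1, 6], [7, 8, 9]);
my @seen;
my $found;

ROW: foreach my $row (@rows) {
    foreach my $cell (@$row) {
        next ROW if $cell == 2;   # skip the rest of this row
        if ($cell < 0) {
            $found = $cell;
            last ROW;             # leave both loops at once
        }
        push(@seen, $cell);
    }
}

print "Saw @seen before finding $found\n";
```

The first row is abandoned at 2, and the search stops for good at -1, so only 1 and 4 are recorded.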

Scopes

Without additional qualification, all variables in Perl are global. There are two types of scoping, done with the qualifiers my and local. my is equivalent to C's scoping (except with garbage collecting, so it's safe to return a reference to something made with my). local saves the old value of a global variable away, and arranges for it to be restored when the current block exits; in the meantime, any subroutine called from that block sees the temporary value. Generally, use of local should be discouraged unless it's being used to modify Perl's builtin globals for just a moment. There is another scoping mechanism that's almost exclusively used by the OO facilities that we won't discuss here. Note that the my and local keywords can be positioned flexibly, such as in the variable slot of foreach.
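The difference between my and local shows up when another subroutine reads the variable. In this sketch the global is declared with our, which is only there to keep use strict happy; the sub and variable names are invented.

```perl
use strict;
use warnings;

our $level = "global";       # a global (package) variable

sub show { return $level; }  # always reads the global

sub with_local {
    local $level = "local";  # temporarily replaces the global's value
    return show();           # show() sees "local"
}

sub with_my {
    my $level = "my";        # a new, private variable; the global is untouched
    return show();           # show() still sees "global"
}

my $via_local = with_local();   # "local"
my $via_my    = with_my();      # "global"
my $afterward = $level;         # "global": local's change was undone

print "$via_local $via_my $afterward\n";
```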

Subroutines

Perl starts execution at the top of a program (outside a subroutine) and progresses downwards. It's possible (and definitely recommended) to organize programs through subroutines. Subroutines receive their arguments through the @_ array (a cousin to $_, which has its own set of array-oriented functions, such as shift, that use it if nothing else is specified). Variadic functions are natural and easy in Perl. Unfortunately, if you want named, checked parameters, you need to manage that yourself. Recursion is safe in Perl (don't forget to use my). Here's a sub that takes two arguments, does subtraction, and returns the result:

```
print my_subtract(10,3) . "\n";

sub my_subtract
{
my ($base, $decrement) = @_;
if(!( defined($base) && defined($decrement)))
	{die "my_subtract() passed bad arguments!\n";}
return($base - $decrement);
}
```

That script prints 7. Note that there's no restriction on where subroutines can be in your program -- all global subroutines declared this way are defined when a Perl script is compiled, before it is run. It's possible to take a reference to a sub or a code block.

```
$foo = \&my_subtract;

print &$foo(3,1) . "\n";
```

It is possible to return a number of values from a function. However, note that returning an array or a hash flattens it and returns its elements one-by-one. If you're only returning one array/hash, it's best to return it as the last value -- otherwise it's best to return a reference.
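The flattening is easiest to see side by side with the reference-returning alternative. This is a sketch with invented sub names (flat_pair, ref_pair).

```perl
use strict;
use warnings;

sub flat_pair {
    my @odd  = (1, 3);
    my @even = (2, 4);
    return (@odd, @even);      # flattened to (1, 3, 2, 4)
}

sub ref_pair {
    my @odd  = (1, 3);
    my @even = (2, 4);
    return (\@odd, \@even);    # two scalars, each a reference
}

my (@first, @second) = flat_pair();   # @first takes all four; @second is empty
my ($odd_ref, $even_ref) = ref_pair();

print "Flattened: @first\n";
print "Odd: @$odd_ref, even: @$even_ref\n";
```

Returning references to variables made with my is safe, since the data stays alive as long as something still refers to it.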

OO in Perl

Objects in Perl are handled through scalars, and implemented through a special namespace mechanism. Typically, a class is declared in a separate file, which is loaded through the use keyword. For further examples in this section, below is a class declaration that we'll assume lies in AutonDemo.pm:

```
#!/usr/bin/perl -w

package AutonDemo;

sub new
{
my $self = # This is taking a reference to an anonymous hash
        { # This is the syntax for initializing hashes.
        reads => 0,
        writes => 0,
        value => undef,
        karma => 2
        };
bless $self; # With one argument, bless uses the current package
return $self;
}

sub getvalue
{
my $self = shift; # Shifts @_
$self->{reads}++;
if(rand(10) < 3)
        {
        print "You feel ill\n";
        $self->{karma}--;
        }
if($self->{karma} < 0)
        {$self->{value} = 0;}
return $self->{value};
}

sub setvalue
{
my($self,$value) = @_;
$self->{writes}++;
if(rand(10) < 1)
        {
        print "You feel safe\n";
        $self->{karma}++;
        }
$self->{value} = $value;
}

sub report
{
my $self = shift; # Without this, $self would be an unrelated, empty global
print "Value is " . $self->{value} . "\n";
print "Reads/Writes " . $self->{reads} . " " . $self->{writes} . "\n";
print "Karma is " . $self->{karma} . "\n";
}
1;
```

This class implements a logged variable. Here's a small test script for that class:

```
#!/usr/bin/perl -w
use AutonDemo;

$foo = AutonDemo->new();

$foo->setvalue(4);
foreach $UNUSED (0 .. 10)
        {
        print $foo->getvalue . "\n";
        }
$foo->report();
```
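For experimenting without a separate .pm file, a class and the code that uses it can also share one file. The sketch below uses a hypothetical Counter class and the common two-argument form of bless, which takes the class name passed in by the Counter->new(...) call; none of this is part of AutonDemo.

```perl
use strict;
use warnings;

package Counter;    # hypothetical class, for illustration only

sub new {
    my ($class, %args) = @_;               # the class name arrives first
    my $self = { count => $args{start} || 0 };
    return bless($self, $class);
}

sub increment {
    my $self = shift;                       # the object arrives first
    return ++$self->{count};
}

package main;

my $c = Counter->new(start => 5);
$c->increment;
my $count = $c->increment;    # 7
my $class = ref($c);          # "Counter"

print "$class is at $count\n";
```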

Regexes in Perl

Regular expressions are an important feature in Perl. Unfortunately, they're also hard to read (in any language). There are many places you may have used some subset or relative of the regular expression language. DOS and shell wildcards are cousins to the language, offering a minute subset of what it is capable of. C offers POSIX regular expressions, unfortunately with a cumbersome API. Sed and Awk implement variants on POSIX regular expressions. Perl's regexes are the result of gradual expansion on POSIX standard regexes over several years, and have proved popular enough that Perl-style regexes have been backported to C (via the pcre package) and into some other languages (such as Python).

Perl regexes are normally specified via enclosing forward slashes (/), and applied via the =~ operator. This syntax is used for matching. Perl regexes are also capable of substitution, using the same operator but prepending an s to the slashes and using three of them (s/pattern/replacement/). Parentheses inside Perl regexes are used to capture content (escape any parentheses that you're trying to match with a backslash). This content is stored in the variables $1, $2, and upwards, each corresponding to the position of one of the sets of parentheses. If a regex is used without the =~ operator, it is applied to the accumulator.

```
$foo = "My name is Pat, I think";
$foo =~ /is\s([^,]+)/;
$name = $1;
print "I found the name $name\n";
$foo =~ s/$name/Andrew/;
print $foo . "\n";
```

Note that if a match or a substitution fails, it will return false. It's possible to add modifiers after the closing slash of a regex (for example, i for case-insensitive matching, or g to match or replace every occurrence). Perl's regex style is documented in the perlre manpage. If you're not all that familiar with regexes, it might be helpful to find a book about them (O'Reilly makes a good one; alternatively, O'Reilly's Programming Perl has a section on them that's entirely Perl-centric).
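Since $1 and friends are only meaningful after a successful match, it is safer to test the result first. The sketch below shows that, along with the i and g modifiers and a regex applied to $_. The sample text is made up.

```perl
use strict;
use warnings;

my $text = "Name: Pat, Age: 42";

# Test the match before trusting $1 and $2; /i ignores case.
my ($name, $age);
if ($text =~ /name:\s*(\w+),\s*age:\s*(\d+)/i) {
    ($name, $age) = ($1, $2);
}

# In list context a match returns its captures directly.
my ($name_again) = $text =~ /Name:\s*(\w+)/;

# s/// returns the number of substitutions made; /g replaces every occurrence.
my $copy    = $text;
my $changed = ($copy =~ s/\d/#/g);   # both digits are replaced

# With no =~, the regex applies to $_.
my @hits;
foreach ("cat", "dog", "cow") {
    push(@hits, $_) if /^c/;
}

print "$name is $age\n$copy ($changed changes)\n@hits\n";
```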

Cool Perl tricks

Perl has a number of cool, quirky, and useful features

tie lets you bind a scalar to an object so reads and writes to that scalar transparently become method calls on that object. It works with hashes and arrays too.
dbmopen/dbmclose let you bind a hash to a DBM file, making it easy for your programs to have persistent storage
DBI is an abstraction over all the database drivers (DBD modules) that Perl supports, letting you write (more) portable SQL without losing as many features as you might otherwise
DBD::CSV is a DBD database driver that allows you to talk SQL to CSV files (that is, no real database).
DBD::Excel is another DBD driver (still under development) that lets you talk SQL to Excel files
DBD::RAM lets you talk SQL to an in-memory database
Coy is a module that translates fatal errors you make with die() (or that Perl might generate itself) into haiku
Math::BigInt is a module that lets you do arbitrary-precision integer math
Tk is a module that lets Perl talk to the Tk graphics toolkit
eval() interprets a string as Perl code, compiling and running it in-place
taint mode tracks what variables are acquired from the user versus the script and trusted files, making it safer to write CGIs and setuid scripts
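As one example from the list above, eval comes in two forms: given a block, it traps a die inside that block, and given a string, it compiles and runs that string as code. A minimal sketch follows; the error message and expression are arbitrary.

```perl
use strict;
use warnings;

# eval BLOCK traps a die; the message is left in $@ and eval returns undef.
my $result = eval {
    die "out of cheese\n";
    1;
};
my $error = $@;           # "out of cheese\n"

# eval STRING compiles and runs code built at run time.
my $expr  = "6 * 7";
my $value = eval $expr;   # 42
die "bad expression: $@" if $@;

print "Caught: $error";
print "$expr = $value\n";
```

Passing strings from users or other untrusted sources to eval is dangerous, since whatever they contain will be run as Perl.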

My recommended style

My style is, to a certain degree, the style accepted by the Perl community at large.

Start your program with definitions of variables that the user might want to change. Try to keep these definitions in a format that end users are unlikely to accidentally cause syntax errors by editing
Below those definitions, call a subroutine main(), which holds the core logic of your program.
Keep main() small -- use subroutines to do most of the real work. main() should be an overview of what your program does.
Reduce visual clutter by using $_
Format code to reduce visual clutter, breaking statements over several lines
Use warnings (in the #! line at the top of your program, add the -w flag to perl)
For larger programs, also put use strict; at the top of your program
If you have a chain of functions calling other functions, put each call on a separate line instead of writing one long nested expression. Comment these well.
For large sections of printed text, use a HERE document instead of lots of quotes. In a HERE document, a <<IDENTIFIER stands in for a single scalar in a statement (the statement finishes normally). After that line finishes, all following text until IDENTIFIER appears alone at the beginning of a line is seen as the content of that scalar.

Example of a HERE document

```
$foo = <<EOHERE;
Meow
This is multiline
I think

EOHERE
print $foo;
```

Resources

You may find the following helpful

http://www.cpan.org => CPAN - CPAN is the community repository for Perl modules.
http://www.perl.com => Perl.com - Has articles and several other resources
http://www.perlmonks.org => Perlmonks is a good place to ask questions
http://www.perldoc.com => Perldoc.com is another documentation source
man perl is an index into the perl manpages that are useful to learn concepts
perldoc -f function is a way to learn about particular perl functions
O'Reilly's Programming Perl is a good book

Weaknesses of Perl

Perl has a few weaknesses

It's irritating that parameter passing for subroutines must be done manually
It's possible to code illegibly in Perl
Dealing with standard filehandles is awkward
It can be confusing that @a and $a have nothing to do with each other, especially given that $a[0] belongs to @a and not to $a
With such a fluid language, error checking is less powerful than in more static languages
There are a number of internal variables (see man perlvar) that can change Perl's behavior in surprising ways when altered
Perl's defaults (without -w and use strict) are very lax in the code it accepts
The syntax for accessing a value through several references of different kinds can be very hairy (more so than going through chains of structs in C)
Exception handling is fairly weak in Perl