A brief Perl primer
Overview
- Roles of Perl
- Styles of Perl Programming
- Perl data types and operations on them
- Subroutines, scopes, and other flow control
- OO in Perl
- Regexes
- Cool Perl tricks
- My recommended style
- Resources
- Program overview
- Weaknesses of Perl
Roles of Perl
Perl is useful for a number of things. Thanks to a strong developer community,
there is a central repository called CPAN (the Comprehensive Perl Archive Network)
through which software modules for tasks from fuzzy logic to talking to Oracle
are distributed. A rich (if cryptic) sublanguage of regular expressions (regexes)
for pattern matching, transformation, and extraction provides powerful text
manipulation capabilities. Perl data types are free-form and automatically sized
to meet your needs. Perl's syntax is very loose, allowing easy migration from
many different languages. Integration with other languages is fairly easy,
using low-level tools such as libperl or XS, or using modules such as
Inline::Java. Perl is portable: pure Perl code generally runs unchanged
across Unices, Windows, MacOS, and other platforms. Memory management in Perl
is very easy. Perl's reference-counting garbage collector handles most tasks for you, only
needing help to break circular references. Perl is also semi-interpreted (source is
compiled to an internal form before it runs), giving it better performance than purely interpreted languages.
Styles of Perl Programming
#!/usr/bin/perl
print "Hello, world!\n";
Perl started life inspired by C, shell, various Unix utilities such as
sed and awk, and some other languages, and has continued to borrow
useful/beautiful constructs from others throughout its evolution. It
has been said that Perl is a language in which people can write their
own language, and this is equally valid as criticism and praise. It is
quite practical to use Perl primarily as a better shell-scripting language
than shell itself. Shell programming is subject to the whims of various Unix
vendors, not just in the details of the shell, but also in how sed, awk,
and the like work, so portable shell programming is very difficult. Many
programmers have reported being happy moving from languages such as LISP
to Perl. I don't know LISP well enough to comment on this :) Due to the
popularity of C, C++, and Java, many people's Perl resembles those languages
in structure and flow. Perl is, in my eyes, a better C than C, and similarly
for those other languages, and you can see a lot of C in my Perl. There are,
however, advantages to picking up Perl idioms over time. Unlike a lot
of these other languages, you don't need to know much Perl to do something
useful with it. Perl is also capable of object-oriented programming,
and many CPAN modules offer only object interfaces.
Perl Data Types
Perl has 5 essential data types.
Data types
- Scalar ($) -- Scalars hold single values. They hold numbers, strings, objects, references, and any other single elements you might want. Perl isn't picky about what you put in a scalar, and will determine by context and content what you want when you use a scalar. Scalars are marked by the '$' prefix.
$a = "Cat";
$a = $a . "'s Whiskers";
print $a . "\n";
prints Cat's Whiskers. If we then do:
$a += 2;
print "$a\n";
we get 2 (with -w, Perl also warns that "Cat's Whiskers" isn't numeric). We might then do
$a .= " is it\n";
print $a;
We'd get 2 is it. When Perl is asked to use a scalar holding a string as
a number, it looks for something that looks like a number at the beginning
of the string, numifies it, and uses it. A string with no leading number counts as 0.
When asked to use a number as a string, it simply stringifies it.
Perl strings are sized automatically, can be assigned into, and
can be manipulated in powerful ways using substr and regular
expressions.
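Here is a minimal sketch of those conversions and of substr, assuming a reasonably recent Perl 5 with strict and warnings on. The variable names are only illustrative. The no warnings 'numeric' block just silences the warning that -w would otherwise print for "42 cats".

```perl
use strict;
use warnings;

my $s = "42 cats";
my $n;
{
    no warnings 'numeric';              # "42 cats" isn't fully numeric; silence the warning
    $n = $s + 0;                        # the leading "42" is numified: 42
}
my $t = 3.5 . "";                       # a number stringified: "3.5"
my $word = substr("Whiskers", 0, 4);    # reading a substring: "Whis"
my $str = "Cat's Whiskers";
substr($str, 0, 3) = "Dog";             # assigning into a string: "Dog's Whiskers"
print "$n $t $word $str\n";
```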
- Arrays (@) -- Arrays hold ordered, indexed lists of scalars.
To refer to an array as a whole, the '@' prefix is used. To refer to
a single element, '$' is used with the index in square brackets. It is also possible
to refer to a slice of an array, assigning into or reading out of those
parts, with a continuous range (such as 0..2) or an explicit list of indices. A list is the literal
form of an array, written as comma-separated values in parentheses.
@cats = ("Wally", "Nemesis", "Sammy", "Oliver");
foreach $cat (@cats)
{
print "We have a cat called $cat\n";
}
does the same thing as
@cats = ("Wally", "Nemesis", "Sammy", "Oliver");
for($i=0; $i<= $#cats; $i++)
{
print("We have a cat called " . $cats[$i] . "\n");
}
Assigning lists is handled intelligently:
($foo,undef,$bar) = @ARGV[0..$#ARGV];
print $foo . "\n";
This prints out the first argument passed to the script.
Assigning to undef in a list is useful to discard parts of
the source list. Note that it is not a problem if @ARGV is
bigger than the target list -- extra elements are discarded.
Notice also that if @ARGV is shorter, the assignment will still
happen, but the unmatched variables will get the value undef
(more on that later). Finally, note that C's argv is available in Perl
as the array @ARGV, but without the program name: $ARGV[0] is
C's argv[1], and the program name is in $0.
The last index of an array @foo is available as $#foo,
whereas the size of the same array is available as scalar(@foo)
or by otherwise evaluating @foo in scalar context. The functions push
and pop add and remove elements at the end of an array, while unshift
and shift add and remove them at the start. A destructive way to do the
above examples (printing the cats in reverse order, since pop takes from
the end) would be
@cats = ("Wally", "Nemesis", "Sammy", "Oliver");
while(defined(my $cat = pop(@cats)))
{
printf("We have a cat called %s\n", $cat);
}
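To make the end/start distinction concrete, here is a small sketch that exercises push, unshift, pop, shift, scalar(@foo), $#foo and a list slice. It assumes Perl 5 with strict and warnings, and the cat names are just sample data.

```perl
use strict;
use warnings;

my @cats = ("Wally", "Nemesis");
push(@cats, "Sammy");          # add to the end:     Wally Nemesis Sammy
unshift(@cats, "Oliver");      # add to the start:   Oliver Wally Nemesis Sammy
my $last  = pop(@cats);        # remove from end:    "Sammy"
my $first = shift(@cats);      # remove from start:  "Oliver"
my $count   = scalar(@cats);   # number of elements: 2
my $lastidx = $#cats;          # last index:         1
my @slice = (10, 20, 30, 40)[1..2];    # a slice of a list: (20, 30)
print "$first $last $count $lastidx @cats @slice\n";
```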
- Hash/Associative Array (%) --
Hashes are collections that, instead of using numeric
indices, use strings as indices (called keys). They can be used
for a number of purposes, including organizing related settings, as below.
Like arrays, they use their own symbol when referring to the entire
hash, but individual members are accessed using the scalar symbol $ with the key in curly braces.
$a = "Haha\n";
my %prefs;
$prefs{directory} = $ENV{HOME} . "/.myprogram";
$prefs{username} = $ENV{USER};
$prefs{foo} = $a;
Notice that the environment variables are available as the hash %ENV
in Perl. It is also possible to retrieve the keys of a hash using the
keys function, which returns them as a list in no particular order. It is
possible to delete keys and their associated values using the delete function,
and to test whether a key is present using exists.
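This is a short sketch of keys, delete and exists on a settings-style hash, assuming Perl 5 with strict and warnings. The keys and the directory path are made-up sample values. Sorting the keys gives a stable print order, because keys returns them unordered.

```perl
use strict;
use warnings;

my %prefs = (username => "pat", editor => "vi", shell => "bash");
$prefs{directory} = "/home/pat/.myprogram";    # hypothetical path
delete $prefs{shell};                          # removes the key and its value

# keys returns the keys in no particular order, so sort them for stable output
foreach my $key (sort keys %prefs) {
    print "$key = $prefs{$key}\n";
}
my $count      = scalar(keys %prefs);            # 3
my $has_editor = exists $prefs{editor} ? 1 : 0;  # 1
my $has_shell  = exists $prefs{shell}  ? 1 : 0;  # 0
```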
- Filehandles (no symbol) --
Bareword filehandles like the ones below have no symbol, and are global (unfortunately).
Since Perl 5.6 you can instead store a filehandle in a lexical scalar, as in
open(my $fh, '<', $file), which avoids this.
Filehandles can be opened with open, closed with close,
and read with readline. There is a special syntax that can
make readline implicit within a certain type of while loop.
open(RESOLV, "/etc/resolv.conf");
while($line = <RESOLV>)
{
print $line;
}
close(RESOLV);
This code prints out the contents of /etc/resolv.conf (not
very useful unless you're on a Unix system). Notice that this is
equivalent to the following examples:
$myfile = "/etc/resolv.conf";
open(RESOLV, $myfile) || die "Could not open $myfile: $!\n";
while($line = readline(RESOLV))
{
chomp($line); # Remove newline, if present
print $line . "\n";
}
and this too (idiomatic Perl, I'll explain in a second what this is doing)
open(RESOLV, "/etc/resolv.conf") || die "Could not open /etc/resolv.conf: $!\n";
while(<RESOLV>)
{
print;
}
In the last 2 examples, the special variable $! is read if
the open failed. In string context it holds the system's description
of why the last system call failed, such as "Permission denied" or
"No such file or directory". The last example uses
a special Perl scalar called the accumulator, which will be discussed
momentarily. Notice that angle brackets around a filehandle
read a line from it. At end of file they return undef, and the while loop
terminates. To write to a file, open it with a > marker
at the start of its name. You may then direct print statements to
it by listing the filehandle before the print content (no comma):
open(MYFILE, ">" . $ENV{HOME} . "/myfile.txt") || die "Can't do example: $!\n";
print MYFILE "Hello, world!\n";
close(MYFILE);
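The same write-then-read cycle can be done with lexical filehandles and the three-argument form of open, which keeps the mode separate from the file name. This sketch assumes Perl 5.6 or later. It uses the core File::Temp module to get a throwaway file, so nothing real is touched.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A scratch file (deleted at exit) so the example doesn't touch real data
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close($tmp_fh);

# Three-argument open with a lexical filehandle instead of a global bareword
open(my $out, '>', $path) or die "Could not open $path for writing: $!\n";
print $out "first line\n";          # no comma between filehandle and content
print {$out} "second line\n";       # braces make the filehandle unambiguous
close($out) or die "Could not close $path: $!\n";

open(my $in, '<', $path) or die "Could not open $path for reading: $!\n";
my @lines;
while (my $line = <$in>) {          # stops at end of file
    chomp($line);
    push(@lines, $line);
}
close($in);
print scalar(@lines), " lines read\n";
```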
- Code (&) -- The equivalent of C's function pointers
is possible, as are anonymous functions, local functions,
piecemeal construction of functions through regexes and eval,
and various other black magic. If you decide you need to use any
of this, I suggest you borrow or purchase O'Reilly's Programming
Perl or Perl Cookbook. Alternatively, you can come find
me, or dig out one of the tutorials on the web, or perhaps you'd care
to read the Perl manpages/docs.
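For a taste of the less black parts of this magic, here is a sketch of an anonymous sub used like a function pointer, a closure, and a dispatch table. It assumes Perl 5 with strict and warnings. make_counter and the dispatch keys are hypothetical names made up for the example.

```perl
use strict;
use warnings;

# An anonymous sub stored in a scalar, used much like a C function pointer
my $double = sub { return $_[0] * 2 };

# A closure: each counter keeps its own private $count
sub make_counter {
    my $count = shift;
    return sub { return $count++ };
}

# A hypothetical dispatch table: hash keys mapped to code references
my %dispatch = (
    add      => sub { return $_[0] + $_[1] },
    subtract => sub { return $_[0] - $_[1] },
);

my $counter = make_counter(5);
$counter->();                          # returns 5; the count is now 6
my $next = $counter->();               # 6
print $double->(21), " ", $dispatch{add}->(2, 3), " $next\n";
```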
The accumulator
Perl has a special scalar, $_, that many functions use if no
argument is provided to them (perlvar calls it the default input and
pattern-searching space). The Perl idiom above, while(<FILEHANDLE>) { ... },
assigns the value of each readline() to $_ on each iteration
of the loop, and print, without arguments, prints out the
accumulator. This habit of quietly defaulting arguments might bother C folk. Still,
the use of this idiom can result in less code on a line, reducing
visual clutter. Other useful functions and constructs that can use the accumulator
include split, chomp, regular expressions, map, grep, and foreach.
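Here is a sketch of several of those defaults working together on a few passwd-style sample lines (made-up data), assuming Perl 5 with strict and warnings. grep and map are given $_ explicitly here for clarity, although inside their blocks $_ is set for you either way.

```perl
use strict;
use warnings;

my @records = ("root:x:0:0\n", "pat:x:1000:1000\n", "nobody:x:65534:65534\n");
my @users;
foreach (@records) {          # no loop variable: each element is aliased to $_
    chomp;                    # chomp($_), which also changes @records
    my ($name) = split /:/;   # split(/:/, $_)
    push(@users, $name);
}
# Inside grep and map blocks, $_ is each element in turn
my @long  = grep { length($_) > 4 } @users;
my @upper = map  { uc($_) } @users;
print "@long\n@upper\n";
```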
references
Perl doesn't have pointers (they don't work well with garbage collection).
Instead, Perl has references, which are almost as powerful, and
considerably less dangerous. Scalars hold references. To take a
reference to something, prefix it with a backslash (\), and to
dereference, prefix it with the symbol for the type you expect to get back from
it (it thus has a double prefix, as in $$b or @$aref). In some cases (such as
going through multiple references at once), it's
necessary to be more verbose and wrap the reference in braces after the
symbol for the type you desire, using the syntax SYMBOL{reference} (for example
${$b} or @{$ref}), nesting as needed. This last form shouldn't be needed too often,
and for elements of referenced arrays and hashes the arrow form
$aref->[0] or $href->{key} is usually clearer.
$a = "Foo";
$b = \$a;
print $$b . "\n";
$$b = "Bar";
print "Hey! Weird way to do it gives " . ${$b} . "\n";
print "Now A is $a\n";
This gives the output
Foo
Hey! Weird way to do it gives Bar
Now A is Bar
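The same ideas apply to arrays and hashes. This sketch shows the double-prefix, block and arrow forms side by side, plus a nested dereference. It assumes Perl 5 with strict and warnings, and the data is sample data.

```perl
use strict;
use warnings;

my @cats = ("Wally", "Nemesis");
my $aref = \@cats;                              # reference to a named array
my $href = { owner => "Pat", cats => $aref };   # reference to an anonymous hash

my $first = $$aref[0];                          # "Wally"   (double prefix)
my $same  = ${$aref}[0];                        # "Wally"   (block form)
my $arrow = $aref->[1];                         # "Nemesis" (arrow form)
my $count = scalar(@{ $href->{cats} });         # 2, dereferencing a nested value
push(@{ $href->{cats} }, "Sammy");              # changes @cats through the reference
print "$first $same $arrow $count @cats\n";
```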
definedness
Variables can have the special value undef. This can be
used to catch things that don't have a value yet or have had their
value removed. The operators defined and undef are
used to test and set this value. Notice that undefining a value in
a hash does not remove its key. Use delete to remove both.
With warnings enabled (-w or use warnings), Perl will complain when an
undefined value is used in ways such as arithmetic or string concatenation.
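The difference between undefining a hash value and deleting its key shows up when you compare defined with exists. Here is a small sketch, assuming Perl 5 with strict and warnings.

```perl
use strict;
use warnings;

my %h = (a => 1, b => 2, c => 3);
$h{a} = undef;                 # the value becomes undef, but the key stays
undef $h{c};                   # same effect, using the undef operator
delete $h{b};                  # the key and value are both removed

my $a_exists  = exists  $h{a} ? 1 : 0;   # 1
my $a_defined = defined $h{a} ? 1 : 0;   # 0
my $b_exists  = exists  $h{b} ? 1 : 0;   # 0
my @remaining = sort keys %h;            # ("a", "c")
print "@remaining\n";
```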
truth
Perl's notion of truth is simple. The empty string "" and the string "0"
are false, and all other strings (even "0.0" and "00") are true. The number 0
is false, and all other numbers are true. Things not defined are false.
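The "0" and "0.0" cases catch most people at least once. This sketch simply runs a handful of sample values through a boolean test, assuming Perl 5 with strict and warnings.

```perl
use strict;
use warnings;

# Label => value pairs to test; the labels are only for printing
my @samples = (
    ['empty string', ""],   ['"0"', "0"],     ['"0.0"', "0.0"],
    ['"00"', "00"],         ['" "', " "],     ['number 0', 0],
    ['number 0.0', 0.0],    ['number 1', 1],  ['undef', undef],
);
my %truth;
foreach my $pair (@samples) {
    my ($label, $value) = @$pair;
    $truth{$label} = $value ? "true" : "false";
    print "$label is $truth{$label}\n";
}
```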
Subroutines, scopes, and other flow control
Perl has subroutines, and a number of other flow control mechanisms.
The more useful/commonly used of them are described below.
- foreach $scalar (list or array) {...} - This iterates over an array or list, aliasing each value in it to $scalar for that iteration. This can be used on hashes by using the keys or values function in the list slot. Notice that it's done with an alias, so it's possible to alter the elements of an array through $scalar.
- for(INIT; CONDITION; POSTLOOP) {...} - This is a traditional C loop. I'll assume you know how it works.
- while(CONDITION) {...} - Also from C, works similarly.
- if(CONDITION) {...} - Same as in C, except that the braces are always required and else if is written elsif.
The C keywords continue and break are replaced by
next and last, respectively. They can also take a loop label
as an argument, which specifies which
loop they're talking about. This lets you break out of nested loops
without a lot of ugly logic.
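As an illustration of labeled loops, this sketch searches a grid (an array of array references) for the first negative number. It skips the rest of any row as soon as it meets a 0. The grid data and the ROW label name are made up for the example, and it assumes Perl 5 with strict and warnings.

```perl
use strict;
use warnings;

my @grid = ([1, 2, 3], [0, -9, 9], [4, -5, 6], [7, -8, 9]);
my ($found_row, $found_col);
ROW: foreach my $r (0 .. $#grid) {
    foreach my $c (0 .. $#{ $grid[$r] }) {
        next ROW if $grid[$r][$c] == 0;    # skip the rest of this row
        if ($grid[$r][$c] < 0) {
            ($found_row, $found_col) = ($r, $c);
            last ROW;                      # leave both loops at once
        }
    }
}
print "First negative value at row $found_row, column $found_col\n";
```

Row 1 contains -9, but the 0 in front of it sends the search on to the next row, so the -5 in row 2 is found first.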
Scopes
Without additional qualification, all variables in Perl are package globals.
There are two types of scoping, done with the qualifiers my
and local. my is equivalent to C's block scoping (except
with garbage collection, so it's safe to return a reference to something
declared with my). local saves the old value of a global variable away,
and arranges for it to be restored when the current block exits. In the
meantime, any subroutine called from that block sees the new value.
Generally, use of local should be discouraged unless it's being
used to modify Perl's builtin globals for just a moment. There is
another scoping mechanism that's almost exclusively used by the
OO facilities that we won't discuss here. Note that my and local
can be used inside expressions, not just at the start of a statement,
and my can even appear in the variable slot of foreach.
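The difference between my and local shows up when another subroutine reads the variable. This sketch assumes Perl 5.6 or later and uses our to declare the package variable so it passes use strict. The sub names are hypothetical.

```perl
use strict;
use warnings;

our $level = "global";            # a package (global) variable, declared for strict

sub show_level { return $level }  # always reads the package variable

sub with_local {
    local $level = "localized";   # temporarily replaces the global's value
    return show_level();          # subs called from here see "localized"
}                                 # the old value comes back when the block exits

sub with_my {
    my $level = "lexical";        # a new, unrelated variable visible only in this block
    return show_level();          # show_level still sees the global
}

my @seen = (with_local(), with_my(), show_level());
print "@seen\n";
```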
Subroutines
Perl starts execution at the top of a program (outside any
subroutine) and progresses downwards. It's possible (and
definitely recommended) to organize programs into
subroutines. Subroutines receive their arguments through
the @_ array (a cousin of $_: inside a subroutine, shift and pop
operate on @_ if no array is given). Variadic functions are natural and easy
in Perl. Unfortunately, if you want named, checked parameters,
you need to manage that yourself. Recursion is safe in Perl
(don't forget to use my). Here's a sub that takes two
arguments, does subtraction, and returns the result:
print my_subtract(10,3) . "\n";
sub my_subtract
{
my ($base, $decrement) = @_;
if(!( defined($base) && defined($decrement)))
{die "my_subtract() passed bad arguments!\n";}
return($base - $decrement);
}
That script prints 7. Note that there's no restriction on where
subroutines can be in your program -- all global subroutines declared
this way are defined when a Perl script is compiled, before it is run.
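One common way to manage named parameters yourself is to pass key/value pairs and assign @_ to a hash. The sketch below also supplies a default and rejects missing or unknown arguments. make_greeting and its argument names are hypothetical. Carp's croak (a core module) reports the error from the caller's point of view.

```perl
use strict;
use warnings;
use Carp qw(croak);

# A hypothetical sub taking named arguments, with a default and basic checks
sub make_greeting {
    my %args = (greeting => "Hello", @_);    # defaults first; the caller's pairs override
    croak "make_greeting() needs a name" unless defined $args{name};
    my @unknown = grep { $_ ne "name" && $_ ne "greeting" } keys %args;
    croak "make_greeting() got unknown argument(s): @unknown" if @unknown;
    return "$args{greeting}, $args{name}!";
}

my $hi  = make_greeting(name => "Pat");
my $hey = make_greeting(name => "Pat", greeting => "Hey");
print "$hi\n$hey\n";
```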
It's possible to take a reference to a named sub (with \&) and call through it, using either &$foo(...) or $foo->(...):
$foo = \&my_subtract;
print &$foo(3,1) . "\n";
It is possible to return several values from a function.
However, note that returning an array or a hash flattens it, so
its elements become part of one long list of return values. If you're only
returning one array/hash, it's best to return it as the last value, so the
caller can collect it with an array or hash at the end of the assignment
list. Otherwise, it's best to return references.
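Both patterns look like this in practice. The sketch assumes Perl 5 with strict and warnings, and the sub names are hypothetical. The first sub puts its one array last. The second returns two arrays as references so they don't run together.

```perl
use strict;
use warnings;

# Returns one scalar followed by one array, so the array goes last
sub min_and_sorted {
    my @nums = sort { $a <=> $b } @_;
    return ($nums[0], @nums);
}

# Returns two arrays, so references keep them apart
sub evens_and_odds {
    my (@evens, @odds);
    foreach my $n (@_) {
        if ($n % 2 == 0) { push(@evens, $n) } else { push(@odds, $n) }
    }
    return (\@evens, \@odds);
}

my ($min, @sorted) = min_and_sorted(5, 3, 9);   # 3 and (3, 5, 9)
my ($evens, $odds) = evens_and_odds(1 .. 6);    # [2, 4, 6] and [1, 3, 5]
print "$min | @sorted | @$evens | @$odds\n";
```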
OO in Perl
Objects in Perl are references (usually to hashes) that have been blessed
into a package, which acts as the class. Typically, a class is declared
in a separate file named after the package (with a .pm extension), which
is loaded through the use keyword. For further examples in this section,
below is a class declaration that we'll assume lies in AutonDemo.pm
package AutonDemo;
use strict;
use warnings;

sub new
{
    my $class = shift;    # AutonDemo->new() passes the class name first
    my $self =            # This builds an anonymous hash and takes a reference to it
    {                     # This is the syntax for initializing hashes.
        reads => 0,
        writes => 0,
        value => undef,
        karma => 2
    };
    bless($self, $class); # Make the reference an object of class $class
    return $self;
}
sub getvalue
{
    my $self = shift;     # Shifts @_: the object the method was called on
    $self->{reads}++;
    if(rand(10) < 3)
    {
        print "You feel ill\n";
        $self->{karma}--;
    }
    if($self->{karma} < 0)
        {$self->{value} = 0;}
    return $self->{value};
}
sub setvalue
{
    my($self,$value) = @_;
    $self->{writes}++;
    if(rand(10) < 1)
    {
        print "You feel safe\n";
        $self->{karma}++;
    }
    $self->{value} = $value;
}
sub report
{
    my $self = shift;     # Each method must fetch its own object from @_
    print "Value is " . $self->{value} . "\n";
    print "Reads/Writes " . $self->{reads} . " " . $self->{writes} . "\n";
    print "Karma is " . $self->{karma} . "\n";
}
1;    # A module must end by returning a true value
This class implements a logged variable. Here's a small test script for that
class (run it from the directory containing AutonDemo.pm):
#!/usr/bin/perl
use strict;
use warnings;
use lib '.';    # Perl 5.26 and later no longer search the current directory
use AutonDemo;
my $foo = AutonDemo->new();
$foo->setvalue(4);
foreach (0 .. 10)
{
    print $foo->getvalue() . "\n";
}
$foo->report();
Regexes in Perl
Regular expressions are an important feature in Perl. Unfortunately,
they're also hard to read (in any language). There are many places
you may have used some subset or relative of the regular expression
language. DOS and shell wildcards are cousins to the language, offering
a minute subset of what it is capable of. POSIX systems offer regular
expressions to C programs (regcomp and regexec), unfortunately with a
cumbersome API. Sed and Awk implement variants on POSIX regular expressions.
Perl's regexes are the result of gradual expansion on traditional Unix
regexes over several years, and have proved popular enough that Perl-style
regexes have been made available to C (via the PCRE library) and to some other languages (such as Python).
Perl regexes are normally specified via enclosing forward slashes
(/), and applied via the =~ operator. This syntax is
used for matching. Perl regexes are also capable of substitution, using
the same operator but prepending an s and using three slashes, as in
s/pattern/replacement/. Parentheses inside Perl regexes are used to capture
content (escape any parentheses that you're trying to match with a
backslash). This content is stored in the variables $1, $2, and upwards,
each corresponding to the position of one of the sets of parentheses.
If a regex is used without =~, it is applied to the accumulator ($_).
$foo = "My name is Pat, I think";
if($foo =~ /is\s([^,]+)/)
{
$name = $1;
print "I found the name $name\n";
$foo =~ s/\Q$name\E/Andrew/;    # \Q...\E treats $name as literal text
}
print $foo . "\n";
Note that if a match or a substitution fails, it will return
false, and $1 and friends keep their values from the last successful
match, so check the result before using them. It's possible to add
modifiers after the closing slash of a regex, such as i (ignore case),
g (match globally), and x (allow whitespace and comments). Perl's regex
style is documented in the perlre manpage. If you're not all that familiar
with regexes, it might be helpful to find a book about them (O'Reilly makes
a good one; alternatively, O'Reilly's Programming Perl has a section
on them that's entirely Perl-centric).
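Here is a sketch of the i, g and x modifiers in action, assuming Perl 5.005 or later (for qr//) with strict and warnings on. The date strings and the key/value text are made-up sample data. Braces are used as the s/// delimiters so the slashes in the replacement need no escaping.

```perl
use strict;
use warnings;

my $line = "Date: 2004-03-15, Updated: 2004-04-01";

# /g in list context returns the captures from every match
my @parts = ($line =~ /(\d{4})-(\d\d)-(\d\d)/g);    # 2004 03 15 2004 04 01

# s///g replaces every match and returns how many it replaced
my $count = ($line =~ s{(\d{4})-(\d\d)-(\d\d)}{$3/$2/$1}g);   # 2

# /i ignores case
my $is_perl = ("PERL rocks" =~ /^perl/i) ? 1 : 0;   # 1

# /x allows whitespace and comments inside a precompiled (qr) pattern
my $pair = qr/
    (\w+)       # a key
    \s* = \s*   # an equals sign, optionally surrounded by spaces
    (\S+)       # a value
/x;
my ($key, $value) = ("editor = vi" =~ $pair);
print "$line\n$count $is_perl $key=$value\n";
```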
Cool Perl tricks
Perl has a number of cool, quirky, and useful features
- tie lets you bind a variable to an object so reads and writes to that variable transparently become method calls on that object. It works with scalars, hashes, and arrays.
- dbmopen/dbmclose let you bind a hash to a DBM file, making it easy for your programs to have persistent storage (tie with a DBM module such as DB_File has largely superseded them)
- DBI is an abstraction over all the database drivers (the DBD:: modules) that Perl supports, letting you write (more) portable SQL without losing as many features as you might otherwise
- DBD::CSV is a DBD database driver that allows you to talk SQL to CSV files (that is, no real database).
- DBD::Excel is another DBD driver (still under development) that lets you talk SQL to Excel files
- DBD::RAM lets you talk SQL to an in-memory database
- Coy is a module that translates fatal errors you make with die() (or that Perl might generate itself) into haiku
- Math::BigInt is a module that lets you do arbitrary-precision integer math (Math::BigFloat does the same for floating point)
- Tk is a module that lets Perl talk to the Tk graphics toolkit
- eval() interprets a string as Perl code, compiling and running it in-place
- taint mode (the -T flag) marks data that comes from outside the program, such as user input, environment variables, and file contents, as tainted, and refuses to use it in operations that affect the outside world until it has been checked, making it safer to write CGIs and setuid scripts
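To give a feel for tie, here is a sketch of a tied scalar that counts how often it is read and written, much like the AutonDemo class but invisible to the code using the variable. CountingScalar is a hypothetical class name. TIESCALAR, FETCH and STORE are the method names Perl calls for tied scalars. The sketch assumes Perl 5 with strict and warnings.

```perl
use strict;
use warnings;

# A hypothetical class for tied scalars that counts reads and writes
package CountingScalar;

sub TIESCALAR {                 # called by tie; returns the object
    my ($class, $initial) = @_;
    return bless { value => $initial, reads => 0, writes => 0 }, $class;
}
sub FETCH {                     # called whenever the variable is read
    my $self = shift;
    $self->{reads}++;
    return $self->{value};
}
sub STORE {                     # called whenever the variable is assigned
    my ($self, $new) = @_;
    $self->{writes}++;
    $self->{value} = $new;
}

package main;

my $watched;
my $obj = tie $watched, 'CountingScalar', 10;   # tie returns the object
$watched = $watched + 5;        # a read (FETCH) followed by a write (STORE)
my $now = $watched;             # another read
print "value=$now reads=$obj->{reads} writes=$obj->{writes}\n";
```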
My recommended style
My style is, to a certain degree, the style accepted by the Perl
community at large.
- Start your program with definitions of variables that the user might want to change. Keep these definitions in a simple format that end users can edit without accidentally introducing syntax errors
- Below those definitions, call a subroutine main(), which holds the core logic of your program.
- Keep main() small -- use subroutines to do most of the real work. main() should be an overview of what your program does.
- reduce visual clutter by using $_
- format code to reduce visual clutter, breaking long statements over several lines
- enable warnings (add the -w flag to perl in the #! line at the top of your program, or put use warnings; at the top)
- For larger programs, also put use strict; at the top of your program
- If you have a chain of functions feeding into each other (such as sort, map, and grep), put each on a separate line instead of writing one long nested expression. Comment these well.
- For large printed sections of text, use a HERE document instead of lots of quotes. HERE documents replace a single scalar in a statement with a <<IDENTIFIER (the statement finishes normally). After that line finishes, all following text until IDENTIFIER appears alone on a line is seen as the content of that scalar.
Example of a HERE document
$foo = <<EOHERE;
Meow
This is multiline
I think
EOHERE
print $foo;
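Putting several of these recommendations together, a small program might be laid out like the sketch below: settings at the top, a short main(), the real work in a helper sub, and an interpolating HERE document for the output. The settings, sub names and report text are placeholders. With <<"EOREPORT" the document interpolates variables, while <<'EOREPORT' would not.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ---- Settings users may want to change ----
my $GREETING = "Hello";
my @NAMES    = ("Wally", "Nemesis");
# ---- End of settings ----

main();

sub main {
    my $report = build_report(@NAMES);
    print $report;
}

# Builds the report text; the HERE document interpolates $GREETING and $list
sub build_report {
    my @names = @_;
    my $list  = join(", ", @names);
    return <<"EOREPORT";
$GREETING from the report generator.
Cats: $list
EOREPORT
}
```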
Resources
You may find the following helpful
Weaknesses of Perl
Perl has a few weaknesses
- It's irritating that parameter passing for subroutines must be done manually
- It's possible to code illegibly in Perl
- Dealing with standard filehandles is awkward
- It can be confusing that @a and $a have nothing to do with each other, especially given that $a[0] belongs to @a and not to $a
- With such a fluid language, error checking is less powerful than in more static languages
- There are a number of internal variables (see man perlvar) that can change Perl's behavior in surprising ways when altered
- Perl's defaults (without -w and use strict) are very lax in the code they accept
- The syntax for accessing a value through several references of different kinds can be very hairy (more so than going through chains of structs in C)
- Exception handling is fairly weak in Perl
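For reference, Perl's built-in exception mechanism is die combined with the block form of eval, with the error message left in $@. The sketch assumes Perl 5 with strict and warnings, and risky_divide is a hypothetical sub. The "try/throw/catch" comments are only analogies to other languages. Copying $@ immediately is a common precaution, since later code can reset it.

```perl
use strict;
use warnings;

sub risky_divide {
    my ($num, $den) = @_;
    die "division by zero\n" if $den == 0;    # roughly "throw"
    return $num / $den;
}

my $ok     = eval { risky_divide(10, 2) };    # 5, and $@ is empty
my $result = eval { risky_divide(10, 0) };    # roughly "try": undef on failure
my $error  = $@;                              # copy the message before anything resets $@
if ($error) {
    print "Caught: $error";                   # roughly "catch"
}
```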