Sunday, September 30, 2012

10 things every Perl hacker should know

Perl is the scripting language of choice for many expert system administrators, but it is also a lot more than that. As a language designed for file and text processing, it is ideally suited to UNIX system administration, Web programming, and database programming, among dozens of other uses.

As one of the easiest programming languages to use for whipping up quick, effective code for simple tasks, Perl attracts new users easily and has become an important and popular tool for getting things done. Before jumping fully into Perl programming, though, there are a few things you should know that will make your life easier, both when writing code and when asking for help from Perl experts.


1.       Perl is not an acronym

Perl is sometimes known as the Practical Extraction and Report Language, because it is practical and very good at extracting data and creating reports from that data. It is also known humorously as the Pathologically Eclectic Rubbish Lister, for reasons that might become obvious after you've used it for a while. Both phrases are equally "official" and equally correct, but the language is not PERL: it was named Perl before either phrase was invented, so the name is not technically an acronym at all. When speaking of the language, call it Perl; when speaking of the parser (the interpreter/compiler), it is acceptable to call it perl, because that is how the command that runs it is spelled. One of the quickest ways to get identified as a know-nothing newbie when talking to Perl hackers is to call it PERL.

2.       There is more than one way to do it

One of the main mottos of the Perl language and community is TIMTOWTDI, pronounced "Tim Toady". This one really is an acronym: it stands for There Is More Than One Way To Do It. That is true of Perl on many levels, and it is worth keeping in mind. Some approaches to a task are often better than others, but for pretty much anything you can do with Perl, you can be sure there is more than one way to do it.
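As a small illustration (a sketch, not the only or "right" way), here are three common Perl idioms that build the same list of squares: an explicit foreach loop, map, and a statement modifier. All three produce identical results; which one reads best is largely a matter of taste and context.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @numbers = (1 .. 5);

# Way 1: an explicit foreach loop that pushes onto a new array.
my @squares_loop;
for my $n (@numbers) {
    push @squares_loop, $n * $n;
}

# Way 2: map transforms every element in a single expression.
my @squares_map = map { $_ * $_ } @numbers;

# Way 3: a one-line push with a "for" statement modifier.
my @squares_modifier;
push @squares_modifier, $_ ** 2 for @numbers;

print "@squares_loop\n";      # 1 4 9 16 25
print "@squares_map\n";       # 1 4 9 16 25
print "@squares_modifier\n";  # 1 4 9 16 25
```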

3.       Use warnings and use strict

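Warnings and the strict pragma are important weapons in the Perl hacker's arsenal for debugging code. Warnings will not prevent a program from executing; instead, they print diagnostic messages to standard error about questionable code, such as using an undefined value or printing to a closed filehandle, along with the file and line number where the problem occurred, so you know where to look when fixing it.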

The strict pragma goes further: it will actually prevent the code from running if it breaks strict's rules. Most importantly, every variable must be declared (usually as a lexically scoped variable with my; see item 5) or fully qualified with its package name, and strict also forbids symbolic references and bareword strings. Once in a while, a program might be better off without the strict pragma, but if you're new to Perl it will surely be a long time before you learn to recognize such situations, and until then you should just use it.

A Perl script with warnings turned on in the shebang line and the strict pragma used, on a standard UNIX system, would start like this:
#!/usr/bin/perl -w
use strict;

Warnings can also be turned on with a use statement, which limits them to the enclosing file or block and is generally preferred in modern code, like this:
#!/usr/bin/perl
use strict;
use warnings;

A pragma, in Perl, is a compiler directive. Perl has no separate preprocessor; instead, a pragma is a special module that is loaded at compile time and changes how perl compiles (and sometimes runs) the code in the enclosing lexical scope, that is, the rest of the current block or file.
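To make the difference between the two pragmas concrete, the sketch below triggers one warning and one strict error in the same script. The warning is collected through a $SIG{__WARN__} handler and execution continues; the strict violation is compiled inside a string eval so the script can report the compile-time error instead of dying. The variable names are purely illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect warnings in an array instead of printing them, so we can inspect them.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

# warnings: adding to an undefined value produces a warning,
# but the program keeps running.
my $count;
my $total = $count + 1;     # warns "Use of uninitialized value $count ..."
print "total is $total\n";  # still runs and prints "total is 1"

# strict: using an undeclared variable is a compile-time error.
# Compiling the code with a string eval lets us catch and report it.
my $ok = eval q{
    use strict;
    $undeclared = 42;   # never declared with my or our
    1;
};
print "strict rejected the code: $@" unless $ok;

print "captured warning: $_" for @warnings;
```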

4.       Use taint checking

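With the -T option on the shebang line of your program, you explicitly turn on taint checking. This security measure marks all data that comes from outside your program (command-line arguments, environment variables, file input, and so on) as "tainted", and perl then refuses to use tainted data in operations that affect the outside world, such as running a shell command or opening a file for writing, until you have validated it, typically by extracting the parts you trust with a regular-expression capture. This helps ensure that incoming data will not allow arbitrary code execution if a malicious user is trying to crack security on the system running your code. It is especially important when you are using Perl/CGI scripts to process data from an HTML form on the Web. It can be combined with the -w option as -wT.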
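The usual way to untaint data is to match it against a strict regular expression and keep only the captured part. The sketch below assumes a hypothetical rule that acceptable filenames contain only letters, digits, underscores, dots, and hyphens; adapt the pattern to whatever your own program should accept. It runs with or without taint mode, and reports which mode it is in through the special variable ${^TAINT}; try it as perl -T with the filename you save it under.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical validator: return an untainted copy of a user-supplied
# filename, or nothing at all if the name looks unsafe.
# Under -T, a value captured by a regex group ($1) is no longer tainted,
# so the pattern itself is where you decide what to trust.
sub untaint_filename {
    my ($raw) = @_;
    # Letters, digits, underscore, dot, hyphen only; no slashes, and the
    # name may not start with a dot, so "../" tricks are rejected.
    if (defined $raw && $raw =~ /\A([A-Za-z0-9_][A-Za-z0-9_.-]*)\z/) {
        return $1;    # untainted copy of the accepted part
    }
    return;           # undef in scalar context: rejected
}

# ${^TAINT} is true when taint checking is active.
print "taint mode is ", (${^TAINT} ? "on" : "off"), "\n";

for my $input ('report_2012.txt', '../../etc/passwd', 'x; rm -rf /') {
    my $safe = untaint_filename($input);
    print defined $safe ? "accepted: $safe\n" : "rejected: $input\n";
}
```

Rejecting anything that does not match, rather than trying to strip out "bad" characters, is the safer design: you only have to get the list of allowed characters right.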

5.       Use lexically scoped variables

You can use the my() operator to create lexically scoped variables. In brief, this means that a variable is visible only within the block (or file) in which it is declared: if you declare a variable using my() inside a subroutine, the variable exists only inside that subroutine. The value of lexical scoping is that it protects different parts of modular code from one another.

For instance, if you're using a Perl module or library without knowing exactly what the code inside it looks like, lexical variables keep that code and yours from accidentally assigning new values to each other's variables just because they happen to share a name. It is especially important to use lexical scoping for your variables when writing modules and libraries in Perl. If you are coming to Perl from another language, you may know lexically scoped variables as "private variables".
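A quick sketch of lexical scoping in action: the three variables below all happen to be called $name, but each my() creates a separate variable that is visible only inside its own block or subroutine, so none of them can clobber the others.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $name = 'outer';          # file-scoped lexical

sub greet {
    my $name = 'inner';      # a different variable, visible only inside greet()
    return "hello from $name";
}

{
    my $name = 'block';      # exists only until the closing brace
    print "in block: $name\n";
}

print greet(), "\n";         # hello from inner
print "outside: $name\n";    # still 'outer': nothing above changed it
```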

6.       How to name your programs

Perl programs should have an appropriate file extension. Many lower-quality Perl howtos simply use the .pl extension for everything, naming Perl scripts something like foo.pl. Strictly speaking, though, the .pl extension was originally meant for Perl libraries, not for executable Perl programs. For executable files, you should either use .plx or, if your operating system allows it, no file extension at all. Perl modules, meanwhile, must use the .pm file extension, because that is the file perl looks for when you load a module with use. It is also considered good practice to use only alphanumeric characters and underscores in Perl script filenames, and to start those filenames with a letter (or underscore), just as you would start a variable name.

7.       How to use CPAN

The Comprehensive Perl Archive Network (CPAN) is a rich resource for finding freely available, reusable code. In particular, CPAN is where you'll find legions of Perl modules that can be used to enhance the functionality of your programs and reduce the time you spend writing them. The options you have for using CPAN vary from one operating system and Perl implementation to the next, but you can always browse CPAN with your Web browser. Perl implementations generally come with at least a command-line tool for installing modules from CPAN; the standard distribution includes one called cpan, so that running cpan Module::Name downloads, tests, and installs a module.
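Once a module is installed, you load it with use or require. If you want a program to check whether a module is available before relying on it, one common approach is to try loading it inside eval, as in this sketch. module_available is a hypothetical helper written for illustration, and No::Such::Module is a made-up name standing in for a module that is not installed. Note how the module name maps to a .pm file path.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return 1 if the named module can be loaded, else 0.
# "List::Util" is looked up as the file List/Util.pm in the @INC directories.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

# List::Util ships with Perl; No::Such::Module is a made-up name.
for my $module (qw(List::Util No::Such::Module)) {
    printf "%-18s %s\n", $module,
        module_available($module) ? 'installed' : "not installed (try: cpan $module)";
}
```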

8.       How to use Perldoc

The documentation for Perl is extensive and comprehensive, and you read it with Perldoc. With Perldoc installed on your system, you can use it to access documentation on any of the standard Perl functions, installed modules, variables, and a slew of other things, even Perldoc itself! For example, perldoc -f split describes the split function, perldoc -q searches the Perl FAQ for a keyword, perldoc perlre covers regular expressions, and perldoc List::Util shows the documentation for an installed module. It's like having one of the most complete programming reference books available right at your fingertips, for free, and searchable since it's in electronic format.

On some systems, Perldoc is installed by default with Perl itself; on others (some Linux distributions package it separately) you may need to install it through the system's package manager. If you have problems getting Perldoc installed, you can always read the same documentation on the Perldoc website, perldoc.perl.org. Make sure you know how to use Perldoc, because it can make you a more effective Perl hacker in ways that just might surprise you.

9.       Don't reinvent the wheel

You should use subroutines, modules, and libraries often. They help you write code faster and keep it from becoming unmanageable when you need the same functionality in multiple programs, or more than once in the same program, because the reusable code lives in one place, separate from the rest of your source. Most of the time, you're better off using a design for the wheel that already exists than reinventing the wheel from scratch. In addition, when you reuse code from a subroutine, module, or library and need to improve it, you only have to change it in one place.

The term "subroutine" in Perl means roughly the same thing as "function" in C.
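Here is a minimal sketch of both ideas: a small subroutine (trim, written here for illustration) that is defined once and reused for every value, and the core List::Util module supplying sum and max so you don't have to write those loops yourself.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum max);   # core module: no need to hand-write these

# Written once, reused everywhere: strip leading and trailing whitespace.
sub trim {
    my ($s) = @_;
    $s =~ s/\A\s+|\s+\z//g;
    return $s;
}

my @raw   = ("  42 ", "7\n", "\t19");
my @clean = map { trim($_) } @raw;

print "values: @clean\n";          # values: 42 7 19
print "sum: ", sum(@clean), "\n";  # sum: 68
print "max: ", max(@clean), "\n";  # max: 42
```

If trim ever needs to change (say, to handle undef input), the fix goes in one place and every caller benefits.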

10.     Regular expressions are your friends

Perl's regular expression syntax can make source code look intimidating to the uninitiated, and as a result people new to Perl programming sometimes avoid regexen. That is a mistake. Regular expressions add a great deal of power to the Perl programming language, often letting the programmer do in three lines what might otherwise take fifty. A regular expression is a compact pattern that describes the strings you are looking for; Perl uses these patterns to find, extract, and replace smaller strings inside larger ones. It behooves the Perl hacker to learn regex syntax and learn it well.
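As a taste of what that power looks like, this sketch uses one pattern to pull the pieces out of a date string (the one at the top of this post) and one substitution to tidy up whitespace. Doing either by hand with index and substr would take considerably more code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = 'Sunday, September 30, 2012';

# Match and extract: each pair of parentheses captures one field.
my ($weekday, $month, $day, $year) =
    $line =~ /\A(\w+),\s+(\w+)\s+(\d{1,2}),\s+(\d{4})\z/
    or die "date did not match\n";
print "month=$month day=$day year=$year\n";  # month=September day=30 year=2012

# Substitute: collapse every run of whitespace into a single space.
my $messy = "too    many \t spaces";
(my $tidy = $messy) =~ s/\s+/ /g;
print "$tidy\n";                             # too many spaces
```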

Often Perl hackers and other programmers who use regular expressions will refer to them as "regexen" or "regexes" in the plural ("regex" singular). An alternate version of "regex" is "regexp", though why anyone would want to add that extra letter, making it more difficult to pronounce clearly, is beyond me.

Once you've internalized the lessons of this list, you're ready to really start learning Perl, and how to program with it. Some good resources for beginners (and experts, too) include Learning Perl for an excellent introductory text, the PerlMonks community for tutorials and discussion, and Ovid's CGI Course for Perl/CGI Web programming.
