P5EEx-Blue-0.01

P5EEx::Blue::perlstyle


NAME

P5EEx::Blue::perlstyle - P5EE Style Guide


INTRODUCTION

All code and documentation that is submitted to be included in the P5EE distribution should follow the style in this document. This is not to try to stifle your creativity, but to make life easier for everybody who has to work with your code, and to aid those who are not quite sure how to do something. It also serves to resolve disputes so that no one takes it personally.

These conventions below apply to perl modules, web (CGI/mod_perl) programs, command-line programs, specifically, but also might apply to some degree to any Perl code written for use in P5EE.

Note that these are all guidelines, not unbreakable rules. If you have a really good need to break one of the rules herein, however, then it is best to ask the core P5EE team first.

Note that with much of this document, it is not so much the Right Way as it is Our Way. We need to have conventions in order to make life easier for everyone.

If you have any questions, please ask us on the P5EE-development mailing list, p5ee@perl.org.

    http://lists.perl.org/showlist.cgi?name=p5ee

Documentation for the P5EE project is at the following sites.

    http://p5ee.perl.org
    ../../../../../p5ee

This document can and will be modified over time. We hope to add any significant changes at the bottom of the document.


CODING PRINCIPLES

Perl Version

We code everything to perl 5.005_03. Some day we may switch to take advantage of perl 5.6 features. Regardless, all code should run on perl 5.005_03 or any later version of perl 5. All of the core P5EE code has been tested on perl 5.005_03 and perl 5.6.0, though it has probably been used more on perl 5.6.0.

Documentation

All modules will be documented using the POD examples in the module boilerplate. The function, purpose, use of the module will be explained, and each public API will be documented with name, description, inputs, outputs, side effects, etc.

If an array or hash reference is returned, document the size of the array (including what each element is, as appropriate) and name each key in the hash. For complex data structures, map out the structure as appropriate.

Also document what kind of data returned values are. Is it an integer, a block of HTML, a boolean?

All command-line program options will be documented using the boilerplate code for command-line programs. Each available function, switch, etc. should be documented, along with a statement of function, purpose, use of the program. Try not to use the same options as another program, for a different purpose.

All web programs should be documented with a statement of function, purpose, and use in the comments of the program.

Any external documents, and documentation for command-line programs and modules, should be written in POD, where appropriate. From there, they can be translated to many formats with the various pod2* translators. Read the perlpod manpage before writing any POD, because although POD is not difficult, it is not what most people are used to. It is not a regular markup language; it is just a way to make easy documentation for translating to other formats. Read, and understand, the perlpod manpage, and ask us or someone else who knows if you have any questions.

Version

Use the boilerplate code for versions of modules, web programs, and command-line programs. The $VERSION of the module will then reflect the CVS revision. The Makefile.PL should contain the distribution version, independent of any individual file version within the CVS repository.

Also, XS modules should probably have $VERSION also reflect the distribution, or else you'll need to recompile the shared library every time you make a change to the file, which is really a pain to do during development.

Our distribution versions use tuples, where the first number is the major revision, the second number is the version, and third number is the subversion. Odd-numbered versions are development versions. Examples:

    1.0.0       First release of P5EE 1
    1.0.1       Second release of P5EE 1.0
    1.0.10      etc.
    1.1.0       First development release of P5EE 1.2 (or 2.0)
    2.0.0       First release of P5EE 2

Versions can be modified with a hyphen followed by some text, for special versions, or to give extra information. Examples:

    1.1.4-bender    Notes that this is a bender release
    2.0.0-pre1      Notes that this is not final, but preview

In perl 5.6.0, you can have versions like v2.0.0, but this is not allowed in previous versions of perl. So to convert a tuple version string to a string to use with $VERSION, use a regular integer for the revision, and three digits for version and subversion. Examples:

    1.1.6   ->      1.001006
    2.0.0   ->      2.000000

This way, perl can use the version strings in greater-than and less-than comparisons.

Comments

All code should be self-documenting as much as possible. Only include necessary comments. Use names like ``$story_count'', so you don't need to do something like:

    # story count
    my $sc = 0;

Include any comments that are, or might be, necessary in order for someone else to understand the code. Sometimes a simple one-line comment is good to explain what the purpose of the following code is for. Sometimes each line needs to be commented because of a complex algorithm. A good reference is Kernighan & Pike's Practice of Programming about commenting.

Warnings and Strict

All code must compile and run cleanly with ``use strict'' enabled and the perl ``-w'' (warnings) option on. If you must do something that -w or strict complains about, there are workarounds, but the chances that you really need to do it that way are remote.

The one exception is the ``Use of uninitialized variable'' warnings. We have those disabled in P5EE.pm, so by including ``use P5EE'' you are disabling that warning in your code, too, and you don't need to worry about them.

Lexical Variables

Use only lexical variables, except for special global variables ($VERSION, %ENV, @ISA, $!, etc.) or very special circumstances. Global variables for regular use are never appropriate. When necessary, ``declare'' globals with ``use vars'', not with our() (our() was introduced in perl 5.6).

A lexical variable is created with my(). A global variable is pre-existing (if it is a special variable), or it pops into existence when it is used. local() is used to tell perl to assign a temporary value to a variable. This should only be used with special variables, like $/, or in special circumstances. If you must assign to any global variable, consider whether or not you should use local().

local() may also be used on elements of arrays and hashes, though there is seldom a need to do it, and you shouldn't.

Exporting

Do not export anything from a module by default. Feel free to put anything you want to in @EXPORT_OK, so users of your modules can explicitly ask for symbols (e.g., ``use P5EE::Something qw(getFoo setFoo)''), but do not export them by default.

Pass by Reference

Arrays and hashes should be passed to and from functions by reference only. Note that a list and an array are NOT the same thing. This is perfectly fine:

    return($user, $form, $constants);

An exception might be a temporary array of discrete arguments:

    my @return = ($user, $form);
    push @return, $constants if $flag;
    return @return;

Although, usually, this is better (faster, easier to read, etc.):

    if ($flag) {
        return($user, $form, $constants);
    } else {
        return($user, $form);
    }

Garbage Collection

Perl does pretty good garbage collection for you. It will automatically clean up lexical variables that have gone out of scope and objects whose references have gone away. Normally you don't need to worry about cleaning up after yourself, if using lexicals.

However, some glue code, code compiled in C and linked to Perl, might not automatically clean up for you. In such cases, clean up for yourself. If there is a method in that glue to dispose or destruct, then use it as appropriate.

Also, if you have a long-running function that has a large data structure in it, it is polite to free up the memory as soon as you are done with it, if possible.

    my $huge_data_structure = get_huge_data_structure();
    do_something_with($huge_data_structure);
    undef $huge_data_structure;

__END__ and __DATA__ and __PACKAGE__

Do not use __END__ or __DATA__ in web programs. They break mod_perl. Also, __PACKAGE__ will likely not return the value you expect in web programs. These are all fine for modules.

Tests

Modules should provide test code, with documentation on how to use it.

STDIN/STDOUT

Always report errors using the (yet-to-be-defined) P5EE logging facility. Never print directly to STDERR. Do not print directly to STDOUT, unless you need to print directly to the user's browser.

In command-line programs, feel free to print to STDERR and STDOUT as needed.

Files and Globs

For constructing and parsing file paths, use File::Spec::Functions and File::Basename. For creating or removing paths, use File::Path. This increases portability to non-Un*x platforms.

    my $path = "$dir/$file";                # wrong
    my $path = catfile($dir, $file);        # right
    my $dir = ".";                          # wrong
    my $dir = curdir();                     # right
    mkdir("/path"), mkdir("/path/to"), ...  # wrong
    `mkdir /path`; `mkdir /path/to`, ...    # very wrong
    mkpath("/path/to/my/dir", 0, 0775);     # right

Do not use the glob operator (glob('*') or <*>). Use opendir() with readdir() instead. Note that glob() is much more portable in perl 5.6 than it was in previous versions of perl, but its behavior is still unreliable, as each perl installation can choose to implement perl using local conventions instead of the default, which is via the File::Glob module.

Do not use symbol table globs (not the same kind of glob as above!) like *foo for anything, except for when direct symbol table manipulation is necessary, which it almost never is.

System Calls

Always check return values from system calls, including open(), close(), mkdir(), or anything else that talks directly to the system. Perl built-in system calls return the error in $!; some functions in modules might return an error in $@ or some other way, so read the module's documentation if you don't know. Always do something, even if it is just calling errorLog(), when the return value is not what you'd expect.


STYLE

Much of the style section is taken from the perlstyle manpage. We make some changes to it here, but it wouldn't be a bad idea to read that document, too.

Terminology

P5EE
The name of the project is ``P5EE''. There is no ``P5EE1'' or ``P5EE2''. To specify a version, use ``P5EE 2.0'' or ``P5EE 2.0.1''.

function vs. sub(routine) vs. method
``Method'' should be used only to refer to a subroutine that are object methods or class methods; that is, these are functions that are used with OOP that always take either an object or a class as the first argument. Regular subroutines, ones that are not object or class methods, are functions. Class methods that create and return an object are optionally called constructors.

Names

Don't use single-character variables, except as iterator variables.

Don't use two-character variables just to spite us over the above rule.

Constants are in all caps; these are variables whose value will never change during the course of the program.

    $Minimum = 10;          # wrong
    $MAXIMUM = 50;          # right

Other variables are lowercase, with underscores separating the words. They words used should, in general, form a noun (usually singular), unless the variable is a flag used to denote some action that should be taken, in which case they should be verbs (or gerunds, as appropriate) describing that action.

    $thisVar      = 'foo';  # wrong
    $this_var     = 'foo';  # right
    $work_hard    = 1;      # right, verb, boolean flag
    $running_fast = 0;      # right, gerund, boolean flag

Arrays and hashes should be plural nouns, whether as regular arrays and hashes or array and hash references. Do not name references with ``ref'' or the data type in the name.

    @stories     = (1, 2, 3);      # right
    $comment_ref = [4, 5, 6];      # wrong
    $comments    = [4, 5, 6];      # right
    $comment     = $comments->[0]; # right

Make the name descriptive. Don't use variables like ``$sc'' when you could call it ``$story_count''. See Comments.

Methods and Functions (except for special cases, like AUTOLOAD) begin with a verb, with words following to complete the action. Multi-word names should be all lower-case, separated by underscores, in keeping with the ``perlstyle'' guide and most of the modules already on CPAN. They should as clearly as possible describe the activity to be peformed, and the data to be returned.

    $obj->getStory();             # wrong.
    $obj->setStoryByName();       # wrong again.
    $obj->getStoryByID();         # wrong again. This isn't Java!
    $obj->get_story();            # right.
    $obj->set_story_by_name();    # right.
    $obj->get_story_by_id();      # right.

Methods and Functions beginning with _ are special: they are not to be used outside the current file (i.e. ``private''). This is not enforced by the code itself, but by programmer convention only.

For large for() loops, do not use $_, but name the variable. Do not use $_ (or assume it) except for when it is absolutely clear what is going on, or when it is required (such as with map() and grep()).

    for (@list) {
        print;              # OK; everyone knows this one
        print uc;           # wrong; few people know this
        print uc $_;        # better
    }

Note that the special variable _ should be used when possible. It is a placeholder that can be passed to stat() and the file test operators, that saves perl a trip to re-stat the file. In the example below, using $file over for each file test, instead of _ for subsequent uses, is a performance hit. You should be careful that the last-tested file is what you think it is, though.

    if (-d $file) {         # $file is a directory
        # ...
    } elsif (-l _) {        # $file is a symlink
        # ...
    }

Package names begin with a capital letter in each word, followed by lower case letters.

    P5EE::Standard          # good
    P5EE::Authz             # good
    P5EE::MainCode          # good

Use all lower case for POD files which are documentation only.

    P5EE::styleguide        # good for doc only

Naming for modules should be according to the following general rules.

    All P5EE services which have *broad* support from the 
        p5ee@perl.org list would go into the "P5EE" package
    Naming style is similar to other modules on CPAN
    Naming choice draws from precedent of other modules on CPAN
    Naming choice draws from precedent of J2EE

Packages which aren't intended to be instantiated as objects may have an ``adjective'' or ``concept'' for a name (i.e. P5EE::Standard). Packages which are Modules/Classes and are intended to be instantiated as objects should be nouns, potentially accompanied by modifying adjectives (i.e. P5EE::Authen::Principal).

Indents

Code checked into CVS must never contain tabs. Patches of code with tabs do not email well, and different people have their tabstops set different ways. If you want to set tab stops on your editor, just make sure it converts tabs to spaces when it saves the file.

Indentation for normal block-style coding should be 4 spaces. The settings for Emacs and vim are as follows.

Line Lengths

Maximum line lengths should be 77 columns (or 75 columns for an unbroken line of characters). This is for maximum portability to different people's development environments and for decent transmission through e-mail to a wide array of e-mail clients (i.e. for patches).

Example: Eudora 3.0.6 wraps a solid, single line of 80 non-whitespace characters (i.e. ######...#####) at character 76. If there are spaces in the line, it allows lines up to character 78 before wrapping the last words down to the next line. If sources have no more than 77 characters in a line, a ``diff -u'' patch will add a column, and the lines will escape being folded.

Blank Space

No space before a semicolon that closes a statement.

    foo(@bar) ;     # wrong
    foo(@bar);      # right

Line up corresponding items vertically.

    my $foo   = 1;
    my $bar   = 2;
    my $xyzzy = 3;
    open(FILE, $fh)   or die $!;
    open(FILE2, $fh2) or die $!;
    $rot13 =~ tr[abcedfghijklmnopqrstuvwxyz]
                [nopqrstuvwxyzabcdefghijklm];
    # note we use a-mn-z instead of a-z,
    # for readability
    $rot13 =~ tr[a-mn-z]
                [n-za-m];

Put blank lines where they make sense for readability, such as the following. Put blank lines between groups of code that do different things. Put blank lines after your variable declarations. Put a blank line before a final return() statement. Put a blank line following a block (and before, with the exception of comment lines).

An example:

    # this is my function!
    sub foo {
        my (@data) = @_;
        my $obj = new Constructor;
        my ($var1, $var2);
        $obj->setFoo($data[1]);
        $var1 = $obj->getFoo(1);
        $var2 = $obj->getFoo($var1);
        display($var1, $var2);
        return($data[0]);
    }
    print 1;

Parentheses

For control structures, there is a space between the keyword and opening parenthesis. For functions, there is not.

    for(@list)         # wrong
    for (@list)        # right
    my ($ref)          # OK
    my ($ref)          # preferred
    localtime ($time); # wrong
    localtime($time);  # right

Be careful about list vs. scalar context with parentheses!

    my @array = ('a', 'b', 'c');
    my ($first_element) = @array;           # a
    my ($first_element) = ('a', 'b', 'c');  # a
    my $element_count  = @array;            # 3
    my $last_element   = ('a', 'b', 'c');   # c

Always include parentheses after functions, even if there are no arguments. There are some exceptions, such as list operators (like print) and unary operators (like undef, delete, uc).

There is no space inside the parentheses, unless it is needed for readability.

    for ( map { [ $_, 1 ] } @list )     # OK
    for ( @list )                       # not really OK, not horrible

On multi-line expressions, match up the closing parenthesis with either the opening statement, or the opening parenthesis, whichever works best. Examples:

    @list = qw(
        bar
        baz
    );              # right
    if ($foo && $bar && $baz
         && $buz && $xyzzy
    ) {
        print $foo;
    }

Whether or not there is space following a closing parenthesis is dependent on what it is that follows.

    print foo(@bar), baz(@buz) if $xyzzy;

Note also that parentheses around single-statement control expressions, as in if $xyzzy, are optional (and discouraged) if it is absolutely clear -- to a programmer -- what is going on. There is absolutely no need for parentheses around $xyzzy above, so leaving them out enhances readability. Use your best discretion. Better to include them, if there is any question.

The same essentially goes for perl's built-in functions, when there is nothing confusing about what is going on (for example, there is only one function call in the statement, or the function call is separated by a flow control operator). User-supplied functions must always include parentheses.

    print 1, 2, 3;                          # good
    delete $hash{key} if isAnon($uid);      # good

However, if there is any possible confusion at all, then include the parentheses. Remember the words of Larry Wall in the perlstyle manpage:

    When in doubt, parenthesize.  At the very least it will
    let some poor schmuck bounce on the % key in vi.
    Even if you aren't in doubt, consider the mental welfare
    of the person who has to maintain the code after you, and
    who will probably put parens in the wrong place.

So leave them out when it is absoutely clear to a programmer, but if there is any question, leave them in.

Braces

(This is about control braces, not hash/data structure braces.)

There is always a space befor the opening brace.

    while (<$fh>){      # wrong
    while (<$fh>) {     # right

A one-line block may be put on one line, and the semicolon may be omitted.

    for (@list) { print }

Otherwise, finish each statement with a semicolon, put the keyword and opening curly on the first line, and the ending curly lined up with the keyword at the end.

    for (@list) {
        print;
        smell();
    }

perlstyle likes to have ``uncuddled elses'':

    # right
    if ($foo) {
        print;
    }
    else {
        die;
    }
    # wrong
    if ($foo) {
        print;
    } else {
        die;
    }

Operators

Put space around most operators. The primary exception is the for aesthetics; e.g., sometimes the space around ``**'' is ommitted, and there is never a space before a ``,'', but always after.

    print $x , $y;  # wrong
    print $x, $y;   # right
    $x = 2 >> 1;    # good
    $y = 2**2;      # ok

Note that ``&&'' and ``||'' have a higher precedence than ``and'' and ``or''. Other than that, they are exactly the same. It is best to use the lower precedence version for control, and the higher for testing/returning values. Examples:

    $bool = $flag1 or $flag2;       # WRONG (doesn't work)
    $value = $foo || $bar;          # right
    open(FILE, $file) or die $!;
    $true  = foo($bar) && baz($buz);
    foo($bar) and baz($buz);

Note that ``and'' is seldom ever used, because the statement above is better written using ``if'':

    baz($buz) if foo($bar);

Most of the time, the confusion between and/&&, or/|| can be alleviated by using parentheses. If you want to leave off the parentheses then you must use the proper operator. But if you use parentheses -- and normally, you should, if there is any question at all -- then it doesn't matter which you use. Use whichever is most readable and aesthetically pleasing to you at the time, and be consistent within your block of code.

Break long lines AFTER operators, except for ``and'', ``or'', ``&&'', ``||''. Try to keep the two parts to a binary operator (an operator that has two operands) together when possible.

    print "foo" . "bar" . "baz"
        . "buz";                    # wrong
    print "foo" . "bar" . "baz" .
        "buz";                      # right
    print $foo unless $x == 3 && $y ==
        4 && $z == 5;               # wrong
    print $foo unless $x == 3 && $y == 4
        && $z == 5;                 # right

Other

Put space around a complex subscript inside the brackets or braces.

    $foo{$bar{baz}{buz}};       # OK
    $foo{ $bar{baz}{buz} };     # better

In general, use single-quotes around literals, and double-quotes when the text needs to be interpolated.

It is OK to omit quotes around names in braces and when using the => operator, but be careful not to use a name that doubles as a function; in that case, quote.

    $what{'time'}{it}{is} = time();

When making compound statements, put the primary action first.

    open(FILE, $fh) or die $!;      # right
    die $! unless open(FILE, $fh);  # wrong
    print "Starting\n" if $verbose; # right
    $verbose && print "Starting\n"; # wrong

Use here-docs instead of repeated print statements.

        print <<EOT;
    This is a whole bunch of text.
    I like it.  I don't need to worry about messing
    with lots of print statements and lining them up.
    EOT

Just remember that unless you put single quotes around your here-doc token (<<'EOT'), the text will be interpolated, so escape any ``$'' or ``@'' as needed.


REQUIREMENTS RFC AND CODING PROCEDURE

This is for new programs, modules, specific APIs, or anything else.

Contact for core team is the P5EE-development mailing list. Discuss all ideas there.

The basic process for a new P5EE service is:

    get the blessing from the P5EE list for a top-level package name
    (i.e. "P5EE::NewModule")
    begin a CPAN-able source directory skeleton
    write the spec (no code) as POD inside the target module(s)
    publish HTML to the web
    announce whenever progress is made so that comments can be sought
    code is added after there is broad support for the API spec
    and supporting doc


BUG REPORTS, PATCHES, CVS

We don't have bug tracking set up yet.

Use diff -u for patches.

Do not add anything to the main branches in CVS without approval from a member of the core team.


TO DO

lots


ACKNOWLEDGEMENTS

This style guide was based on the slashcode style guide.

It is also in conformance with the mod_perl style guide and is in the spirit of the C-language Apache style guide.

  http://slashcode.com/docs/slashstyle.html
  http://cvs.apache.org/viewcvs.cgi/modperl-docs/src/devel/modperl_style/modperl_style.pod?rev=1.5
  http://dev.apache.org/styleguide.html


CHANGES

    $Log: perlstyle.pod,v $
    Revision 1.2  2001/11/30 16:00:52  spadkins
    Renamed 'Component' to 'Service' throughout. Improved perldocs.
    Revision 1.1  2001/11/22 05:16:59  spadkins
    Major new architectural framework proposal
    Revision 1.1  2001/11/16 23:21:38  spadkins
    initial stuff


VERSION

$Id: perlstyle.pod,v 1.2 2001/11/30 16:00:52 spadkins Exp $